The Loop Report lists the optimizations that are performed
on loops and calls. If appropriate, the report gives reasons why
a possible optimization was not performed.
Loop nests are reported in the order in which they are encountered
and separated by a blank line. Below is a sample optimization report.

    Optimization Report

    Line Id   Var      Reordering       New       Optimizing / Special
    Num. Num. Name     Transformation   Id Nums   Transformation
    -----------------------------------------------------------------------------
       3    1 sub1     *Inlined call    (2-4)
       8    2 iloopi:1 Serial                     Fused
      11    3 jloopi:2 Serial                     Fused
      14    4 kloopi:3 Serial                     Fused
                       *Fused           (5)       (2 3 4) -> (5)
       8    5 iloopi:1 PARALLEL

    Footnoted    User
    Var Name     Var Name
    -----------------------------------------------------------------------------
    iloopi:1     iloopindex
    jloopi:2     jloopindex
    kloopi:3     kloopindex

    Optimization for sub1

    Line Id   Var      Reordering       New       Optimizing / Special
    Num. Num. Name     Transformation   Id Nums   Transformation
    -----------------------------------------------------------------------------
       8    1 iloopi:1 Serial                     Fused
      11    2 jloopi:2 Serial                     Fused
      14    3 kloopi:3 Serial                     Fused
                       *Fused           (4)       (1 2 3) -> (4)
       8    4 iloopi:1 PARALLEL

    Footnoted    User
    Var Name     Var Name
    -----------------------------------------------------------------------------
    iloopi:1     iloopindex
    jloopi:2     jloopindex
    kloopi:3     kloopindex

A description of each column of the Loop Report is shown in Table 8-2 “Loop Report column definitions”.

Table 8-2 Loop Report column definitions

| Column | Description |
|---|---|
| Line Num. | Specifies the source line of the beginning of the loop or of the loop from which it was derived. For cloned calls and inlined calls, the Line Num. column specifies the source line at which the call statement appears. |
| Id Num. | Specifies a unique ID number for every optimized loop and for every optimized call. This ID number can then be referenced by other parts of the report. Both loops appearing in the original program source and loops created by the compiler are given loop ID numbers. Loops created by the compiler are also shown in the New Id Nums column as described later. No distinction between compiler-generated loops and loops that existed in the original source is made in the Id Num. column. Loops are assigned unique, sequential numbers as they are encountered. |
| Var Name | Specifies the name of the iteration variable controlling the loop, or the name of the called procedure if the line represents a call. If the variable is compiler-generated, its name is listed as *VAR*. If it consists of a truncated variable name followed by a colon and a number, the number is a reference to the Variable Name Footnote Table, which appears after the Loop Report and Analysis Table in the Optimization Report. |
| Reordering Transformation | Indicates which reordering transformations were performed. Reordering transformations are performed on loops, calls, and loop nests, and typically involve reordering and/or duplicating sections of code to facilitate more efficient execution. This column has one of the values shown in Table 8-3 “Reordering transformation values in the Loop Report”. |
| New Id Nums | Specifies the ID number for loops or calls created by the compiler. These ID numbers are listed in the Id Num. column and are referenced in other parts of the report. However, the loops and calls they represent were not present in the original source code. In the case of loop fusion, the number in this column indicates the new loop created by merging all the fused loops. New ID numbers are also created for cloned calls, inlined calls, loop blocking, loop distribution, loop interchange, loop unroll and jam, dynamic selection, and test promotion. |
| Optimizing / Special Transformation | Indicates which, if any, optimizing transformations were performed. An optimizing transformation reduces the number of operations executed, or replaces operations with simpler operations. A special transformation allows the compiler to optimize code under special circumstances. When appropriate, this column has one of the values shown in Table 8-4 “Optimizing/special transformations values in the Loop Report”. |

The following values apply to the Reordering Transformation column described in Table 8-2 “Loop Report column definitions”.

Table 8-3 Reordering transformation values in the Loop Report

| Value | Description |
|---|---|
| Block | Loop blocking was performed. The new loop order is indicated under the Optimizing/Special Transformation column, as shown in Table 8-4 “Optimizing/special transformations values in the Loop Report”. |
| Cloned call | A call to a subroutine was cloned. |
| Dist | Loop distribution was performed. |
| DynSel | Dynamic selection was performed. The numbers in the New Id Nums column correspond to the loops created. For parallel loops, these generally include a PARALLEL and a Serial version. |
| Fused | The loops were fused into another loop and no longer exist. The original loops and the new loop are indicated under the Optimizing/Special Transformation column, as shown in Table 8-4 “Optimizing/special transformations values in the Loop Report”. |
| Inlined call | A call to a subroutine was inlined. |
| Interchange | Loop interchange was performed. The new loop order is indicated under the Optimizing/Special Transformation column, as shown in Table 8-4 “Optimizing/special transformations values in the Loop Report”. |
| None | No reordering transformation was performed on the call. |
| PARALLEL | The loop runs in thread-parallel mode. |
| Peel | The first or last iteration of the loop was peeled in order to fuse the loop with an adjacent loop. |
| Promote | Test promotion was performed. |
| Serial | No reordering transformation was performed on the loop. |
| Unroll and Jam | The loop was unrolled and the nested loops were jammed (fused). |
| VECTOR | The loop was fully or partially replaced with more efficient calls to one or more vector routines. |
| * | Appears at the left of loop-producing transformations (distribution, dynamic selection, blocking, fusion, interchange, call cloning, call inlining, peeling, promotion, unroll and jam). |

The following values apply to the Optimizing / Special Transformation column described in Table 8-2 “Loop Report column definitions”.

Table 8-4 Optimizing/special transformations values in the Loop Report

| Value | Explanation |
|---|---|
| Fused | The loop was fused into another loop and no longer exists. |
| Reduction | The compiler recognized a reduction in the loop. |
| Removed | The compiler removed the loop. |
| Unrolled | The loop was completely unrolled. |
| (OrigOrder) -> (InterchangedOrder) | This information appears when Interchange is reported under Reordering Transformation. OrigOrder indicates the order of loops in the original nest. InterchangedOrder indicates the new order that occurs due to interchange. OrigOrder and InterchangedOrder consist of user iteration variables presented in outermost to innermost order. |
| (OrigLoops) -> (NewLoop) | This information appears when Fused is reported under Reordering Transformation. OrigLoops indicates the original loops that were fused by the compiler to form the loop indicated by NewLoop. OrigLoops and NewLoop refer to loops based on the values from the Id Num. and New Id Nums columns in the Loop Report. |
| (OrigLoopNest) -> (BlockedLoopNest) | This information appears when Block is reported under Reordering Transformation. OrigLoopNest indicates the order of the original loop nest containing a loop that was blocked. BlockedLoopNest indicates the order of loops after blocking. OrigLoopNest and BlockedLoopNest refer to user iteration variables presented in outermost to innermost order. |

Supplemental tables

The tables described in this section may be included in the Optimization Report to provide information supplemental to the Loop Report.

Analysis Table

If necessary, an Analysis Table is included in the Optimization Report to further elaborate on optimizations reported in the Loop Report. A description of each column in the Analysis Table is shown in Table 8-5 “Analysis Table column definitions”.

Table 8-5 Analysis Table column definitions

| Column | Description |
|---|---|
| Line Num. | Specifies the source line of the beginning of the loop or call. |
| Id Num. | References the ID number assigned to the loop or call in the Loop Report. |
| Var Name | Specifies the name of the iteration variable controlling the loop, or *VAR* (as discussed in the Var Name description in the section “Loop Report”). |
| Analysis | Indicates why a transformation or optimization was not performed, or gives additional information on what was done. |

Privatization Table

The Privatization Table reports any user variables contained in a parallelized loop that are privatized by the compiler. Because the Privatization Table refers to loops, the Loop Report is automatically provided with it. A description of each column in the Privatization Table is shown in Table 8-6 “Privatization Table column definitions”.

Table 8-6 Privatization Table column definitions

| Column | Definition |
|---|---|
| Line Num. | Specifies the source line of the beginning of the loop. |
| Id Num. | References the ID number assigned to the loop in the Loop Report. |
| Var Name | Specifies the name of the iteration variable controlling the loop. *VAR* may also appear in this column, as discussed in the Var Name description in the section “Loop Report”. |
| Priv Var | Specifies the name of the privatized user variable. Compiler-generated variables that are privatized are not reported here. |
| Privatization Information for Parallel Loops | Provides more detail on the variable privatizations performed. |

Variable Name Footnote Table

Variable names that are too long to fit in the Var Name columns of the other tables are truncated and followed by a colon and a footnote number. These footnotes are explained in the Variable Name Footnote Table. A description of each column in the Variable Name Footnote Table is shown in Table 8-7 “Variable Name Footnote Table column definitions”.

Table 8-7 Variable Name Footnote Table column definitions

| Column | Definition |
|---|---|
| Footnoted Var Name | Specifies the truncated variable name and its footnote number. |
| User Var Name | Specifies the full name of the variable as identified in the source code. |

Optimization Report

The following Fortran program is the basis for the Optimization Report shown in this example. Line numbers are provided for ease of reference.

     1  PROGRAM EXAMPLE99
     2  REAL A(100), B(100), C(100)
     3  CALL SUB1(A,B,C)
     4  END
     5
     6  SUBROUTINE SUB1(A,B,C)
     7  REAL A(100), B(100), C(100)
     8  DO ILOOPINDEX=1,100
     9    A(ILOOPINDEX) = ILOOPINDEX
    10  ENDDO
    11  DO JLOOPINDEX=1,100
    12    B(JLOOPINDEX) = A(JLOOPINDEX)**2
    13  ENDDO
    14  DO KLOOPINDEX=1, 100
    15    C(KLOOPINDEX) = A(KLOOPINDEX) + B(KLOOPINDEX)
    16  ENDDO
    17  PRINT *, A(1), B(50), C(100)
    18  END

The following Optimization Report is generated by compiling the program EXAMPLE99 with the command-line options +O3 +Oparallel +Oreport=all +Oinline=sub1:

    % f90 +O3 +Oparallel +Oreport=all +Oinline=sub1 EXAMPLE99.f

    Optimization for EXAMPLE99

    Line Id   Var      Reordering       New       Optimizing / Special
    Num. Num. Name     Transformation   Id Nums   Transformation
    -----------------------------------------------------------------------------
       3    1 sub1     *Inlined call    (2-4)
       8    2 iloopi:1 Serial                     Fused
      11    3 jloopi:2 Serial                     Fused
      14    4 kloopi:3 Serial                     Fused
                       *Fused           (5)       (2 3 4) -> (5)
       8    5 iloopi:1 PARALLEL

    Footnoted    User
    Var Name     Var Name
    -----------------------------------------------------------------------------
    iloopi:1     iloopindex
    jloopi:2     jloopindex
    kloopi:3     kloopindex

    Optimization for sub1

    Line Id   Var      Reordering       New       Optimizing / Special
    Num. Num. Name     Transformation   Id Nums   Transformation
    -----------------------------------------------------------------------------
       8    1 iloopi:1 Serial                     Fused
      11    2 jloopi:2 Serial                     Fused
      14    3 kloopi:3 Serial                     Fused
                       *Fused           (4)       (1 2 3) -> (4)
       8    4 iloopi:1 PARALLEL

    Footnoted    User
    Var Name     Var Name
    -----------------------------------------------------------------------------
    iloopi:1     iloopindex
    jloopi:2     jloopindex
    kloopi:3     kloopindex

The Optimization Report for EXAMPLE99 provides the following information:

Call to sub1 is inlined

The first line of the Loop Report shows that the call to sub1 was inlined, as shown below:

       3    1 sub1     *Inlined call    (2-4)

Three new loops produced

The inlining produced three new loops in EXAMPLE99: Loop #2, Loop #3, and Loop #4. Internally, the EXAMPLE99 module that originally looked like:

     1  PROGRAM EXAMPLE99
     2  REAL A(100), B(100), C(100)
     3  CALL SUB1(A,B,C)
     4  END

now looks like this:

    PROGRAM EXAMPLE99
    REAL A(100), B(100), C(100)
    DO ILOOPINDEX=1,100                  !Loop #2
      A(ILOOPINDEX) = ILOOPINDEX
    ENDDO
    DO JLOOPINDEX=1,100                  !Loop #3
      B(JLOOPINDEX) = A(JLOOPINDEX)**2
    ENDDO
    DO KLOOPINDEX=1, 100                 !Loop #4
      C(KLOOPINDEX) = A(KLOOPINDEX) + B(KLOOPINDEX)
    ENDDO
    PRINT *, A(1), B(50), C(100)
    END

New loops are fused

The following lines indicate that the new loops have been fused. The last line indicates that the three loops were fused into one new loop, Loop #5.

       8    2 iloopi:1 Serial                     Fused
      11    3 jloopi:2 Serial                     Fused
      14    4 kloopi:3 Serial                     Fused
                       *Fused           (5)       (2 3 4) -> (5)

After fusing, the code internally appears as the following:

    PROGRAM EXAMPLE99
    REAL A(100), B(100), C(100)
    DO ILOOPINDEX=1,100                  !Loop #5
      A(ILOOPINDEX) = ILOOPINDEX
      B(ILOOPINDEX) = A(ILOOPINDEX)**2
      C(ILOOPINDEX) = A(ILOOPINDEX) + B(ILOOPINDEX)
    ENDDO
    PRINT *, A(1), B(50), C(100)
    END

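Fusing the three loops is safe because each iteration reads only array elements written earlier in that same iteration. The following is a minimal sketch, not compiler output, that runs the unfused and fused forms side by side and compares the results. The program name FUSECHECK and the duplicated arrays (A1, A2, and so on) are hypothetical names introduced only for this illustration.

```fortran
      PROGRAM FUSECHECK
      ! Hypothetical check: three separate loops vs. one fused loop.
      REAL A1(100), B1(100), C1(100)
      REAL A2(100), B2(100), C2(100)
      INTEGER I
      LOGICAL SAME

      ! Unfused form, corresponding to Loops #2, #3, and #4.
      DO I = 1, 100
        A1(I) = I
      ENDDO
      DO I = 1, 100
        B1(I) = A1(I)**2
      ENDDO
      DO I = 1, 100
        C1(I) = A1(I) + B1(I)
      ENDDO

      ! Fused form, corresponding to Loop #5.
      DO I = 1, 100
        A2(I) = I
        B2(I) = A2(I)**2
        C2(I) = A2(I) + B2(I)
      ENDDO

      ! Compare every element of the two sets of results.
      SAME = .TRUE.
      DO I = 1, 100
        IF (A1(I) .NE. A2(I)) SAME = .FALSE.
        IF (B1(I) .NE. B2(I)) SAME = .FALSE.
        IF (C1(I) .NE. C2(I)) SAME = .FALSE.
      ENDDO
      PRINT *, 'Fused and unfused results match: ', SAME
      END
```

All values involved are small whole numbers, so the exact comparison is expected to report T.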
New loop is parallelized

In the following Loop Report line:

       8    5 iloopi:1 PARALLEL

Loop #5 uses iloopi:1 as the iteration variable, referencing the Variable Name Footnote Table; iloopi:1 corresponds to iloopindex. The same line in the report also indicates that the newly created Loop #5 was parallelized.

Variable Name Footnote Table lists iteration variables

According to the Variable Name Footnote Table (duplicated below), the original variable iloopindex is abbreviated by the compiler as iloopi:1 so that it fits into the Var Name columns of other reports. jloopindex and kloopindex are abbreviated as jloopi:2 and kloopi:3, respectively. These names are used throughout the report to refer to these iteration variables.

    Footnoted    User
    Var Name     Var Name
    -----------------------
    iloopi:1     iloopindex
    jloopi:2     jloopindex
    kloopi:3     kloopindex

Optimization Report

The following Fortran code provides an example of other transformations the compiler performs. Line numbers are provided for ease of reference.

     1  PROGRAM EXAMPLE100
     2
     3  INTEGER IA1(100), IA2(100), IA3(100)
     4  INTEGER I1, I2
     5
     6  DO I = 1, 100
     7    IA1(I) = I
     8    IA2(I) = I * 2
     9    IA3(I) = I * 3
    10  ENDDO
    11
    12  I1 = 0
    13  I2 = 100
    14  CALL SUB1 (IA1, IA2, IA3, I1, I2)
    15  END
    16
    17  SUBROUTINE SUB1(A, B, C, S, N)
    18  INTEGER A(N), B(N), C(N), S, I, J
    19  DO J = 1, N
    20    DO I = 1, N
    21      IF (I .EQ. 1) THEN
    22        S = S + A(I)
    23      ELSE IF (I .EQ. N) THEN
    24        S = S + B(I)
    25      ELSE
    26        S = S + C(I)
    27      ENDIF
    28    ENDDO
    29  ENDDO
    30  END

The following Optimization Report is generated by compiling the program EXAMPLE100 for parallelization:

    % f90 +O3 +Oparallel +Oreport=all example100.f

    Optimization for SUB1

    Line Id   Var      Reordering       New       Optimizing / Special
    Num. Num. Name     Transformation   Id Nums   Transformation
    -----------------------------------------------------------------------------
      19    1 j        *Interchange     (2)       (j i) -> (i j)
      20    2 i        *DynSel          (3-4)
      20    3 i        PARALLEL                   Reduction
      19    5 j        *Promote         (6-7)
      19    6 j        Serial
      19    7 j        Serial
      20    4 i        Serial
      19    8 j        *Promote         (9-10)
      19    9 j        Serial
      19   10 j        *Promote         (11-12)
      19   11 j        Serial
      19   12 j        Serial

    Line Id   Var      Analysis
    Num. Num. Name
    -----------------------------------------------------------------------------
      19    5 j        Test on line 21 promoted out of loop
      19    8 j        Test on line 21 promoted out of loop
      19   10 j        Test on line 23 promoted out of loop

    Optimization for clone 1 of SUB1 (6_e70_cl_sub1)

    Line Id   Var      Reordering       New       Optimizing / Special
    Num. Num. Name     Transformation   Id Nums   Transformation
    -----------------------------------------------------------------------------
      19    1 j        *Interchange     (2)       (j i) -> (i j)
      20    2 i        PARALLEL                   Reduction
      19    3 j        *Promote         (4-5)
      19    4 j        Serial
      19    5 j        *Promote         (6-7)
      19    6 j        Serial
      19    7 j        Serial

    Line Id   Var      Analysis
    Num. Num. Name
    -----------------------------------------------------------------------------
      19    3 j        Test on line 21 promoted out of loop
      19    5 j        Test on line 23 promoted out of loop

    Optimization for example100

    Line Id   Var      Reordering       New       Optimizing / Special
    Num. Num. Name     Transformation   Id Nums   Transformation
    -----------------------------------------------------------------------------
       6    1 i        Serial
      14    2 sub1     *Cloned call     (3)
      14    3 sub1     None

    Line Id   Var      Analysis
    Num. Num. Name
    -----------------------------------------------------------------------------
      14    2 sub1     Call target changed to clone 1 of SUB1 (6_e70_cl_sub1)

The Optimization Report for EXAMPLE100 shows Optimization Reports for the subroutine and its clone, followed by the optimizations to the main program. It includes the following information:

Original subroutine contents

Originally, the subroutine appeared as shown below:

    17  SUBROUTINE SUB1(A, B, C, S, N)
    18  INTEGER A(N), B(N), C(N), S, I, J
    19  DO J = 1, N
    20    DO I = 1, N
    21      IF (I .EQ. 1) THEN
    22        S = S + A(I)
    23      ELSE IF (I .EQ. N) THEN
    24        S = S + B(I)
    25      ELSE
    26        S = S + C(I)
    27      ENDIF
    28    ENDDO
    29  ENDDO
    30  END

Loop interchange performed first

The compiler first performs loop interchange (listed as Interchange in the report) to maximize cache performance:

      19    1 j        *Interchange     (2)       (j i) -> (i j)

The subroutine then becomes the following:

    17  SUBROUTINE SUB1(A, B, C, S, N)
    18  INTEGER A(N), B(N), C(N), S, I, J
    19  DO I = 1, N                       ! Loop #2
    20    DO J = 1, N                     ! Loop #1
    21      IF (I .EQ. 1) THEN
    22        S = S + A(I)
    23      ELSE IF (I .EQ. N) THEN
    24        S = S + B(I)
    25      ELSE
    26        S = S + C(I)
    27      ENDIF
    28    ENDDO
    29  ENDDO
    30  END

The program is optimized for parallelization

The compiler would like to parallelize the outermost loop in the nest, which is now the I loop. However, because the value of N is not known at compile time, the compiler does not know how many times the I loop needs to be executed. To ensure that the loop is executed as efficiently as possible at runtime, the compiler replaces the I loop nest with two new copies of the I loop nest, one to be run in parallel, the other to be run serially.

Dynamic selection is executed

An IF is then inserted to select the more efficient version of the loop to execute at runtime. This method of making one copy for parallel execution and one copy for serial execution is known as dynamic selection, which is enabled by default when +O3 +Oparallel is specified (see “Dynamic selection” for more information). This optimization is reported in the Loop Report in the line:

      20    2 i        *DynSel          (3-4)

Loop #2 creates two loops

According to the report, Loop #2 was used to create the new loops, Loop #3 and Loop #4. Internally, the code is now represented as follows:

    SUBROUTINE SUB1(A, B, C, S, N)
    INTEGER A(N), B(N), C(N), S, I, J

    IF (N .GT. some_threshold) THEN
      DO (parallel) I = 1, N              ! Loop #3
        DO J = 1, N                       ! Loop #5
          IF (I .EQ. 1) THEN
            S = S + A(I)
          ELSE IF (I .EQ. N) THEN
            S = S + B(I)
          ELSE
            S = S + C(I)
          ENDIF
        ENDDO
      ENDDO
    ELSE
      DO I = 1, N                         ! Loop #4
        DO J = 1, N                       ! Loop #8
          IF (I .EQ. 1) THEN
            S = S + A(I)
          ELSE IF (I .EQ. N) THEN
            S = S + B(I)
          ELSE
            S = S + C(I)
          ENDIF
        ENDDO
      ENDDO
    ENDIF
    END

Loop #3 contains reductions

Loop #3 (which was parallelized) also contained one or more reductions. The Reordering Transformation column indicates that the IF statements were promoted out of Loop #5, Loop #8, and Loop #10.

Analysis Table lists new loops

The line numbers of the promoted IF statements are listed. The first test in Loop #5 was promoted, creating two new loops, Loop #6 and Loop #7. Similarly, Loop #8 has a test promoted, creating Loop #9 and Loop #10. The test remaining in Loop #10 is then promoted, thereby creating two additional loops. A promoted test is an IF statement that is hoisted out of a loop. See the section “Test promotion” for more information. The Analysis Table contents are shown below:

      19    5 j        Test on line 21 promoted out of loop
      19    8 j        Test on line 21 promoted out of loop
      19   10 j        Test on line 23 promoted out of loop

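After interchange, the tests on I do not change while the inner J loop runs, which is what makes them candidates for promotion. The sketch below is an illustration under stated assumptions rather than the compiler's actual output: it evaluates the sum once with the tests inside the J loop and once with the tests hoisted so that each branch holds its own copy of the J loop. The program name PROMOTECHECK, the fixed size N = 100, and the accumulators S1 and S2 are hypothetical; the array contents mirror the initialization in EXAMPLE100.

```fortran
      PROGRAM PROMOTECHECK
      ! Hypothetical check of test promotion on the interchanged nest.
      INTEGER N
      PARAMETER (N = 100)
      INTEGER A(N), B(N), C(N)
      INTEGER I, J, S1, S2

      ! Same array contents as IA1, IA2, and IA3 in EXAMPLE100.
      DO I = 1, N
        A(I) = I
        B(I) = I * 2
        C(I) = I * 3
      ENDDO

      ! Tests evaluated on every iteration of the inner J loop.
      S1 = 0
      DO I = 1, N
        DO J = 1, N
          IF (I .EQ. 1) THEN
            S1 = S1 + A(I)
          ELSE IF (I .EQ. N) THEN
            S1 = S1 + B(I)
          ELSE
            S1 = S1 + C(I)
          ENDIF
        ENDDO
      ENDDO

      ! Tests promoted out of the J loop: each test is evaluated once
      ! per I iteration, and each branch has its own J loop.
      S2 = 0
      DO I = 1, N
        IF (I .EQ. 1) THEN
          DO J = 1, N
            S2 = S2 + A(I)
          ENDDO
        ELSE IF (I .EQ. N) THEN
          DO J = 1, N
            S2 = S2 + B(I)
          ENDDO
        ELSE
          DO J = 1, N
            S2 = S2 + C(I)
          ENDDO
        ENDIF
      ENDDO

      PRINT *, 'Sums match: ', S1 .EQ. S2
      END
```

The promoted form trades code size for fewer branch evaluations, which is consistent with the several Serial copies of the j loop listed in the report.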
DO loop is not reordered

The following DO loop does not undergo any reordering transformation:

     6  DO I = 1, 100
     7    IA1(I) = I
     8    IA2(I) = I * 2
     9    IA3(I) = I * 3
    10  ENDDO

This fact is reported by the line:

       6    1 i        Serial

sub1 is cloned

The call to the subroutine sub1 is cloned. As indicated by the asterisk (*), the compiler produced a new call. The new call is given the ID (3) listed in the New Id Nums column. The new call is then listed, with None indicating that no reordering transformation was performed on the call to the new subroutine.

      14    2 sub1     *Cloned call     (3)
      14    3 sub1     None

Cloned call is transformed

An Analysis Table entry for the call is then appended to the Loop Report to elaborate on the Cloned call transformation. This line shows that the clone was called in place of the original subroutine.

      14    2 sub1     Call target changed to clone 1 of SUB1 (6_e70_cl_sub1)

Optimization Report

The following Fortran code shows loop blocking, loop peeling, loop distribution, and loop unroll and jam. Line numbers are listed for ease of reference.

     1  PROGRAM EXAMPLE200
     2
     3  REAL*8 A(1000,1000), B(1000,1000), C(1000)
     4  REAL*8 D(1000), E(1000)
     5  INTEGER M, N
     6
     7  N = 1000
     8  M = 1000
     9
    10  DO I = 1, N
    11    C(I) = 0
    12    DO J = 1, M
    13      A(I,J) = A(I,J) + B(I,J) * C(I)
    14    ENDDO
    15  ENDDO
    16
    17  DO I = 1, N-1
    18    D(I) = I
    19  ENDDO
    20
    21  DO J = 1, N
    22    E(J) = D(J) + 1
    23  ENDDO
    24
    25  PRINT *, A(103,103), B(517, 517), D(11), E(29)
    26
    27  END

The following Optimization Report is generated by compiling program EXAMPLE200 as follows:

    % f90 +O3 +Oreport +Oloop_block example200.f

    Optimization for example200

    Line Id   Var      Reordering       New       Optimizing / Special
    Num. Num. Name     Transformation   Id Nums   Transformation
    -----------------------------------------------------------------------------
      10    1 i:1      *Dist            (2-3)
      10    2 i:1      Serial
      10    3 i:1      *Interchange     (4)       (i:1 j:1) -> (j:1 i:1)
      12    4 j:1      *Block           (5)       (j:1 i:1) -> (i:1 j:1 i:1)
      10    5 i:1      *Promote         (6-7)
      10    6 i:1      Serial                     Removed
      10    7 i:1      Serial
      12    8 j:1      *Unroll And Jam  (9)
      12    9 j:1      *Promote         (10-11)
      12   10 j:1      Serial                     Removed
      12   11 j:1      Serial
      10   12 i:1      Serial
      17   13 i:2      Serial                     Fused
      21   14 j:2      *Peel            (15)
      21   15 j:2      Serial                     Fused
                       *Fused           (16)      (13 15) -> (16)
      17   16 i:2      Serial

    Line Id   Var      Analysis
    Num. Num. Name
    -----------------------------------------------------------------------------
      10    5 i:1      Loop blocked by 56 iterations
      10    5 i:1      Test on line 12 promoted out of loop
      10    6 i:1      Loop blocked by 56 iterations
      10    7 i:1      Loop blocked by 56 iterations
      12    8 j:1      Loop unrolled by 8 iterations and jammed into the innermost loop
      12    9 j:1      Test on line 10 promoted out of loop
      21   14 j:2      Peeled last iteration of loop

The Optimization Report for EXAMPLE200 provides the following results:

Several occurrences of variables noted

In this report, the Var Name column has entries such as i:1, j:1, i:2, and j:2. This type of entry appears when a variable is used as an iteration variable more than once. In EXAMPLE200, I is used as an iteration variable twice. Consequently, i:1 refers to the first occurrence, and i:2 refers to the second occurrence.

Loop #1 creates new loops

The first line of the report shows that Loop #1, shown on line 10, is distributed to create Loop #2 and Loop #3:

      10    1 i:1      *Dist            (2-3)

Initially, Loop #1 appears as shown.

    DO I = 1, N                           ! Loop #1
      C(I) = 0
      DO J = 1, M
        A(I,J) = A(I,J) + B(I,J) * C(I)
      ENDDO
    ENDDO

It is then distributed as follows:

    DO I = 1, N                           ! Loop #2
      C(I) = 0
    ENDDO

    DO I = 1, N                           ! Loop #3
      DO J = 1, M
        A(I,J) = A(I,J) + B(I,J) * C(I)
      ENDDO
    ENDDO

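Distribution is valid here because the statement that sets C(I) does not depend on anything computed in the inner loop, so all of C can be filled before the A update begins. The following sketch, which is illustrative only, compares the original and distributed forms on small arrays. The program name DISTCHECK, the array sizes, the input data, and the nonzero value stored in C (chosen so that the A update has a visible effect) are assumptions that differ from EXAMPLE200.

```fortran
      PROGRAM DISTCHECK
      ! Hypothetical check: original nest vs. distributed loops.
      INTEGER N, M
      PARAMETER (N = 50, M = 40)
      DOUBLE PRECISION A1(N,M), A2(N,M), B(N,M)
      DOUBLE PRECISION C1(N), C2(N)
      INTEGER I, J
      LOGICAL SAME

      ! Illustrative input data (not taken from EXAMPLE200).
      DO J = 1, M
        DO I = 1, N
          A1(I,J) = I + J
          A2(I,J) = I + J
          B(I,J) = I - J
        ENDDO
      ENDDO

      ! Original form, corresponding to Loop #1.
      DO I = 1, N
        C1(I) = I
        DO J = 1, M
          A1(I,J) = A1(I,J) + B(I,J) * C1(I)
        ENDDO
      ENDDO

      ! Distributed form: Loop #2, then the nest rooted at Loop #3.
      DO I = 1, N
        C2(I) = I
      ENDDO
      DO I = 1, N
        DO J = 1, M
          A2(I,J) = A2(I,J) + B(I,J) * C2(I)
        ENDDO
      ENDDO

      ! Compare every element of the two sets of results.
      SAME = .TRUE.
      DO J = 1, M
        DO I = 1, N
          IF (A1(I,J) .NE. A2(I,J)) SAME = .FALSE.
        ENDDO
      ENDDO
      DO I = 1, N
        IF (C1(I) .NE. C2(I)) SAME = .FALSE.
      ENDDO
      PRINT *, 'Distributed and original results match: ', SAME
      END
```

Separating the two statements leaves a perfectly nested I/J loop, which is the form that the later interchange and blocking steps operate on.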
Loop #3 is interchanged to create Loop #4

The third line indicates this:

      10    3 i:1      *Interchange     (4)       (i:1 j:1) -> (j:1 i:1)

Now, the loop looks like the following code:

    DO J = 1, M                           ! Loop #4
      DO I = 1, N
        A(I,J) = A(I,J) + B(I,J) * C(I)
      ENDDO
    ENDDO

Nested loop is blocked

The next line of the Optimization Report indicates that the nest rooted at Loop #4 is blocked:

      12    4 j:1      *Block           (5)       (j:1 i:1) -> (i:1 j:1 i:1)

The blocked nest internally appears as follows:

    DO IOUT = 1, N, 56                    ! Loop #5
      DO J = 1, M
        DO I = IOUT, IOUT + 55
          A(I,J) = A(I,J) + B(I,J) * C(I)
        ENDDO
      ENDDO
    ENDDO

Loop #5 noted as blocked

The loop with iteration variable i:1 is the loop that was actually blocked. The report shows *Block on Loop #4 (the j:1 loop) because the entire nest rooted at Loop #4 is replaced by the blocked nest.

IOUT variable facilitates loop blocking

The IOUT variable is introduced to facilitate the loop blocking. The compiler uses a step value of 56 for the IOUT loop as reported in the Analysis Table:

      10    5 i:1      Loop blocked by 56 iterations

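The blocked listing in this example is schematic. Because 1000 is not a multiple of 56 (1000 = 17 x 56 + 48), a hand-written version of the same idea has to stop the final block at N. The sketch below shows one common way to do that with MIN and checks the blocked nest against the unblocked one. It is an assumption about how such a loop could be written by hand, not a description of the code the compiler generates; the program name BLOCKCHECK, the reduced array sizes, and the input data are hypothetical.

```fortran
      PROGRAM BLOCKCHECK
      ! Hypothetical check of loop blocking with a block size of 56.
      INTEGER N, M, BLK
      PARAMETER (N = 200, M = 50, BLK = 56)
      DOUBLE PRECISION A1(N,M), A2(N,M), B(N,M), C(N)
      INTEGER I, J, IOUT
      LOGICAL SAME

      ! Illustrative input data (not taken from EXAMPLE200).
      DO J = 1, M
        DO I = 1, N
          A1(I,J) = I + J
          A2(I,J) = I + J
          B(I,J) = I - J
        ENDDO
      ENDDO
      DO I = 1, N
        C(I) = I
      ENDDO

      ! Interchanged nest before blocking (the Loop #4 form).
      DO J = 1, M
        DO I = 1, N
          A1(I,J) = A1(I,J) + B(I,J) * C(I)
        ENDDO
      ENDDO

      ! Blocked nest. MIN keeps the last, partial block inside the
      ! array bounds, because N is not a multiple of BLK.
      DO IOUT = 1, N, BLK
        DO J = 1, M
          DO I = IOUT, MIN(IOUT + BLK - 1, N)
            A2(I,J) = A2(I,J) + B(I,J) * C(I)
          ENDDO
        ENDDO
      ENDDO

      ! Compare every element of the two results.
      SAME = .TRUE.
      DO J = 1, M
        DO I = 1, N
          IF (A1(I,J) .NE. A2(I,J)) SAME = .FALSE.
        ENDDO
      ENDDO
      PRINT *, 'Blocked and unblocked results match: ', SAME
      END
```

Within one IOUT block, the same 56 elements of C are reused for every J, which is the data reuse that blocking aims to keep in cache.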
Test promotion creates new loops

The next three lines of the report show that a test was promoted out of Loop #5, creating Loop #6 (which is removed) and Loop #7 (which is run serially). This test, which does not appear in the source code, is an implicit test that the compiler inserts in the code to ensure that the loop iterates at least once.

      10    5 i:1      *Promote         (6-7)
      10    6 i:1      Serial                     Removed
      10    7 i:1      Serial

This test is referenced again in the following line from the Analysis Table:

      10    5 i:1      Test on line 12 promoted out of loop

Unroll and jam creates new loop

The report indicates that the J loop is unrolled and jammed, creating Loop #9:

      12    8 j:1      *Unroll And Jam  (9)

J loop unrolled by 8 iterations

The following Analysis Table line also indicates that the J loop is unrolled by 8 iterations and jammed (fused):

      12    8 j:1      Loop unrolled by 8 iterations and jammed into the innermost loop

The unrolled and jammed loop results in the following code:

    DO IOUT = 1, N, 56                    ! Loop #5
      DO J = 1, M, 8                      ! Loop #8
        DO I = IOUT, IOUT + 55            ! Loop #9
          A(I,J) = A(I,J) + B(I,J) * C(I)
          A(I,J+1) = A(I,J+1) + B(I,J+1) * C(I)
          A(I,J+2) = A(I,J+2) + B(I,J+2) * C(I)
          A(I,J+3) = A(I,J+3) + B(I,J+3) * C(I)
          A(I,J+4) = A(I,J+4) + B(I,J+4) * C(I)
          A(I,J+5) = A(I,J+5) + B(I,J+5) * C(I)
          A(I,J+6) = A(I,J+6) + B(I,J+6) * C(I)
          A(I,J+7) = A(I,J+7) + B(I,J+7) * C(I)
        ENDDO
      ENDDO
    ENDDO

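In EXAMPLE200, M is 1000, which divides evenly by 8, so the unrolled J loop needs no leftover iterations. When the trip count is not a multiple of the unroll factor, a cleanup loop is one common way to handle the remainder. The sketch below illustrates this with an unroll factor of 4 (chosen only to keep the listing short) and compares the result with the plain nest. The program name UAJCHECK, the array sizes, the input data, and the helper variable JMAIN are hypothetical, and the cleanup strategy is an assumption rather than a statement of what the compiler emits.

```fortran
      PROGRAM UAJCHECK
      ! Hypothetical check of unroll and jam (unroll factor of 4).
      INTEGER N, M, UF
      PARAMETER (N = 64, M = 22, UF = 4)
      DOUBLE PRECISION A1(N,M), A2(N,M), B(N,M), C(N)
      INTEGER I, J, JMAIN
      LOGICAL SAME

      ! Illustrative input data (not taken from EXAMPLE200).
      DO J = 1, M
        DO I = 1, N
          A1(I,J) = I + J
          A2(I,J) = I + J
          B(I,J) = I - J
        ENDDO
      ENDDO
      DO I = 1, N
        C(I) = I
      ENDDO

      ! Reference nest: J outer, I inner.
      DO J = 1, M
        DO I = 1, N
          A1(I,J) = A1(I,J) + B(I,J) * C(I)
        ENDDO
      ENDDO

      ! Unrolled and jammed: the J loop steps by UF, and the four
      ! copies of the body (UF must be 4 here) share one I loop.
      JMAIN = M - MOD(M, UF)
      DO J = 1, JMAIN, UF
        DO I = 1, N
          A2(I,J)   = A2(I,J)   + B(I,J)   * C(I)
          A2(I,J+1) = A2(I,J+1) + B(I,J+1) * C(I)
          A2(I,J+2) = A2(I,J+2) + B(I,J+2) * C(I)
          A2(I,J+3) = A2(I,J+3) + B(I,J+3) * C(I)
        ENDDO
      ENDDO

      ! Cleanup loop for the M - JMAIN leftover columns.
      DO J = JMAIN + 1, M
        DO I = 1, N
          A2(I,J) = A2(I,J) + B(I,J) * C(I)
        ENDDO
      ENDDO

      ! Compare every element of the two results.
      SAME = .TRUE.
      DO J = 1, M
        DO I = 1, N
          IF (A1(I,J) .NE. A2(I,J)) SAME = .FALSE.
        ENDDO
      ENDDO
      PRINT *, 'Unrolled and plain results match: ', SAME
      END
```

Jamming the copies into one inner loop lets each value of C(I) be loaded once and reused for several columns of A and B.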
Test promotion in Loop #9 creates new loops

The Optimization Report indicates that the compiler-inserted test in Loop #9 is promoted out of the loop, creating Loop #10 and Loop #11.

      12    9 j:1      *Promote         (10-11)
      12   10 j:1      Serial                     Removed
      12   11 j:1      Serial

Loops are fused

According to the report, the last two loops in the program are fused (once an iteration is peeled off the second loop), then the new loop is run serially.

      17   13 i:2      Serial                     Fused
      21   14 j:2      *Peel            (15)
      21   15 j:2      Serial                     Fused
                       *Fused           (16)      (13 15) -> (16)
      17   16 i:2      Serial

That information is combined with the following line from the Analysis Table:

      21   14 j:2      Peeled last iteration of loop

Loop peeling creates loop, enables fusion

Initially, Loop #14 has an iteration peeled to create Loop #15, as shown below. The loop peeling is performed to enable loop fusion.

    DO I = 1, N-1                         ! Loop #13
      D(I) = I
    ENDDO

    DO J = 1, N-1                         ! Loop #15
      E(J) = D(J) + 1
    ENDDO

Loops are fused to create new loop

Loop #13 and Loop #15 are then fused to produce Loop #16:

    DO I = 1, N-1                         ! Loop #16
      D(I) = I
      E(I) = D(I) + 1
    ENDDO
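The listings above show only the loops; the peeled iteration itself still has to run somewhere. The sketch below is an illustration that places the peeled last iteration after the fused loop and checks the result against the original two loops. The program name PEELCHECK and the duplicated arrays are hypothetical. Note also that the source program never assigns D(N), so this sketch zero-fills D first (an assumption made here so that the comparison is well defined).

```fortran
      PROGRAM PEELCHECK
      ! Hypothetical check of peeling the last iteration, then fusing.
      INTEGER N
      PARAMETER (N = 1000)
      DOUBLE PRECISION D1(N), E1(N), D2(N), E2(N)
      INTEGER I, J
      LOGICAL SAME

      ! Assumption: give D a defined value everywhere, because the
      ! original program leaves D(N) unassigned.
      DO I = 1, N
        D1(I) = 0
        D2(I) = 0
      ENDDO

      ! Original form: Loop #13 followed by Loop #14.
      DO I = 1, N-1
        D1(I) = I
      ENDDO
      DO J = 1, N
        E1(J) = D1(J) + 1
      ENDDO

      ! Peeled and fused form: Loop #16 covers iterations 1 to N-1.
      DO I = 1, N-1
        D2(I) = I
        E2(I) = D2(I) + 1
      ENDDO
      ! Peeled last iteration of the original J loop.
      E2(N) = D2(N) + 1

      ! Compare every element of the two sets of results.
      SAME = .TRUE.
      DO I = 1, N
        IF (D1(I) .NE. D2(I)) SAME = .FALSE.
        IF (E1(I) .NE. E2(I)) SAME = .FALSE.
      ENDDO
      PRINT *, 'Peeled/fused and original results match: ', SAME
      END
```

Peeling is what makes the two trip counts equal (N-1 in both loops), which is the precondition for fusing them into a single loop.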