Jump to content United States-English
HP.com Home Products and Services Support and Drivers Solutions How to Buy
» Contact HP
More options
HP.com home
Parallel Programming Guide for HP-UX Systems: K-Class and V-Class Servers > Chapter 5 Loop and cross-module optimization features

Loop fusion

» 

Technical documentation

Complete book in PDF
» Feedback
Content starts here

 » Table of Contents

 » Glossary

Loop fusion involves creating one loop out of two or more neighboring loops that have identical loop bounds and trip counts. This reduces loop overhead, memory accesses, and increases register usage. It can also lead to other optimizations. By potentially reducing the number of parallelizable loops in a program and increasing the amount of work in each of those loops, loop fusion can greatly reduce parallelization overhead. Because fewer spawns and joins are necessary.

Loop fusion takes place at +O3 and above and is enabled by default. Specifying +Onoloop_transform disables loop fusion, as well as loop distribution, loop interchange, loop blocking, loop unroll, and loop unroll and jam.

Occasionally, loops that do not appear to be fusible become fusible as a result of compiler transformations that precede fusion. For instance, interchanging a loop may make it suitable for fusing with another loop.

Loop fusion is especially beneficial when applied to Fortran 90 array assignments. The compiler translates these statements into loops; when such loops do not contain code that inhibit fusion, they are fused.

Loop fusion

This example begins with the following Fortran code:

DO I = 1, N
A(I) = B(I) + C(I)
ENDDO
DO J = 1, N
IF(A(J) .LT. 0) A(J) = B(J)*B(J)
ENDDO

The two loops shown above are fused into the following loop using loop fusion:

DO I = 1, N
A(I) = B(I) + C(I)
IF(A(I) .LT. 0) A(I) = B(I)*B(I)
ENDDO

Loop fusion

This example begins with the following Fortran code:

REAL A(100,100), B(100,100), C(100,100)
.
.
.
C = 2.0 * B
A = A + B

The compiler first transforms these Fortran array assignments into loops, generating code similar to that shown below.

DO TEMP1 = 1, 100
DO TEMP2 = 1, 100
C(TEMP2, TEMP1) = 2.0 * B(TEMP2, TEMP1)
ENDDO
ENDDO
DO TEMP3 = 1, 100
DO TEMP4 = 1, 100
A(TEMP4,TEMP3)=A(TEMP4,TEMP3)+B(TEMP4,TEMP3)
ENDDO
ENDDO

These two loops would then be fused as shown in the following loop nest:

DO TEMP1 = 1, 100
DO TEMP2 = 1, 100
C(TEMP2,TEMP1) = 2.0 * B(TEMP2, TEMP1)
A(TEMP2,TEMP1)=A(TEMP2,TEMP1)+B(TEMP2,TEMP1)
ENDDO
ENDDO

Further optimizations could be applied to this new nest as appropriate.

Loop peeling

When trip counts of adjacent loops differ by only a single iteration (+1 or -1), the compiler may peel an iteration from one of the two loops so that the loops may then be fused. The peeled iteration is performed separately from the original loop.

The following Fortran example shows how this is implemented:

DO I = 1, N-1
A(I) = I
ENDDO

DO J = 1, N
A(J) = A(J) + 1
ENDDO

As you can see, the Nth iteration of the J loop is peeled, resulting in a trip count of N - 1. The Nth iteration is performed outside the J loop. As a result, the code is changed to the following:

DO I = 1, N-1
A(I) = I
ENDDO

DO J = 1, N-1
A(J) = A(J) + 1
ENDDO

A(N) = A(N) + 1

The I and J loops now have the same trip count and are fused, as shown below:

DO I = 1, N-1
A(I) = I
A(I) = A(I) + 1
ENDDO

A(N) = A(N) + 1
Printable version
Privacy statement Using this site means you accept its terms Feedback to webmaster
© Hewlett-Packard Development Company, L.P.