Loop-specific, task-specific, and region-specific data privatization
Once assigned, the memory classes discussed in detail in Chapter 5, “Memory classes,” are in effect throughout your entire program. Any loops that manipulate variables that have been explicitly assigned a memory class must be manually parallelized, and once a variable is assigned a class, its class cannot change. While very efficient programs can be written using these memory classes, they also require a great deal of manual intervention.
To get around these problems, the Exemplar Fortran 90, Exemplar Fortran 77, and Exemplar C compilers support the loop_private, task_private, and parallel_private directives and pragmas. The save_last directive and pragma is provided to save the value of loop_private data objects assigned in the last iteration of the loop. These directives and pragmas allow you to easily privatize parallel loop, task, or region data temporarily; when used with prefer_parallel, they do so without inhibiting any automatic compiler optimizations. They can help you further increase the performance of your shared-memory program with less extra work than is required when using the standard memory classes accompanying manual parallelization and synchronization.
You can use the above directives on local variables and arrays of any type, but they should not be used on data assigned one of the static or dynamic memory classes (thread_private, node_private, near_shared, far_shared, or block_shared).
In some cases, data declared loop_private, task_private, or parallel_private is stored on the stacks of the spawned threads. Spawned thread stacks default to 80 Mbytes in size; if the amount of loop_private, task_private, or parallel_private data declared exceeds this, you can use the CPS_STACK_SIZE environment variable to increase the default. Refer to the section “Default stack size” for more information.
The loop_private directive and pragma declares a list of variables and/or arrays private to the immediately following Fortran DO or C for loop.
The compiler assumes that data objects declared to be loop_private have no loop-carried dependences with respect to the parallel loops in which they are used. If dependences exist, you must handle them manually using the synchronization directives and techniques described in Chapter 6, “Advanced shared-memory programming.” The dimensions of loop_private arrays must be determinable at compile time.
Each parallel thread of execution receives a private copy of the loop_private data object for the duration of the loop; no starting values can be assumed for the data, and unless a save_last directive or pragma is specified (as described in a following section), no ending value can be assumed. If a loop_private data object is referenced within an iteration of the loop, it must have been assigned a value previously on that same iteration.
In Fortran, the LOOP_PRIVATE directive has the following form:
    C$DIR LOOP_PRIVATE(namelist)
In C, the pragma has the form:
    #pragma _CNX loop_private(namelist)
where namelist is a comma-separated list of the variables and/or arrays that are to be private to the immediately following loop.
Consider the following Fortran example:
An apparent loop-carried dependence (LCD) on S exists in this example: if none of the IF tests is true on a given iteration, the value of S must wrap around from the previous iteration. The LOOP_PRIVATE(S) directive indicates to the compiler that S does, in fact, get assigned on every iteration, and therefore it is safe to parallelize this loop. If on any iteration none of the IF tests passes, an actual LCD exists and privatizing S will result in wrong answers. An analogous C example follows:
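As a minimal sketch of this pattern in C, the function below declares s loop_private on the assumption that the caller guarantees at least one of the two tests passes on every iteration. The function name, the arrays, and the tests themselves are hypothetical and chosen only for illustration.

```c
/* Hypothetical illustration: s is safely private only if at least one
   of the two tests below passes on every iteration. */
void scale_selected(int n, const double *a, const double *u,
                    const double *v, const double *c, double *b)
{
    double s;
    int i;

#pragma _CNX loop_private(s)
    for (i = 0; i < n; i++) {
        if (a[i] > 0.0)
            s = a[i];
        if (u[i] < v[i])
            s = v[i];
        /* If neither test passed, s would have to come from the previous
           iteration: a real loop-carried dependence, and wrong answers
           once s is private. */
        b[i] = s * c[i];
    }
}
```

A compiler that does not recognize the _CNX pragmas will typically ignore them, in which case the loop simply runs serially.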
Because the compiler does not automatically perform variable privatization in loop_parallel loops, you must manually privatize loop data requiring privatization. This can be easily done using the loop_private directive or pragma. Consider the following Fortran example:
Here, the LOOP_PARALLEL directive is required to parallelize the I loop because of the call to MFY. The X and Y arrays are in shared memory by default. X and Z are not written to, and the portions of Y written to in the J loop's IF statement are disjoint, so these shared arrays require no special attention. The local array XMFIED, however, is written to. But because XMFIED carries no values into or out of the I loop, it can be privatized using LOOP_PRIVATE. This gives each thread running the I loop its own private copy of XMFIED, eliminating the expensive necessity of synchronized access to XMFIED. Note that a loop-carried dependence exists for XMFIED in the J loop, but because this loop runs serially on each processor, this dependence is safe. J is privatized as discussed in the section “Privatizing induction variables in nested loops”. An analogous C example follows:
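The following C sketch has the same shape as the loop described above. It is an illustration under stated assumptions, not the guide's listing: the row length M, the body of mfy, and the exact test in the j loop are hypothetical, and the pragma forms are the loop_parallel(ivar=...) and loop_private(...) forms given in this section.

```c
#define M 4   /* hypothetical row length, kept small for illustration */

/* Hypothetical stand-in for MFY: writes a modified copy of one row. */
static void mfy(const double *row, double *out)
{
    int k;
    for (k = 0; k < M; k++)
        out[k] = 2.0 * row[k];
}

void update_rows(int n, double x[][M], double y[][M], const double *z)
{
    double xmfied[M];   /* scratch array: carries nothing into or out of the i loop */
    int i, j;

#pragma _CNX loop_parallel(ivar=i)
#pragma _CNX loop_private(xmfied, j)
    for (i = 0; i < n; i++) {
        mfy(x[i], xmfied);                   /* xmfied is assigned before it is read */
        for (j = 0; j < M - 1; j++) {
            if (xmfied[j] >= y[i][j])
                y[i][j] = xmfied[j] * z[j];  /* only row i is written: disjoint per iteration */
            else
                xmfied[j + 1] = xmfied[j];   /* dependence carried by the serial j loop only */
        }
    }
}
```

Because each thread works on its own copy of xmfied and j, no synchronization is needed around either of them.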
To safely parallelize a loop with the loop_parallel directive or pragma, the compiler must be able to correctly determine the loop's primary induction variable. The compiler can find primary Fortran DO loop induction variables; it may, however, have trouble with DO WHILE or hand-rolled Fortran loops, and with all loop_parallel loops in C. Therefore, when you use the loop_parallel directive or pragma to manually parallelize a loop other than an explicit Fortran DO loop, you should indicate the loop's primary induction variable using the IVAR=indvar attribute to loop_parallel. Consider the following Fortran example:
This is a hand-rolled loop that uses I as its primary induction variable. To ensure parallelization, the LOOP_PARALLEL directive has been placed immediately before the start of the loop, and the induction variable, I, has been specified. Primary induction variables in C loops can be difficult for the compiler to find, so ivar is required in all loop_parallel C loops. Its use is shown in the following example:
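A minimal C sketch of the ivar attribute, assuming a hypothetical function and arrays, might look like this:

```c
/* Hypothetical illustration: i is named explicitly as the primary
   induction variable of the manually parallelized loop. */
void scale_by_two(int n, const double *a, double *b)
{
    int i;

#pragma _CNX loop_parallel(ivar=i)
    for (i = 0; i < n; i++)
        b[i] = 2.0 * a[i];
}
```

The iterations are independent, so no other privatization or synchronization is needed here.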
Secondary induction variables are variables used to track loop iterations even though they do not appear in the Fortran DO statement. They cannot appear in addition to the primary induction variable in the C for statement. Such variables must be a function of the primary loop induction variable; they cannot be independent. Secondary induction variables must also either be assigned a memory class manually (as described in Chapter 5, “Memory classes”) or declared loop_private. The following Fortran example contains an incorrectly incremented secondary induction variable:
Here, J will not produce the expected values in each iteration because multiple threads overwrite its value with no synchronization. The compiler cannot privatize J because its value carries a loop-carried dependence (LCD) from one iteration to the next. This example can be corrected by privatizing J and making it a function of I, as shown below.
Here, J will be assigned correct values on each iteration because it is a function of I, and can be safely privatized. In C, secondary induction variables are sometimes included in for statements, as shown in the following example:
Because secondary induction variables must be private to the loop and must be a function of the primary induction variable, this example cannot be safely parallelized using loop_parallel(ivar=i). In the presence of this directive, the secondary induction variable will not be recognized. To manually parallelize this loop, you must remove j from the for statement and either privatize it and make it a function of i, or declare j to be shared (which is the default storage class), specify the ordered attribute on the loop_parallel directive, and increment it within an ordered critical section inside the loop. This latter method is costly in terms of synchronization overhead and may degrade the performance of the loop. The following example demonstrates how to restructure the loop so that j is a valid secondary induction variable:
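One way to sketch this restructuring, assuming a hypothetical loop that originally advanced j by 2 in the for statement alongside i, is:

```c
/* Hypothetical illustration: j was originally incremented in the for
   statement (i++, j += 2). It is now computed from i inside the loop,
   which makes it a valid, privatizable secondary induction variable. */
void gather_even(int n, const double *b, double *a)
{
    int i, j;

#pragma _CNX loop_parallel(ivar=i)
#pragma _CNX loop_private(j)
    for (i = 0; i < n; i++) {
        j = 2 * i;        /* assigned on every iteration before it is used */
        a[i] = b[j];
    }
}
```

Because j depends only on i, every iteration can compute it independently, whichever thread runs it.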
This method runs faster than placing j in a critical section because it requires no synchronization overhead, and the private copy of j used here can typically be more quickly accessed than a shared variable.
The induction variables of nonparallel loops that are contained within parallel loops must be declared loop_private with respect to their closest enclosing parallel loop. Consider the following Fortran example:
Here, LOOP_PARALLEL causes the I loop to be parallelized across threads. The J loop, then, runs serially. J must be private with respect to the I loop so that the threads that run the I loop do not attempt to update the same copy of J. If the loop is automatically parallelized by the compiler, or parallelized due to the presence of a PREFER_PARALLEL directive, this privatization will be automatic. But the presence of the LOOP_PARALLEL directive requires manual privatization. An analogous C example follows:
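A minimal C sketch of the same situation follows. The function, arrays, and the column count COLS are hypothetical; the point is only that j, the induction variable of the serial inner loop, is listed in loop_private for the manually parallelized i loop.

```c
#define COLS 3   /* hypothetical inner dimension */

/* Hypothetical illustration: the i loop is parallelized manually, so the
   inner loop's induction variable j must be privatized by hand. */
void add_rows(int n, double a[][COLS], double b[][COLS])
{
    int i, j;

#pragma _CNX loop_parallel(ivar=i)
#pragma _CNX loop_private(j)
    for (i = 0; i < n; i++) {
        for (j = 0; j < COLS; j++)   /* runs serially within each i iteration */
            a[i][j] += b[i][j];
    }
}
```

Without loop_private(j), every thread running the i loop would update the same shared j.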
This also applies to nested parallel outer loops. In this case, loop variables contained within a parallel construct—even if they are used in a parallel loop themselves—must be declared private with respect to the innermost enclosing parallel loop. Consider the following Fortran example:
Here, LOOP_PARALLEL is used to parallelize the I loop across hypernodes, and the J loop across processors on each hypernode. K must be declared private to the J loop to ensure that the thread-parallel threads do not interfere with each other in updating it. J must be declared private to the I loop to ensure that each node-parallel thread gets its own copy. An analogous C example follows:
The task_private directive declares a list of variables and/or arrays private to the immediately following tasks; it serves the same purpose for parallel tasks that loop_private serves for loops. The task_private directive must immediately precede or appear on the same line as its corresponding begin_tasks directive.
The compiler assumes that data objects declared to be task_private have no dependences between the tasks in which they are used. If dependences exist, you must handle them manually using the synchronization directives and techniques described in Chapter 6, “Advanced shared-memory programming.”
Each parallel thread of execution receives a private copy of the task_private data object for the duration of the tasks; no starting or ending values can be assumed for the data. If a task_private data object is referenced within a task, it must have been assigned a value previously in that task.
In Fortran, the TASK_PRIVATE directive has the following form:
    C$DIR TASK_PRIVATE(namelist)
In C, the pragma has the form:
    #pragma _CNX task_private(namelist)
where namelist is a comma-separated list of the variables and/or arrays that are to be private to the immediately following tasks.
Consider the following Fortran example:
Here, the WRK array is used in the first task to temporarily hold the A array so that its order can be reversed. It serves the same purpose for the B array in the second task. WRK is assigned before it is used in each task. An analogous C example follows:
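The idea can be sketched in C as follows. The function name and the array length LEN are hypothetical, and the next_task and end_tasks pragmas that delimit the tasks are assumed here as the companions of begin_tasks; only begin_tasks and task_private are named in this section.

```c
#define LEN 4   /* hypothetical array length */

/* Hypothetical illustration: each task reverses one array, using its own
   private copy of wrk as scratch space. */
void reverse_both(double *a, double *b)
{
    double wrk[LEN];
    int i, j;

#pragma _CNX task_private(wrk)
#pragma _CNX begin_tasks
    for (i = 0; i < LEN; i++)        /* first task: wrk is assigned ... */
        wrk[i] = a[i];
    for (i = 0; i < LEN; i++)        /* ... before it is read */
        a[i] = wrk[LEN - 1 - i];
#pragma _CNX next_task
    for (j = 0; j < LEN; j++)        /* second task: same pattern for b */
        wrk[j] = b[j];
    for (j = 0; j < LEN; j++)
        b[j] = wrk[LEN - 1 - j];
#pragma _CNX end_tasks
}
```

Each task assigns wrk before reading it, so neither task depends on values left in wrk by the other.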
The parallel_private directive declares a list of variables and/or arrays private to the immediately following parallel region; it serves the same purpose for parallel regions that task_private serves for tasks. The parallel_private directive must immediately precede or appear on the same line as its corresponding parallel directive.
Using parallel_private asserts that there are no dependences in the parallel region; do not use this directive if there are dependences.
Each parallel thread of execution receives a private copy of the parallel_private data object for the duration of the region; no starting or ending values can be assumed for the data. If a parallel_private data object is referenced within a region, it must have been assigned a value previously in the region.
In Fortran, the PARALLEL_PRIVATE directive has the form:
    C$DIR PARALLEL_PRIVATE(namelist)
In C, the pragma has the form:
    #pragma _CNX parallel_private(namelist)
where namelist is a comma-separated list of the variables and/or arrays that are to be private to the immediately following parallel region.
Consider the following Fortran example:
This example is similar to the one presented in the section “Region parallelization” in the way it checks for a certain number of threads and divides up the work among those threads. However, the parallel_private variable AWORK is introduced. Each thread initializes its private copy of AWORK to the values contained in a dimension of the array A at the beginning of the parallel region; this allows the threads to reference AWORK without regard to thread ID, because no thread can access any other thread's copy of AWORK. Note that AWORK cannot carry values into or out of the region, so it must be initialized within the region.
All induction variables contained in a parallel region must be privatized. Remember that the code contained in the region runs on all available threads, so failing to privatize an induction variable would allow each thread to update the same shared variable, creating indeterminate loop counts on every thread.
In the J loop after AWORK is initialized, AWORK is effectively used in a reduction on A (since at this point its contents are identical to the MYTID dimension of A). After A is modified here and used in the K and L loops, each thread restores a dimension of A's original values from its private copy of AWORK, which carried the appropriate dimension through the region unaltered. An analogous C example follows:
The save_last directive and pragma allow you to save the final value of loop_private data objects assigned in the last iteration of the immediately following loop. If list (the optional, comma-separated list of loop_private data objects) is specified, only the final values of those data objects in list are saved. If list is not specified, the final values of all loop_private data objects assigned in the last loop iteration are saved.
The values must be assigned in the last iteration; if the assignment is executed conditionally, it is your responsibility to ensure that the condition is met and the assignment executes. Incorrect answers can result if the assignment does not execute on the last iteration. For loop_private arrays, only those elements of the array assigned on the last iteration will be saved.
In Fortran, the SAVE_LAST directive has the form:
    C$DIR SAVE_LAST[(list)]
In C, the pragma has the form:
    #pragma _CNX save_last[(list)]
save_last must appear immediately before or after the associated loop_private directive or pragma, or on the same line. A save_last directive or pragma causes the thread that executes the last iteration of the loop to write back the private (or local) copy of the variable into the global reference. Consider the following Fortran example:
Here, the LOOP_PRIVATE variable ATEMP is conditionally assigned in the loop; for ATEMP to be truly private, you must be sure that at least one of the conditions is met so that ATEMP is assigned on every iteration. When the loop terminates, the SAVE_LAST directive ensures that ATEMP and X contain the values they are assigned on the last iteration. These values can then be used later in the program. The value of Y, however, is not available once the loop finishes because Y is not specified as an argument to SAVE_LAST. An analogous C example follows:
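A minimal C sketch of save_last with an explicit list follows. The function name, arrays, and arithmetic are hypothetical. The if/else guarantees that atemp is assigned on every iteration, and the sketch assumes n is at least 1 so that a last iteration exists.

```c
/* Hypothetical illustration: atemp, x, and y are private to the loop, but
   only atemp and x are written back after the last iteration. */
void pick_and_scale(int n, const double *a, const double *b, double *c,
                    double *atemp_last, double *x_last)
{
    double atemp, x, y;
    int i;

#pragma _CNX loop_private(atemp, x, y)
#pragma _CNX save_last(atemp, x)
    for (i = 0; i < n; i++) {
        if (a[i] > 0.0)
            atemp = a[i];
        else
            atemp = b[i];      /* one branch always runs: atemp is always assigned */
        x = atemp * 2.0;
        y = x + 1.0;           /* y is not in the save_last list */
        c[i] = y;
    }

    *atemp_last = atemp;       /* values from iteration n - 1 */
    *x_last = x;
}
```

After the loop, y is deliberately not used, since its final value is not saved.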
Note that the save_last directive can be misleading in certain loop contexts. Consider the following Fortran example:
While it may appear that the last value of S assigned (on whatever iteration) is saved in this example, you must remember that the SAVE_LAST directive applies only to the last (Nth) iteration, without regard for any conditionals contained in the loop. For SAVE_LAST to be valid here, G(N) must be greater than 0 so that the assignment to S takes place on the final iteration. Obviously, if this condition can be predicted, the loop can be more efficiently written to exclude the IF test, so the presence of a SAVE_LAST in such a loop is suspect.
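The pitfall can be sketched in C as follows; the function name and arrays are hypothetical. When the loop runs serially, the returned value is whatever s was last assigned. Under the save_last rules described above, it is meaningful only when g[n - 1] > 0.

```c
/* Hypothetical illustration: s is assigned only when g[i] > 0, so the
   value saved by save_last is valid only if g[n - 1] > 0. */
double square_positive(int n, const double *g, double *out)
{
    double s = 0.0;   /* this starting value is not visible to the private copies */
    int i;

#pragma _CNX loop_private(s)
#pragma _CNX save_last
    for (i = 0; i < n; i++) {
        if (g[i] > 0.0) {
            s = g[i] * g[i];
            out[i] = s;
        }
    }
    return s;         /* trustworthy only when the last iteration assigned s */
}
```

A caller that cannot guarantee g[n - 1] > 0 should not rely on the returned value.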