Fortran 90, Fortran 77, C, aC++: Exemplar Programming Guide > Chapter 8: Programming conventions for optimal code

Misused memory classes
While manually assigned memory classes can substantially boost performance when coupled with manual parallelization, assigning the wrong memory class to data can cause wrong answers and, in some cases, degrade performance. This section discusses some common misuses of memory classes.

Dynamically allocating thread_private memory from serial code can give unexpected results if the memory is later accessed from parallel code. Consider the following incorrect Fortran example:
Here, the array WRONGTP is allocated, but because the allocation takes place in serial code, which is run by thread 0, only thread 0 allocates the array. When other threads attempt to access the array in the J loop, it does not exist. To fix this, allocate the array inside the thread-parallel I loop, as discussed in Chapter 5, "Memory classes."

An analogous C example follows:
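As a portable illustration of the same failure mode (not the guide's own Exemplar listing), the sketch below uses a C11 `_Thread_local` pointer and POSIX threads as a rough stand-in for a thread_private pointer. All names here (`tp_array`, `check_worker`, `alloc_worker`, and the two entry points) are hypothetical, and plain `malloc` is assumed in place of `memory_class_malloc`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NTHREADS 4
#define N 8

/* One pointer per thread: an assumed analog of a thread_private pointer. */
static _Thread_local double *tp_array = NULL;

/* Each worker reports whether ITS copy of tp_array points at storage. */
static void *check_worker(void *arg)
{
    int *has_storage = arg;
    *has_storage = (tp_array != NULL);
    return NULL;
}

/* Fixed variant: each worker allocates its own private storage first. */
static void *alloc_worker(void *arg)
{
    int *has_storage = arg;
    tp_array = malloc(N * sizeof *tp_array);
    *has_storage = (tp_array != NULL);
    free(tp_array);
    tp_array = NULL;
    return NULL;
}

/* Run fn on NTHREADS threads and count how many of them saw storage. */
static int count_threads_with_storage(void *(*fn)(void *))
{
    pthread_t tid[NTHREADS];
    int seen[NTHREADS] = {0};
    int total = 0;

    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&tid[i], NULL, fn, &seen[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(tid[i], NULL);
        total += seen[i];
    }
    return total;
}

/* Wrong: only the calling ("serial") thread allocates its private copy. */
int serial_allocation_visible_to_workers(void)
{
    tp_array = malloc(N * sizeof *tp_array);
    int n = count_threads_with_storage(check_worker);
    free(tp_array);
    tp_array = NULL;
    return n;
}

/* Right: the allocation happens inside the parallel region, once per thread. */
int per_thread_allocation_visible_to_workers(void)
{
    return count_threads_with_storage(alloc_worker);
}
```

In this sketch, no worker sees the storage allocated by the serial thread, while every worker sees storage when the allocation is moved into the thread function. On older toolchains the program may need to be linked with `-pthread`.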
In general, memory of classes other than thread_private should be dynamically allocated in serial code. Allocating node_private, near_shared, far_shared and block_shared memory from within parallel code will create wasteful redundant copies. Consider the following incorrect Fortran example:
Recall from Chapter 5, "Memory classes," that when a node_private array is allocated, a physical copy is created on each hypernode on which the program is running. Here, each loop iteration executes the ALLOCATE statement (or memory_class_malloc function in C), thus allocating N copies of the array. This is N-1 more copies than are actually needed.

To further complicate things, node_private arrays manipulated in parallel code must be accessed by shared pointers, which is why the Fortran example includes a far_shared_pointer statement. In the code above, this pointer would be overwritten every time the I loop executed the ALLOCATE statement (or memory_class_malloc function in C), meaning that only the final copy allocated would be accessible. Since the hypernodes' execution of the loop code is not perfectly synchronized, the actual memory accessed by WRONGNP(I) would vary depending on which hypernode was last to perform the allocation.

An analogous C example follows:
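The overwritten-pointer problem can also be pictured in plain, portable C, independent of the Exemplar listing. The sketch below is an assumption-laden simulation: the loop runs serially so the result is deterministic, `malloc` stands in for `memory_class_malloc`, and `shared_ptr`, `all`, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define N 4      /* iterations standing in for parallel loop iterations */
#define LEN 16   /* elements per array */

/* A single pointer visible to every iteration (shared-pointer analog). */
static double *shared_ptr = NULL;

/* Wrong: every iteration allocates and overwrites the one shared pointer. */
int unreachable_blocks_when_allocating_in_loop(void)
{
    double *all[N];      /* bookkeeping only, so the sketch can free blocks */
    int unreachable = 0;

    for (int i = 0; i < N; i++) {
        shared_ptr = malloc(LEN * sizeof *shared_ptr);
        assert(shared_ptr != NULL);
        all[i] = shared_ptr;
    }
    /* Only the last block is still reachable through shared_ptr. */
    for (int i = 0; i < N; i++) {
        if (all[i] != shared_ptr)
            unreachable++;
        free(all[i]);
    }
    shared_ptr = NULL;
    return unreachable;
}

/* Right: allocate once before the loop; iterations only use the storage. */
int unreachable_blocks_when_allocating_once(void)
{
    int allocations = 0;

    shared_ptr = malloc(LEN * sizeof *shared_ptr);
    assert(shared_ptr != NULL);
    allocations++;
    for (int i = 0; i < N; i++)
        shared_ptr[i] = (double)i;
    free(shared_ptr);
    shared_ptr = NULL;
    return allocations - 1;   /* blocks not reachable via shared_ptr */
}
```

Without the bookkeeping array, which real code would not have, the N-1 overwritten blocks could be neither used nor freed.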
While dynamically allocated near_shared, far_shared and block_shared arrays do not normally require special pointer types, they suffer from the same redundant-copy problem. Allocating any shared-memory arrays from within parallel code will create as many copies of the data as there are hypernodes (or threads) executing the ALLOCATE (or memory_class_malloc) statement. As with the node_private example above, the actual memory accessed will depend on which hypernode most recently executed the ALLOCATE statement. After all hypernodes have executed the ALLOCATE, the memory allocated by all but the last will be lost. Such lost arrays are not only unusable, they cannot be deallocated. To avoid such redundancy problems, follow the allocation examples discussed in Chapter 5, "Memory classes," and only allocate memory from within parallel constructs as described there.

As mentioned in the previous section, sometimes it is necessary to access dynamically allocated arrays using pointers of different memory classes. For example, when accessing node_private arrays from node-parallel code, far_shared pointers must be used (refer to Chapter 5, "Memory classes"). Failing to do this will render the copies of the arrays on all but logical hypernode 0 inaccessible. Consider the following incorrect Fortran example:
While the N0NP array is correctly allocated in serial code here, it is not explicitly given a shared pointer, so the arrays created will be accessed by the default node_private pointer. A physical copy of N0NP will be created on every hypernode, but the node_private pointer by which these copies are accessed will only be initialized on logical hypernode 0, because it is the only hypernode executing the ALLOCATE statement (or memory_class_malloc in C). The contents of the (node_private) pointers on other hypernodes are uninitialized and therefore indeterminate. When, in the hypernode-parallel J loop, these other hypernodes attempt to access N0NP, they will do so using the garbage contents of their uninitialized pointers, typically causing a runtime error.

An analogous C example follows:
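The remedy, a pointer that every executing unit can see and that is set once in serial code, can be sketched in portable C. This is only an analogy: a file-scope pointer shared by POSIX threads is assumed to play the role of a far_shared pointer, `calloc` replaces `memory_class_malloc`, and `shared_data`, `fill_chunk`, and `fill_and_sum` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NWORKERS 4
#define CHUNK 8

/* File-scope pointer: one copy, visible to every thread. */
static double *shared_data = NULL;

/* Each worker fills only its own slice, reached through the shared pointer. */
static void *fill_chunk(void *arg)
{
    int id = *(int *)arg;
    for (int k = 0; k < CHUNK; k++)
        shared_data[id * CHUNK + k] = 1.0;
    return NULL;
}

double fill_and_sum(void)
{
    pthread_t tid[NWORKERS];
    int ids[NWORKERS];
    double sum = 0.0;

    /* Serial code: allocate once and initialize the shared pointer once. */
    shared_data = calloc((size_t)NWORKERS * CHUNK, sizeof *shared_data);
    assert(shared_data != NULL);

    for (int i = 0; i < NWORKERS; i++) {
        ids[i] = i;
        int rc = pthread_create(&tid[i], NULL, fill_chunk, &ids[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(tid[i], NULL);

    /* Back in serial code: every slice was written through the same pointer. */
    for (int i = 0; i < NWORKERS * CHUNK; i++)
        sum += shared_data[i];

    free(shared_data);
    shared_data = NULL;
    return sum;
}
```

Because the pointer is initialized before any worker starts, no worker dereferences an indeterminate value, which is the property the shared pointer provides in the text above.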
Chapter 5, "Memory classes," covers correct pointer/data combinations and explains the situations in which nondefault pointers should be used. To avoid uninitialized pointer problems such as the one described above, follow the recommendations of Chapter 5 carefully.

Improperly accessing a shared variable from parallel threads can create a hidden dependence that can cause wrong answers. Consider the following Fortran code:
Here, the far_shared variable HOLD is updated as a function of itself in the subroutine ADDHOLD, which is called from the potentially parallel tasks. If HOLD were updated within the tasks rather than in a subroutine, the dependence would be more obvious to the programmer, who may not have ready access to ADDHOLD's source. Isolating the assignment to HOLD inside a critical section would allow the tasks to be parallelized safely, whether the assignment took place in a subroutine or inside the tasks themselves.

An analogous C example follows:
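For readers without an Exemplar toolchain, the critical-section remedy can be sketched with POSIX threads. This is an assumed analog, not the guide's listing: a mutex plays the role of the critical section, a file-scope `long` stands in for the far_shared variable, and `addhold`, `task`, and `run_tasks` are hypothetical names. An integer is used so the expected total is exact.

```c
#include <assert.h>
#include <pthread.h>

#define NTASKS 4
#define NADDS 100000

/* Shared by every task, like a far_shared variable. */
static long hold = 0;
static pthread_mutex_t hold_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for ADDHOLD: updates its argument as a function of itself.
 * The lock makes the read-modify-write a critical section; without it,
 * concurrent callers could lose updates. */
static void addhold(long *h, long amount)
{
    pthread_mutex_lock(&hold_lock);
    *h += amount;
    pthread_mutex_unlock(&hold_lock);
}

/* Each task passes the SAME shared variable to addhold. */
static void *task(void *arg)
{
    (void)arg;
    for (int i = 0; i < NADDS; i++)
        addhold(&hold, 1);
    return NULL;
}

long run_tasks(void)
{
    pthread_t tid[NTASKS];

    hold = 0;
    for (int i = 0; i < NTASKS; i++) {
        int rc = pthread_create(&tid[i], NULL, task, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTASKS; i++)
        pthread_join(tid[i], NULL);
    return hold;
}
```

Placing the lock inside `addhold` protects every caller, which matters when the callers' author cannot see or change the call sites; locking around the call in each task would work equally well in this sketch.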
Always use caution when parallelizing a call to a procedure that passes the same shared variable from every thread.