Parallel Programming Guide for HP-UX Systems: K-Class and V-Class Servers > Chapter 9 Parallel programming techniques

Parallelizing regions
A parallel region is a single block of code that is written to run replicated on several threads. Certain scalar code within the parallel region is run by each thread in preparation for work-sharing parallel constructs such as prefer_parallel(dist), loop_parallel(dist), or begin_tasks(dist). The scalar code typically assigns data into parallel_private variables so that subsequent references to the data have a high cache hit rate. Within a parallel region, code execution can be restricted to subsets of threads by using conditional blocks that test the thread ID.

Region parallelization differs from task parallelization in that parallel tasks are separate, contiguous blocks of code. When parallelized using the tasking directives and pragmas, each block generally runs on a separate thread, whereas a single parallel region runs on several threads. Specifying parallel tasks is also typically less time consuming because each thread's work is implicitly defined by the task boundaries. In region parallelization, you must manually modify the region to identify the code that each thread executes.

The beginning of a parallel region is denoted by the parallel directive or pragma. The end is denoted by the end_parallel directive or pragma. end_parallel also prevents execution from continuing until all copies of the parallel region have completed.

Within a parallel region, the compiler does not check for data dependences, perform variable privatization, or perform parallelization analysis. You must manually synchronize any dependences between copies of the region and manually privatize data as necessary. In the absence of a threads attribute, parallel defaults to thread parallelization.

The forms of the region parallelization directives and pragmas are shown in Table 9-10 “Forms of region parallelization directives and pragmas”.

Table 9-10 Forms of region parallelization directives and pragmas
The optional attribute-list can contain one of the following attributes (m is an integer constant).

Table 9-11 Attributes for region parallelization
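As a rough illustration of the region semantics described above, the following C sketch emulates a parallel region with POSIX threads instead of the HP directives and pragmas. Every thread runs the same region_body function, a conditional block restricts one statement to thread 0, and the final join loop plays the role of end_parallel. The names region_ctx, region_body, run_region, and MAX_REGION_THREADS are hypothetical and are not part of the HP compiler interface.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define MAX_REGION_THREADS 16   /* hypothetical upper bound for this sketch */

/* Hypothetical per-copy context; not part of the HP directive set. */
struct region_ctx {
    int tid;        /* this copy's thread ID, 0 .. nthreads-1 */
    int nthreads;   /* number of copies of the region */
    int *visits;    /* shared: visits[tid] is written only by thread tid */
    int *setup;     /* shared: written only by thread 0 */
};

/* Body of the emulated parallel region: every thread runs this same code. */
static void *region_body(void *arg)
{
    struct region_ctx *ctx = arg;

    /* Scalar code executed by every copy of the region. */
    ctx->visits[ctx->tid] += 1;

    /* Conditional block restricted to one thread by testing the thread ID. */
    if (ctx->tid == 0)
        *ctx->setup = ctx->nthreads;

    return NULL;
}

/* Runs region_body on nthreads threads; returns 0 on success, -1 on failure. */
int run_region(int nthreads, int *visits, int *setup)
{
    pthread_t threads[MAX_REGION_THREADS];
    struct region_ctx ctx[MAX_REGION_THREADS];
    int started = 0;
    int status = 0;

    assert(nthreads >= 1 && nthreads <= MAX_REGION_THREADS);

    for (int t = 0; t < nthreads; t++) {
        ctx[t] = (struct region_ctx){ t, nthreads, visits, setup };
        if (pthread_create(&threads[t], NULL, region_body, &ctx[t]) != 0) {
            status = -1;
            break;
        }
        started++;
    }

    /* Analog of end_parallel: wait until every started copy has completed. */
    for (int t = 0; t < started; t++)
        if (pthread_join(threads[t], NULL) != 0)
            status = -1;

    return status;
}
```

Calling run_region(4, visits, &setup) with a zeroed four-element visits array leaves each element at 1 and stores 4 in setup, because each copy writes only its own element and only thread 0 writes setup.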
Region parallelization

The following Fortran example provides an implementation of region parallelization using the PARALLEL directive:
In this example, all arrays written to in the parallel code have one dimension for each of the anticipated number of parallel threads. Because each thread works on disjoint data, no two threads can attempt to update the same element, so no explicit synchronization is needed. The RDONLY array is one-dimensional, but it is never written to by parallel threads. Before the parallel region, RDONLY is initialized in serial code.

The PARALLEL_PRIVATE directive is used to privatize the induction variables used in the parallel region. This must be done so that the various threads processing the region do not attempt to write to the same shared induction variables. PARALLEL_PRIVATE is covered in more detail in the section “parallel_private”.

At the beginning of the parallel region, the NUM_THREADS() intrinsic is called to ensure that the expected number of threads is available. Then the MY_THREAD() intrinsic is called by each thread to determine its thread ID. All subsequent code in the region is executed based on this ID.

In the I loop, each thread computes one row of A using RDONLY and the corresponding row of B. RDONLY is then reinitialized in a subroutine call that is executed only by thread 0, before it is used again in the computation of B in the J loop. In the J loop, each thread again computes one row of B; the J loop similarly computes C. Finally, the K loop sums each dimension of A, B, and C into the SUM array. No synchronization is necessary here because each thread runs the entire loop serially and assigns into a discrete element of SUM.
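The walkthrough above can be approximated in portable C with POSIX threads. This is only a sketch under stated assumptions, not the guide's Fortran example: the sizes (8 threads, 1000 elements), the fill values, the exact loop formulas, and the names example_body, run_example, and fill_rdonly are all hypothetical. The two barrier waits around the reinitialization of rdonly are also an assumption added here so that no thread reads the array while thread 0 rewrites it; the text above does not describe them.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NTHREADS 8      /* anticipated number of threads (assumed) */
#define N 1000          /* elements per row (assumed) */

/* Shared arrays with one row per thread, so writes never overlap. */
static double a[NTHREADS][N], b[NTHREADS][N], c[NTHREADS][N];
static double rdonly[N];           /* read by all threads, written by thread 0 only */
static double sum[NTHREADS];
static pthread_barrier_t barrier;  /* assumption: guards the rdonly update */

static void fill_rdonly(double value)
{
    for (int i = 0; i < N; i++)
        rdonly[i] = value;
}

/* Emulated region body; the locals i, j, k are private to each thread. */
static void *example_body(void *arg)
{
    int tid = *(int *)arg;
    assert(tid >= 0 && tid < NTHREADS);

    for (int i = 0; i < N; i++)            /* I loop: one row of a per thread */
        a[tid][i] = b[tid][i] * rdonly[i];

    pthread_barrier_wait(&barrier);        /* all reads of the old rdonly are done */
    if (tid == 0)
        fill_rdonly(2.0);                  /* reinitialization by thread 0 only */
    pthread_barrier_wait(&barrier);        /* new rdonly is visible to every thread */

    for (int j = 0; j < N; j++) {          /* J loop: rows of b and c */
        b[tid][j] = b[tid][j] * rdonly[j];
        c[tid][j] = a[tid][j] * b[tid][j];
    }

    for (int k = 0; k < N; k++)            /* K loop: private element of sum */
        sum[tid] += a[tid][k] + b[tid][k] + c[tid][k];

    return NULL;
}

/* Returns 0 on success, -1 if the barrier cannot be created. */
int run_example(void)
{
    pthread_t threads[NTHREADS];
    int ids[NTHREADS];

    /* Serial code before the region. */
    fill_rdonly(1.0);
    for (int t = 0; t < NTHREADS; t++) {
        sum[t] = 0.0;
        for (int i = 0; i < N; i++)
            b[t][i] = t + 1;
    }
    if (pthread_barrier_init(&barrier, NULL, NTHREADS) != 0)
        return -1;

    for (int t = 0; t < NTHREADS; t++) {
        ids[t] = t;
        if (pthread_create(&threads[t], NULL, example_body, &ids[t]) != 0) {
            /* Earlier copies would wait at the barrier forever, so stop. */
            fprintf(stderr, "not enough threads; exiting\n");
            exit(EXIT_FAILURE);
        }
    }

    for (int t = 0; t < NTHREADS; t++)     /* analog of END_PARALLEL */
        pthread_join(threads[t], NULL);

    pthread_barrier_destroy(&barrier);
    return 0;
}
```

Because each thread writes only row tid of a, b, and c and only element tid of sum, the loops themselves need no locking; the barrier is needed only because rdonly is shared and is rewritten in the middle of the region.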