HP C/HP-UX Programmer's Guide: HP-UX Systems > Chapter 8 Threads and Parallel Processing
OpenMP Pragmas and Options
OpenMP is an industry-standard parallel programming model that implements a fork-join model of parallel execution. The HP C OpenMP pragmas included in this release are based on the OpenMP Standard for C, version 1.0. For details about the standard, including usage, syntax, and values, go to http://www.openmp.org/specs. You can download either a PostScript (ps) or Adobe Acrobat (PDF) version of the C/C++ Version 1.0 OpenMP standard from this website.

This section details the OpenMP pragmas implemented in the previous (B.11.11.02) release and the OpenMP pragmas implemented in this (B.11.11.04) release of the HP C compiler. Each of these pragmas is discussed in later sections.

The following OpenMP pragmas were implemented in the B.11.11.02 release, as the result of the phase I implementation of OpenMP pragmas in the HP C compiler. The following OpenMP work sharing and synchronization pragmas were implemented, along with the listed clauses, in that release.

OpenMP work sharing pragmas:
OpenMP synchronization pragmas:
Clauses:
The following OpenMP pragmas are implemented in this release (B.11.11.04) of the HP C compiler, as the result of the phase II implementation of OpenMP pragmas. This completes the cycle of implementing OpenMP pragmas and options in the HP C compiler. The following OpenMP work sharing and synchronization pragmas were implemented, along with the listed clauses, in this release.

OpenMP work sharing pragmas:
OpenMP synchronization pragmas:
Clauses:

The OpenMP driver options +Oopenmp and +Onoopenmp are added to this release of the HP C compiler.

When +Oopenmp appears on the command line, +Onodynsel, +Oparallel, +Onofailsafe, and +Onoautopar are passed by default to the cc driver. If +Onoparallel is in effect, parallelizing transformations are not performed.

When +Oopenmp is used, most of the HP Programming Model (HPPM) pragmas are not accepted. The following HPPM pragmas are accepted by the HP C compiler when +Oopenmp is specified.
Using the +Onoopenmp option causes all OpenMP directives to be ignored silently.

Every C program that contains OpenMP pragmas and is to be compiled for the current version of HP-UX must include the header file <omp.h>. If it does not, the OpenMP pragmas are ignored. The default path for <omp.h> is /usr/include. Install the libomp patches given below to use this header file.

The OpenMP APIs are defined in the library libomp. These libraries are in patches PHSS_25028 (for HP-UX 11.00) and PHSS_25029 (for HP-UX 11.11).

The _OPENMP macro name is defined by OpenMP-compliant implementations as the decimal constant yyyymm, which is the year and month of the approved specification. This macro must not be the subject of a #define or #undef preprocessing directive. A program can test for it with #ifdef _OPENMP.

The set of OpenMP pragmas available in the HP C compiler is described in later sections. These include work sharing pragmas and synchronization pragmas, along with their clauses.

OpenMP work sharing pragmas are:
OpenMP synchronization pragmas are:
A directive that controls the data environment during execution of parallel regions is:
The parallel pragma defines a parallel region, which is a region of the program that is executed by multiple threads in parallel. This is the fundamental construct that starts parallel execution.

Syntax: #pragma omp parallel [clause1, clause2, ...] new-line structured-block

where [clause1, clause2, ...] indicates that the clauses are optional. There can be zero or more clauses, where clause may be one of the following:

The for pragma identifies a construct that specifies a region in which the iterations of the associated loop should be executed in parallel. The iterations of the loop are distributed across threads that already exist.

Syntax: #pragma omp for [clause1, clause2, ...] new-line

where [clause1, clause2, ...] indicates that the clauses are optional. There can be zero or more clauses, where clause may be one of the following:

The section/sections pragmas identify a construct that specifies a set of constructs to be divided among threads in a team. Each section is executed by one of the threads in the team.

Syntax: #pragma omp sections [clause1, clause2, ...] new-line

where [clause1, clause2, ...] indicates that the clauses are optional. There can be zero or more clauses, where clause may be one of the following:
The parallel for pragma for HP C is a shortcut for a parallel region that contains a single for pragma.

Syntax: #pragma omp parallel for [clause1, clause2, ...] new-line

parallel for admits all the allowable clauses of the parallel pragma and the for pragma.

The parallel sections pragma for HP C is a shortcut for specifying a parallel region containing a single sections pragma.

Syntax: #pragma omp parallel sections [clause1, clause2, ...] new-line

parallel sections admits all the allowable clauses of the parallel pragma and the sections pragma. The private clause is supported.

The single directive identifies a construct that specifies that the associated structured block is executed by only one thread in the team (not necessarily the master thread).

Syntax: #pragma omp single [clause [clause] ...] new-line
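The work sharing pragmas described above can be illustrated with a minimal sketch. The helper names square_all, fill_both, and run_single_once are hypothetical and exist only for this illustration. The <omp.h> include is guarded with _OPENMP so that the same file also builds serially when OpenMP is disabled.

```c
#ifdef _OPENMP
#include <omp.h>   /* the guide asks for <omp.h> whenever OpenMP pragmas are used */
#endif

/* Hypothetical helper: stores i*i in out[i] for 0 <= i < n. */
void square_all(int *out, int n)
{
    int i;

    /* parallel for: start a team and divide the loop iterations among it */
    #pragma omp parallel for
    for (i = 0; i < n; i++)
        out[i] = i * i;
}

/* Hypothetical helper: fills two independent arrays, one per section. */
void fill_both(int *a, int *b, int n)
{
    int i, j;

    /* parallel sections: each section is executed by one thread of the team */
    #pragma omp parallel sections private(i, j)
    {
        #pragma omp section
        {
            for (i = 0; i < n; i++)
                a[i] = i;
        }
        #pragma omp section
        {
            for (j = 0; j < n; j++)
                b[j] = 2 * j;
        }
    }
}

/* Hypothetical helper: returns how many times the single block ran. */
int run_single_once(void)
{
    int executed = 0;

    #pragma omp parallel
    {
        /* single: exactly one thread of the team executes this block */
        #pragma omp single
        {
            executed++;
        }
    }
    return executed;
}
```

Each function gives the same result whether the pragmas are honored or ignored, which makes it easy to compare a parallel build against a serial one.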
The critical pragma identifies a construct that restricts the execution of the associated structured block to one thread at a time.

Syntax: #pragma omp critical [(name)] new-line

The critical section name parameter is optional. All unnamed critical sections map to the same name, which is provided by the HP C compiler.

The barrier pragma synchronizes all the threads in a team. When encountered, each thread waits until all the threads in the team have reached that point.

Syntax: #pragma omp barrier new-line

The smallest statement to contain a barrier must be a block or a compound statement. barrier is valid only inside a parallel region and outside the scope of for, section/sections, critical, ordered, and master.

The ordered pragma indicates that the following structured block should be executed in the same order in which iterations would be executed in a sequential loop.

Syntax: #pragma omp ordered new-line

An ordered directive must be within the dynamic extent of a for or a parallel for construct that has an ordered clause. When the ordered clause is used with a schedule that has a chunksize, the chunksize is ignored by the compiler.

The atomic directive ensures that a specific memory location is updated atomically, rather than exposing it to the possibility of multiple simultaneous writing threads.

Syntax: #pragma omp atomic new-line expression-stmt

where expression-stmt must have one of the following forms:
where, in the above expressions:
The flush directive, whether explicit or implied, specifies a cross-thread sequence point at which the implementation is required to ensure that all the threads in a team have a consistent view of certain objects in memory.

Syntax: #pragma omp flush [(list)] new-line

A flush directive without a (list) is implied for the following directives:
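Returning to the critical and atomic pragmas described earlier, the following is a minimal sketch of how the two might protect shared counters. The helper count_parity and the critical section name odd_count are hypothetical. The atomic update uses the increment form x++, which is one of the forms allowed by the OpenMP standard.

```c
#ifdef _OPENMP
#include <omp.h>   /* the guide asks for <omp.h> whenever OpenMP pragmas are used */
#endif

/* Hypothetical helper: counts the even and odd values in 0 .. n-1. */
void count_parity(int n, long *evens, long *odds)
{
    long e = 0;
    long o = 0;
    int i;

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        if (i % 2 == 0) {
            /* atomic: the update of e happens as one indivisible step */
            #pragma omp atomic
            e++;
        } else {
            /* critical: only one thread at a time executes this block */
            #pragma omp critical (odd_count)
            {
                o = o + 1;
            }
        }
    }

    *evens = e;
    *odds = o;
}
```

atomic is the lighter tool when the protected work is a single update of one memory location. critical is the general one when the protected work is a whole block.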
The master pragma directs that the structured block following it should be executed by the master thread (thread 0) of the team.

Syntax: #pragma omp master new-line

Other threads in the team do not execute the associated block.

The threadprivate directive is provided to make file-scope variables local to a thread. The threadprivate directive makes the named file-scope or namespace-scope variables specified in the list private to a thread but file-scope visible within the thread.

Syntax: #pragma omp threadprivate (list) new-line

The OpenMP environment variables available in the HP C compiler control the execution of parallel code. The environment variable names are case sensitive and must be in uppercase. The following environment variables are available in the HP C compiler:
The OMP_SCHEDULE environment variable applies to for and parallel for directives that have the schedule type runtime. The schedule type and chunk size for all such loops can be set at run time by setting this environment variable to any of the recognized schedule types and to an optional chunk_size.

Syntax: setenv OMP_SCHEDULE "dynamic"

The default value of the environment variable is implementation dependent. If the optional chunk_size is set, the value must be positive. If chunk_size is not set, a value of 1 is assumed, except for a static schedule. For a static schedule, the default chunk_size is set to the loop iteration space divided by the number of threads applied to the loop.
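As a minimal sketch of a loop that is governed by OMP_SCHEDULE, the following function uses the runtime schedule type. The helper name scale is hypothetical, and the schedule(runtime) clause is taken from the OpenMP standard.

```c
#ifdef _OPENMP
#include <omp.h>   /* the guide asks for <omp.h> whenever OpenMP pragmas are used */
#endif

/* Hypothetical helper: multiplies every element of v by factor. */
void scale(double *v, int n, double factor)
{
    int i;

    /* schedule(runtime): the schedule type and chunk size are not fixed
       here but are taken from OMP_SCHEDULE when the program runs */
    #pragma omp parallel for schedule(runtime)
    for (i = 0; i < n; i++)
        v[i] *= factor;
}
```

Running the program after setenv OMP_SCHEDULE "dynamic" selects dynamic scheduling for this loop. Under the OpenMP standard a chunk size can follow the type after a comma, for example "dynamic,4".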
The value of OMP_NUM_THREADS must be positive. Its effect depends on whether dynamic adjustment of the number of threads is enabled. If dynamic adjustment is disabled, the value of this environment variable is the number of threads to use for each parallel region until that number is explicitly changed during execution.

Syntax: setenv OMP_NUM_THREADS 16

If dynamic adjustment of the number of threads is enabled, the value of the environment variable is interpreted as the maximum number of threads to use.

The OMP_DYNAMIC environment variable enables or disables dynamic adjustment of the number of threads available for execution of parallel regions. Its value must be TRUE or FALSE.

Syntax: setenv OMP_DYNAMIC TRUE

If the value is set to FALSE, dynamic adjustment is disabled. If the value is set to TRUE, the number of threads that are used for executing parallel regions may be adjusted by the run-time environment to best utilize system resources.

The OMP_NESTED environment variable enables or disables nested parallelism. Its value must be TRUE or FALSE.

Syntax: setenv OMP_NESTED FALSE

If the value is set to TRUE, nested parallelism is enabled. If the value is set to FALSE, nested parallelism is disabled. The default value is FALSE.

This section describes the OpenMP C run-time library functions. The header <omp.h> declares two types, several functions that can be used to control and query the parallel execution environment, and lock functions that can be used to synchronize access to data.

The type omp_lock_t is an object type capable of representing that a lock is available, or that a thread owns a lock. These locks are referred to as simple locks. The type omp_nest_lock_t is an object type capable of representing either that a lock is available, or both the identity of the thread that owns the lock and a nesting count. These locks are referred to as nestable locks. The library functions are external functions.
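The settings made through OMP_NUM_THREADS and OMP_DYNAMIC can be read back from inside a program. The following is a minimal sketch, assuming the standard run-time functions omp_get_max_threads and omp_get_dynamic. The helper print_thread_settings is hypothetical, and the serial fallbacks in the #else branch are assumptions added so the file still builds when OpenMP is disabled.

```c
#include <stdio.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (assumption): used only when OpenMP is not enabled. */
static int omp_get_max_threads(void) { return 1; }
static int omp_get_dynamic(void)     { return 0; }
#endif

/* Hypothetical helper: prints the current thread settings and returns
   the maximum team size a parallel region could get. */
int print_thread_settings(void)
{
    int max_threads = omp_get_max_threads();

    printf("maximum threads: %d, dynamic adjustment: %s\n",
           max_threads, omp_get_dynamic() ? "enabled" : "disabled");
    return max_threads;
}
```

Calling this once at program start, with different values of the environment variables, is a quick way to check which settings the run-time environment actually picked up.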
The descriptions of library functions are divided into the following topics:
The functions described in this section affect and monitor threads, processors, and the parallel environment:
The omp_set_num_threads function sets the number of threads to use for subsequent parallel regions. The format is as follows:

#include <omp.h>

The value of the parameter num_threads must be positive. Its effect depends upon whether dynamic adjustment of the number of threads is enabled. If dynamic adjustment is disabled, the value is used as the number of threads for all subsequent parallel regions prior to the next call to this function; otherwise, the value is the maximum number of threads that will be used. This function has effect only when called from serial portions of the program. If it is called from a portion of the program where the omp_in_parallel function returns non-zero, the behavior of this function is undefined. For more information on this subject, see the omp_set_dynamic and omp_get_dynamic functions. This call has precedence over the OMP_NUM_THREADS environment variable.

The omp_get_num_threads function returns the number of threads currently in the team executing the parallel region from which it is called. The format is as follows:

#include <omp.h>

The omp_set_num_threads function and the OMP_NUM_THREADS environment variable control the number of threads in a team. If the number of threads has not been explicitly set by the user, the default is implementation dependent. This function binds to the closest enclosing parallel directive. If called from a serial portion of a program, or from a nested parallel region that is serialized, this function returns 1.

The omp_get_max_threads function returns the maximum value that can be returned by calls to omp_get_num_threads. The format is as follows:

#include <omp.h>

If omp_set_num_threads is used to change the number of threads, subsequent calls to this function will return the new value. A typical use of this function is to determine the size of an array for which all thread numbers are valid indices, even when omp_set_dynamic is set to non-zero.
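The array-sizing use of omp_get_max_threads mentioned above can be sketched as follows. The helper count_iterations is hypothetical, and the serial fallbacks in the #else branch are assumptions added so the file still builds when OpenMP is disabled.

```c
#include <stdlib.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (assumption): used only when OpenMP is not enabled. */
static int omp_get_max_threads(void) { return 1; }
static int omp_get_thread_num(void)  { return 0; }
#endif

/* Hypothetical helper: lets each thread count the iterations it executes
   in its own array slot, then returns the sum of all slots.
   Returns -1 if the array cannot be allocated. */
long count_iterations(int n)
{
    int slots = omp_get_max_threads();  /* every thread number is < slots */
    long *per_thread = calloc((size_t)slots, sizeof *per_thread);
    long total = 0;
    int i, t;

    if (per_thread == NULL)
        return -1;

    #pragma omp parallel for
    for (i = 0; i < n; i++)
        per_thread[omp_get_thread_num()]++;  /* a thread writes only its own slot */

    for (t = 0; t < slots; t++)
        total += per_thread[t];

    free(per_thread);
    return total;
}
```

Because the array is sized before the parallel region starts, the code does not need to know how many threads the run-time environment finally assigns to the team.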
The omp_get_max_threads function returns the maximum value whether executing within a serial region or a parallel region.

The omp_get_thread_num function returns the thread number, within its team, of the thread executing the function. The thread number lies between 0 and omp_get_num_threads()-1, inclusive. The master thread of the team is thread 0. The format is as follows:

#include <omp.h>

If called from a serial region, omp_get_thread_num returns 0. If called from within a nested parallel region that is serialized, this function returns 0.

The omp_get_num_procs function returns the maximum number of processors that could be assigned to the program. The format is as follows:

#include <omp.h>

The omp_in_parallel function returns non-zero if it is called within the dynamic extent of a parallel region executing in parallel; otherwise, it returns 0. The format is as follows:

#include <omp.h>

This function returns non-zero from within a region executing in parallel, regardless of nested regions that are serialized.

The omp_set_dynamic function enables or disables dynamic adjustment of the number of threads available for execution of parallel regions. The format is as follows:

#include <omp.h>

This function has effect only when called from serial portions of the program. If it is called from a portion of the program where the omp_in_parallel function returns non-zero, the behavior of the function is undefined. If dynamic_threads evaluates to non-zero, the number of threads that are used for executing subsequent parallel regions may be adjusted automatically by the run-time environment to best utilize system resources. As a consequence, the number of threads specified by the user is the maximum thread count. The number of threads always remains fixed over the duration of each parallel region and is reported by the omp_get_num_threads function. If dynamic_threads evaluates to 0, dynamic adjustment is disabled.
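A minimal sketch of how omp_set_dynamic, omp_set_num_threads, omp_get_num_threads, and omp_in_parallel might be combined is shown below. The helpers observed_team_size and in_parallel_when_serial are hypothetical, and the serial fallbacks in the #else branch are assumptions added so the file still builds when OpenMP is disabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (assumption): used only when OpenMP is not enabled. */
static void omp_set_dynamic(int dynamic_threads) { (void)dynamic_threads; }
static void omp_set_num_threads(int num_threads) { (void)num_threads; }
static int  omp_get_num_threads(void)            { return 1; }
static int  omp_in_parallel(void)                { return 0; }
#endif

/* Hypothetical helper: requests `wanted` threads with dynamic adjustment
   disabled and returns the team size seen inside the parallel region. */
int observed_team_size(int wanted)
{
    int team = 0;

    /* Both calls are made from serial code, as the text requires. */
    omp_set_dynamic(0);
    omp_set_num_threads(wanted);

    #pragma omp parallel
    {
        /* Only the master thread (thread 0) records the team size. */
        #pragma omp master
        team = omp_get_num_threads();
    }
    return team;
}

/* Hypothetical helper: reports omp_in_parallel() from serial code. */
int in_parallel_when_serial(void)
{
    return omp_in_parallel();
}
```

The team size returned is normally the requested value, but an implementation may supply fewer threads, so callers should treat the request as an upper bound.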
A call to omp_set_dynamic has precedence over the OMP_DYNAMIC environment variable. The default for the dynamic adjustment of threads is implementation dependent. As a result, user code that depends on a specific number of threads for correct execution should explicitly disable dynamic threads. Implementations are not required to provide the ability to dynamically adjust the number of threads, but they are required to provide the interface in order to support portability across all platforms.

The omp_get_dynamic function returns non-zero if dynamic thread adjustment is enabled and returns 0 otherwise. For a description of dynamic thread adjustment, see omp_set_dynamic. The format is as follows:

#include <omp.h>

If the implementation does not implement dynamic adjustment of the number of threads, this function always returns 0.

The omp_set_nested function enables or disables nested parallelism. The format is as follows:

#include <omp.h>

If nested evaluates to 0, which is the default, nested parallelism is disabled, and nested parallel regions are serialized and executed by the current thread. If nested evaluates to non-zero, nested parallelism is enabled, and parallel regions that are nested may deploy additional threads to form the team. This call has precedence over the OMP_NESTED environment variable. When nested parallelism is enabled, the number of threads used to execute nested parallel regions is implementation dependent. As a result, OpenMP-compliant implementations are allowed to serialize nested parallel regions even when nested parallelism is enabled.

The functions described in this section manipulate locks used for synchronization.

For the following functions, the lock variable must have type omp_lock_t. This variable must only be accessed through these functions. All lock functions require an argument that has a pointer to omp_lock_t type.
For the following functions, the lock variable must have type omp_nest_lock_t. This variable must only be accessed through these functions. All nestable lock functions require an argument that has a pointer to omp_nest_lock_t type.
These functions provide the only means of initializing a lock. Each function initializes the lock associated with the parameter lock for use in subsequent calls. The format is as follows:

#include <omp.h>

The initial state is unlocked (that is, no thread owns the lock). For a nestable lock, the initial nesting count is zero.

These functions ensure that the lock variable pointed to by lock is uninitialized. The format is as follows:

#include <omp.h>

The argument to these functions must point to an initialized lock variable that is unlocked.

Each of these functions blocks the thread executing the function until the specified lock is available and then sets the lock. A simple lock is available if it is unlocked. A nestable lock is available if it is unlocked or if it is already owned by the thread executing the function. The format is as follows:

#include <omp.h>

For a simple lock, the argument to the omp_set_lock function must point to an initialized lock variable. Ownership of the lock is granted to the thread executing the function. For a nestable lock, the argument to the omp_set_nest_lock function must point to an initialized lock variable. The nesting count is incremented, and the thread is granted, or retains, ownership of the lock.

These functions provide the means of releasing ownership of a lock. The format is as follows:

#include <omp.h>

The argument to each of these functions must point to an initialized lock variable owned by the thread executing the function. The behavior is undefined if the thread does not own that lock. For a simple lock, the omp_unset_lock function releases the thread executing the function from ownership of the lock. For a nestable lock, the omp_unset_nest_lock function decrements the nesting count, and releases the thread executing the function from ownership of the lock if the resulting count is zero.

These functions attempt to set a lock but do not block execution of the thread. The format is as follows:

#include <omp.h>

The argument must point to an initialized lock variable. These functions attempt to set a lock in the same manner as omp_set_lock and omp_set_nest_lock, except that they do not block execution of the thread. For a simple lock, the omp_test_lock function returns non-zero if the lock is successfully set; otherwise, it returns zero. For a nestable lock, the omp_test_nest_lock function returns the new nesting count if the lock is successfully set; otherwise, it returns zero.
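The lock functions described above can be put together as in the following minimal sketch. The helpers locked_count and nest_twice are hypothetical. The names omp_init_lock, omp_destroy_lock, omp_init_nest_lock, and omp_destroy_nest_lock are taken from the OpenMP standard, since the surviving text does not spell them out. The serial fallbacks in the #else branch are assumptions added so the file still builds when OpenMP is disabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (assumption): used only when OpenMP is not enabled. */
typedef int omp_lock_t;
typedef struct { int count; } omp_nest_lock_t;
static void omp_init_lock(omp_lock_t *lock)    { *lock = 0; }
static void omp_destroy_lock(omp_lock_t *lock) { (void)lock; }
static void omp_set_lock(omp_lock_t *lock)     { *lock = 1; }
static void omp_unset_lock(omp_lock_t *lock)   { *lock = 0; }
static void omp_init_nest_lock(omp_nest_lock_t *lock)    { lock->count = 0; }
static void omp_destroy_nest_lock(omp_nest_lock_t *lock) { (void)lock; }
static int  omp_test_nest_lock(omp_nest_lock_t *lock)    { return ++lock->count; }
static void omp_unset_nest_lock(omp_nest_lock_t *lock)   { lock->count--; }
#endif

/* Hypothetical helper: increments a shared counter n times, using a
   simple lock so that only one thread updates it at a time. */
long locked_count(int n)
{
    omp_lock_t lock;
    long counter = 0;
    int i;

    omp_init_lock(&lock);            /* initial state: unlocked */

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        omp_set_lock(&lock);         /* blocks until the lock is available */
        counter++;
        omp_unset_lock(&lock);       /* release ownership */
    }

    omp_destroy_lock(&lock);         /* lock must be unlocked here */
    return counter;
}

/* Hypothetical helper: one thread acquires a nestable lock twice without
   blocking and reports the nesting count returned each time. */
void nest_twice(int *first, int *second)
{
    omp_nest_lock_t lock;

    omp_init_nest_lock(&lock);              /* nesting count starts at zero */
    *first  = omp_test_nest_lock(&lock);    /* lock was free: count becomes 1 */
    *second = omp_test_nest_lock(&lock);    /* same owner again: count becomes 2 */
    omp_unset_nest_lock(&lock);             /* count back to 1, still owned */
    omp_unset_nest_lock(&lock);             /* count 0, ownership released */
    omp_destroy_nest_lock(&lock);
}
```

The second function shows the property that distinguishes a nestable lock from a simple one: the owning thread may acquire it again, and each acquisition must be matched by one release.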