Hewlett-Packard
Parallel Programming Using OpenMP
OpenMP is an industry-standard parallel programming model that implements a fork-join model of parallel execution. The HP C++ OpenMP pragmas are based on the OpenMP Standard for C/C++, version 2.5.

Refer to http://www.openmp.org/specs for more information about the standard, usage, syntax and values.

This section is organized into the following topics:
  • OpenMP Implementation
  • Environment Variables in OpenMP
  • Runtime Library Functions in OpenMP

OpenMP Implementation
This section summarizes the OpenMP directive behaviors that the OpenMP v2.5 API describes as implementation dependent. Each behavior is cross-referenced to its description in the main OpenMP v2.5 specification. HP, in conformance with the OpenMP API, defines and documents the following behavior:
  • Maximum Number of Threads: Due to resource constraints, it is not possible for an implementation to document the maximum number of threads that can be created successfully during a program's execution. This number depends on the load on the system, the amount of memory allocated by the program, and the amount of implementation-dependent stack space allocated to each thread. For a 32-bit process, the stack space for each thread is allocated from the heap. The heap defaults to 1 gigabyte, and the default stack size is 8 megabytes. See the linker option -N for increasing the data area size to 2 gigabytes.

    If the dynamic threads mechanism is disabled, requests for additional threads will result in no additional threads being created. Programs should not assume that a request will result in additional threads for all requests. If the dynamic threads mechanism is enabled, requests for more threads than an implementation can support are satisfied by creating additional pthreads which are then scheduled by the HP-UX scheduler using a smaller number of threads (Section 2.3 of OpenMP C/C++ specs).

  • Number of processors: The number of physical processors actually hosting the threads at any time is the lesser of the number of threads or the number of physical processors on the system (Section 2.3 of OpenMP C/C++ specs).

  • Creating teams of threads: The number of threads in a team that executes a nested parallel region depends on the number of threads created when the application is started and the number of threads already in use. The number of threads created at application startup defaults to the number of processors on the system, but may be increased or decreased using the OMP_NUM_THREADS environment variable. If all threads are already in use when a nested parallel region is encountered, the team that executes the parallel region has one thread. If not all threads are in use when a nested parallel region is encountered, the number of threads in the team is the lesser of the number of threads requested (by a num_threads clause or by the most recent omp_set_num_threads() call, if any) and the number of threads not already in use (Section 2.3 of OpenMP C/C++ specs).

  • schedule Clause: When the schedule(runtime) clause is specified, the decision regarding scheduling is deferred until run time. The schedule type and chunk size can be chosen at run time by setting the OMP_SCHEDULE environment variable. If this environment variable is not set, the schedule defaults to static with a chunk size of 1 (Table 2-2 of OpenMP C/C++ specs).

  • Absence of schedule Clause: In the absence of the schedule clause, the default schedule is taken from the current value of def-sched-var (Section 2.5.1.1 of OpenMP C/C++ specs).

  • ATOMIC Directive: An implementation can replace all ATOMIC directives by enclosing the statement in a critical section (Section 2.7.4 of OpenMP C/C++ specs).

    HP implements the ATOMIC directive using a slightly more efficient form of critical section, roughly 60%-70% faster than critical, although a runtime call is still made.

  • omp_get_num_threads: If the number of threads has not been explicitly set by the user, the default is the number of physical processors on the system (Section 3.2.2 of OpenMP C/C++ specs).

  • omp_set_dynamic: The default for dynamic thread adjustment is 0 (disabled) (Section 3.2.7 of OpenMP C/C++ specs).

  • omp_set_nested: When nested parallelism is enabled, the number of threads used to execute nested parallel regions is determined at runtime by the underlying OpenMP parallel library (Section 3.2.9 of OpenMP C/C++ specs).

  • OMP_SCHEDULE Environment Variable: The default value for this environment variable is STATIC (Section 4.1 of OpenMP C/C++ specs).

  • OMP_NUM_THREADS Environment Variable: The default value is the number of physical processors on the system (Section 4.2 of OpenMP C/C++ specs).

  • OMP_DYNAMIC Environment Variable: The default value is FALSE (Section 4.3 of OpenMP C/C++ specs).
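
As an illustration of the ATOMIC directive described in the list above, the following minimal sketch updates a shared counter from a parallel loop. The function name count_even is hypothetical, and the code assumes the program is compiled with OpenMP enabled.

```c
/* Hypothetical helper: counts the even values in data[0..n-1].
   The shared counter is updated under "omp atomic", so concurrent
   increments from different threads are not lost. */
int count_even(const int *data, int n)
{
    int count = 0;
    int i;

    #pragma omp parallel for shared(count)
    for (i = 0; i < n; i++) {
        if (data[i] % 2 == 0) {
            #pragma omp atomic
            count++;
        }
    }
    return count;
}
```

A reduction(+:count) clause would usually be the more efficient choice for a counter like this; the atomic form is shown only because it is the construct under discussion.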

Environment Variables in OpenMP
The OpenMP environment variables recognized by the HP aC++ compiler control the execution of parallel code.

Note: Environment variable names are case sensitive and must be in uppercase.

The following environment variables are available in the HP aC++ compiler:


OMP_SCHEDULE Environment Variable

export OMP_SCHEDULE="kind[,chunk_size]"
setenv OMP_SCHEDULE "kind[,chunk_size]"
kind is one of static, dynamic, or guided.

This environment variable applies to for and parallel for directives whose schedule type is runtime. The schedule type and chunk size for all such loops can be set at run time by setting this environment variable to any of the recognized schedule types and an optional chunk_size.

The default value of the environment variable is a static schedule with a chunk_size of 1. If the optional chunk_size is set, the value must be positive. If chunk_size is not set, a value of 1 is assumed, except for a static schedule. For a static schedule, the default chunk_size is the loop iteration space divided by the number of threads applied to the loop.

Note: OMP_SCHEDULE is ignored for for and parallel for directives that have a schedule type other than runtime.
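
The following minimal sketch shows a loop whose schedule is deferred to OMP_SCHEDULE. The function name sum_of_squares is hypothetical, and the code assumes the program is compiled with OpenMP enabled.

```c
/* Hypothetical helper: returns 0*0 + 1*1 + ... + (n-1)*(n-1).
   schedule(runtime) defers the choice of schedule type and chunk
   size to the OMP_SCHEDULE environment variable. */
long sum_of_squares(int n)
{
    long total = 0;
    int i;

    #pragma omp parallel for schedule(runtime) reduction(+:total)
    for (i = 0; i < n; i++) {
        total += (long)i * i;
    }
    return total;
}
```

Running the program after, for example, export OMP_SCHEDULE="dynamic,4" changes how iterations are distributed among threads but not the value returned.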


OMP_NUM_THREADS Environment Variable

export OMP_NUM_THREADS=value
setenv OMP_NUM_THREADS value
The OMP_NUM_THREADS environment variable sets the default number of threads to use during execution. The value of OMP_NUM_THREADS must be a positive integer. Its effect depends on whether dynamic adjustment of the number of threads is enabled, and its interaction with the omp_set_num_threads library routine and any num_threads clause on a parallel directive.

The default value is the number of physical processors on the system.


OMP_DYNAMIC Environment Variable

export OMP_DYNAMIC=value
setenv OMP_DYNAMIC value
The OMP_DYNAMIC environment variable enables or disables dynamic adjustment of the number of threads available for execution of parallel regions. The value must be either TRUE or FALSE. The default value is FALSE.

If the value is set to FALSE, dynamic adjustment is disabled. If the value is set to TRUE, the number of threads that are used for executing parallel regions may be adjusted by the runtime environment to best utilize system resources.


OMP_NESTED Environment Variable

export OMP_NESTED=value
setenv OMP_NESTED value
The OMP_NESTED environment variable enables or disables nested parallelism. Its value must be TRUE or FALSE.

Setting the value to TRUE enables nested parallelism; setting it to FALSE disables it.

The default value is FALSE.

Runtime Library Functions in OpenMP
The OpenMP library functions are external functions. The header <omp.h> declares three types of functions: execution environment functions, lock functions, and timing functions.


Execution Environment Functions

The execution environment functions affect and monitor threads, processors, and the parallel environment. This section discusses the following environment functions:


omp_set_num_threads Function

#include <omp.h>
void omp_set_num_threads(int num_threads);
The omp_set_num_threads function sets the number of threads to use for subsequent parallel regions. The value of the parameter num_threads must be positive. Its effect depends upon whether dynamic adjustment of the number of threads is enabled. If dynamic adjustment is disabled, the value is used as the number of threads for all subsequent parallel regions prior to the next call to this function; otherwise, the value is the maximum number of threads that will be used. This function has effect only when called from serial portions of the program. If it is called from a portion of the program where the omp_in_parallel function returns non-zero, the behavior of this function is undefined.

For more information, see the omp_set_dynamic and omp_get_dynamic functions. This call has precedence over the OMP_NUM_THREADS environment variable.


omp_get_num_threads Function

#include <omp.h>
int omp_get_num_threads(void);
The omp_get_num_threads function returns the number of threads currently in the team executing the parallel region from which it is called. The omp_set_num_threads function and the OMP_NUM_THREADS environment variable control the number of threads in a team. If the number of threads has not been explicitly set by the user, the default is implementation dependent; in the HP implementation, it is the number of physical processors on the system. This function binds to the closest enclosing parallel directive. If called from a serial portion of a program, or from a nested parallel region that is serialized, this function returns 1.


omp_get_max_threads Function

#include <omp.h>
int omp_get_max_threads(void);
The omp_get_max_threads function returns an integer that is guaranteed to be at least as large as the number of threads that would be used to form a team if a parallel region without a num_threads clause were to be encountered at that point in the code.


omp_get_thread_num Function

#include <omp.h>
int omp_get_thread_num(void);
The omp_get_thread_num function returns the thread number, within its team, of the thread executing the function. The thread number lies between 0 and omp_get_num_threads()-1, inclusive. The master thread of the team is thread 0.

If called from a serial region, omp_get_thread_num returns 0. If called from within a nested parallel region that is serialized, this function returns 0.


omp_get_num_procs Function

#include <omp.h>
int omp_get_num_procs(void);
The omp_get_num_procs function returns the number of processors that are available to the program at the time the function is called.


omp_in_parallel Function

#include <omp.h>
int omp_in_parallel(void);
The omp_in_parallel function returns non-zero if it is called within the dynamic extent of a parallel region executing in parallel; otherwise, it returns 0. This function returns non-zero from within a region executing in parallel, including nested regions that are serialized.


omp_set_dynamic Function

#include <omp.h>
void omp_set_dynamic(int dynamic_threads);
The omp_set_dynamic function enables or disables dynamic adjustment of the number of threads available for execution of parallel regions. This function has effect only when called from serial portions of the program. If it is called from a portion of the program where the omp_in_parallel function returns non-zero, the behavior of the function is undefined.

If dynamic_threads evaluates to non-zero, the number of threads that are used for executing subsequent parallel regions may be adjusted automatically by the run-time environment to best utilize system resources. As a consequence, the number of threads specified by the user is the maximum thread count. The number of threads always remains fixed over the duration of each parallel region and is reported by the omp_get_num_threads function. If dynamic_threads evaluates to 0, dynamic adjustment is disabled. A call to omp_set_dynamic has precedence over the OMP_DYNAMIC environment variable.

The default for the dynamic adjustment of threads is 0.


omp_get_dynamic Function

#include <omp.h>
int omp_get_dynamic(void);
The omp_get_dynamic function returns non-zero if dynamic thread adjustment is enabled and returns 0 otherwise.


omp_set_nested Function

#include <omp.h>
void omp_set_nested(int nested);
The omp_set_nested function enables or disables nested parallelism.

If nested evaluates to 0, which is the default, nested parallelism is disabled, and nested parallel regions are serialized and executed by the current thread. If nested evaluates to non-zero, nested parallelism is enabled, and parallel regions that are nested may deploy additional threads to form the team.

This call has precedence over the OMP_NESTED environment variable.


omp_get_nested Function

#include <omp.h>
int omp_get_nested(void);
The omp_get_nested function returns non-zero if nested parallelism is enabled and 0 if it is disabled.


Lock Functions

The functions described in this section manipulate locks used for synchronization. For the simple lock functions (omp_init_lock, omp_destroy_lock, omp_set_lock, omp_unset_lock, and omp_test_lock), the lock variable must have type omp_lock_t. This variable must be accessed only through these functions. All simple lock functions require an argument that is a pointer to omp_lock_t.

For the nestable lock functions (omp_init_nest_lock, omp_destroy_nest_lock, omp_set_nest_lock, omp_unset_nest_lock, and omp_test_nest_lock), the lock variable must have type omp_nest_lock_t. This variable must be accessed only through these functions. All nestable lock functions require an argument that is a pointer to omp_nest_lock_t.


omp_init_lock and omp_init_nest_lock Functions

#include <omp.h>
void omp_init_lock(omp_lock_t *lock);
void omp_init_nest_lock(omp_nest_lock_t *lock);
These functions provide the only means of initializing a lock. Each function initializes the lock associated with the parameter lock for use in subsequent calls. The initial state is unlocked (that is, no thread owns the lock).

For a nestable lock, the initial nesting count is zero.


omp_destroy_lock and omp_destroy_nest_lock Functions

#include <omp.h>
void omp_destroy_lock(omp_lock_t *lock);
void omp_destroy_nest_lock(omp_nest_lock_t *lock);
These functions ensure that the lock variable pointed to by lock is uninitialized. The argument to these functions must point to an initialized lock variable that is unlocked.


omp_set_lock and omp_set_nest_lock Functions

#include <omp.h>
void omp_set_lock(omp_lock_t *lock);
void omp_set_nest_lock(omp_nest_lock_t *lock);
Each of these functions blocks the thread executing the function until the specified lock is available and then sets the lock. A simple lock is available if it is unlocked. A nestable lock is available if it is unlocked or if it is already owned by the thread executing the function.

For a simple lock, the argument to the omp_set_lock function must point to an initialized lock variable. Ownership of the lock is granted to the thread executing the function.

For a nestable lock, the argument to the omp_set_nest_lock function must point to an initialized lock variable. The nesting count is incremented, and the thread is granted, or retains, ownership of the lock.


omp_unset_lock and omp_unset_nest_lock Functions

#include <omp.h>
void omp_unset_lock(omp_lock_t *lock);
void omp_unset_nest_lock(omp_nest_lock_t *lock);
These functions provide the means of releasing ownership of a lock. The argument to each of these functions must point to an initialized lock variable owned by the thread executing the function. The behavior is undefined if the thread does not own that lock.

For a simple lock, the omp_unset_lock function releases the thread executing the function from ownership of the lock.

For a nestable lock, the omp_unset_nest_lock function decrements the nesting count, and releases the thread executing the function from ownership of the lock if the resulting count is zero.


omp_test_lock and omp_test_nest_lock Functions

#include <omp.h>
int omp_test_lock(omp_lock_t *lock);
int omp_test_nest_lock(omp_nest_lock_t *lock);
These functions attempt to set a lock in the same manner as omp_set_lock and omp_set_nest_lock, except that they do not block execution of the thread. The argument must point to an initialized lock variable.

For a simple lock, the omp_test_lock function returns non-zero if the lock is successfully set; otherwise, it returns zero.

For a nestable lock, the omp_test_nest_lock function returns the new nesting count if the lock is successfully set; otherwise, it returns zero.


Timing Functions

The following functions support a portable wall-clock timer:


omp_get_wtime Function

#include <omp.h>
double omp_get_wtime(void);
The omp_get_wtime function returns a double-precision floating point value equal to the elapsed wall clock time in seconds since some time in the past. The actual time in the past is arbitrary, but it is guaranteed not to change during the execution of the application program.

The function may be used to measure elapsed times as shown in the following example:

double start;
double end;
start = omp_get_wtime();
/* ... work to be timed ... */
end = omp_get_wtime();
printf("Work took %f seconds.\n", end - start);
The times returned are per-thread times. They are not required to be globally consistent across all the threads participating in an application.


omp_get_wtick Function

#include <omp.h>
double omp_get_wtick(void);
The omp_get_wtick function returns a double-precision floating point value equal to the number of seconds between successive clock ticks.