HP C/HP-UX Programmer's Guide: HP-UX Systems > Chapter 8 Threads and Parallel Processing

OpenMP Pragmas and Options


OpenMP is an industry-standard parallel programming model that implements a fork-join model of parallel execution. The HP C OpenMP pragmas included in this release are based on the OpenMP Standard for C, version 1.0.

For details about the standard, including usage, syntax, and values, go to http://www.openmp.org/specs. You can download either a PostScript (PS) or Adobe Acrobat (PDF) version of the C/C++ Version 1.0 OpenMP standard from this website.

OpenMP pragmas

This section details the OpenMP pragmas implemented in the previous (B.11.11.02) release and the OpenMP pragmas implemented in this (B.11.11.04) release of the HP C compiler. Each of these implemented pragmas is discussed in later sections.

OpenMP pragmas in AR0301 Release

The following OpenMP pragmas were implemented in the B.11.11.02 release. This was the result of phase I implementation of OpenMP pragmas in the HP C compiler. The following OpenMP work sharing and synchronization pragmas were implemented along with the listed clauses in this release.

OpenMP work sharing pragmas:

  • parallel

  • for

  • sections

  • section

  • parallel for

  • parallel sections

OpenMP synchronization pragmas:

  • critical

  • barrier

  • ordered

Clauses:

  • private

  • default

  • shared

OpenMP pragmas in this Release

The following OpenMP pragmas are implemented in this release (B.11.11.04) of the HP C compiler. This was the result of phase II implementation of OpenMP pragmas. This completes the cycle of OpenMP pragmas and options implementation in HP C compiler. The following OpenMP work sharing and synchronization pragmas were implemented along with the listed clauses in this release.

OpenMP work sharing pragmas:

  • single

OpenMP synchronization pragmas:

  • flush

  • atomic

  • master

  • nowait

Clauses:

  • firstprivate

  • if

  • copyin

  • schedule

  • reduction

  • lastprivate

+O[no]openmp command line option

The OpenMP driver options +Oopenmp and +Onoopenmp are added to this release of the HP C compiler.

NOTE: The +Oopenmp option is accepted at all optimization levels. However, most of the OpenMP pragmas need a minimum optimization level of +O3. To ensure that OpenMP pragmas are recognized, you must specify +O3 on the command line.

When +Oopenmp is specified on the command line, +Onodynsel, +Oparallel, +Onofailsafe, and +Onoautopar are passed by default to the cc driver. If +Onoparallel is in effect, parallelizing transformations will not be performed.

NOTE: +Oopenmp overrides +Odynsel, +Ofailsafe, and +Onoparallel.

When +Oopenmp is used, most of the HP Programming Model (HPPM) pragmas are not accepted. The following HPPM pragmas are accepted by the HP C compiler when +Oopenmp is specified:

  • BLOCK_LOOP

  • NO_BLOCK_LOOP

  • NO_DISTRIBUTE

  • NO_DYNSEL

  • NO_PARALLEL

  • NO_LOOP_DEPENDENCE

  • NO_LOOP_TRANSFORM

  • NO_UNROLL_AND_JAM

  • OPTIONS

  • SCALAR

  • UNROLL_AND_JAM

When the +Onoopenmp option is used, the compiler silently ignores all OpenMP directives.

New header file

Every C program that contains OpenMP pragmas and is compiled for the current version of HP-UX must include the header file <omp.h>. If it does not, the OpenMP pragmas are ignored. The default path for <omp.h> is /usr/include. To use this header file, install the libomp patches listed below.

New OpenMP library

The OpenMP APIs are defined in the library libomp. This library is delivered in patches PHSS_25028 (for HP-UX 11.00) and PHSS_25029 (for HP-UX 11.11).

OpenMP macro _OPENMP

The _OPENMP macro name is defined by OpenMP-compliant implementations as the decimal constant yyyymm, which is the year and month of the approved specification. This macro must not be the subject of a #define or #undef preprocessing directive.

#ifdef _OPENMP
iam = omp_get_thread_num() + index;
#endif
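The fragment above can be extended into a small sketch that builds with or without OpenMP enabled. The helper names current_thread_id and openmp_version are hypothetical and are not part of the HP library; the sketch only assumes that _OPENMP is defined when OpenMP is enabled.

```c
#ifdef _OPENMP
#include <omp.h>   /* pulled in only when OpenMP is enabled */
#endif

/* Hypothetical helper: thread number, or 0 in a build without OpenMP. */
int current_thread_id(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Hypothetical helper: the yyyymm value of _OPENMP, or 0 if it is not defined. */
long openmp_version(void)
{
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}
```

Called from a serial part of the program, current_thread_id returns 0 in both builds.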

C pragmas for OpenMP

The OpenMP pragmas available in the HP C compiler are described in later sections. These include work sharing pragmas and synchronization pragmas, along with their clauses:

OpenMP work sharing pragmas are:

  • parallel

  • for

  • sections

  • section

  • parallel for

  • parallel sections

  • single

OpenMP synchronization pragmas are:

  • critical

  • barrier

  • ordered

  • flush

  • atomic

  • master

  • nowait

A directive that controls the data environment during execution of parallel regions is:

  • threadprivate

parallel construct

The parallel pragma defines a parallel region, which is a region of the program that is executed by multiple threads in parallel. This is the fundamental construct that starts parallel execution.

Syntax

#pragma omp parallel [clause1, clause2, ...] new-line
structured-block

where [clause1, clause2, ...] indicates that the clauses are optional. There can be zero or more clauses, where clause may be one of the following:

  • private(list)

    private declares the variables in the (list) to be private to each thread in a team. A new object with automatic storage duration is allocated within the associated structured block.

  • default(shared | none)

    Specifying default(shared) is equivalent to explicitly listing each currently visible variable in a shared clause. A variable referenced in the scope of default(none) should be explicitly qualified by a private or shared clause.

  • shared(list)

    The shared clause causes the variables that appear in the list to be shared among all threads in a team. All threads within a team access the same storage area for the shared variables.

  • reduction(operator:list)

    This clause performs a reduction on the scalar variables that appear in the list, with the operator op. The syntax of the reduction is:

    reduction(op:list)

  • lastprivate(list)

    When lastprivate is specified in a loop or section, the value of the lastprivate variable from either the sequentially last iteration of the associated loop, or the lexically last section directive is assigned to the variable's original object.

  • schedule(kind[,chunksize])

    The schedule clause specifies how iterations of the for loop are divided among threads of the team. The kind of schedule can be: static, dynamic, guided, and runtime. chunksize should be an integer constant. Expressions in the place of chunksize are not currently supported.

  • if(scalar-expression)

    The if clause can be used with #pragma omp parallel. The associated block of code is executed in parallel if the expression evaluates to a non-zero value; otherwise, no parallelization happens and the block is executed sequentially. Example:

    #pragma omp parallel private(x) if (a>b) reduction(+:p)
    {
    // code to be conditionally parallelized
    }

  • firstprivate(list)

    The firstprivate clause provides a superset of the functionality provided by the private clause.

    Variables specified in the list have private clause semantics described earlier. The new private object is initialized, as if there is an implied declaration inside the structured block and the initializer is the value of the original object.

    NOTE: private and firstprivate do not work for globals and aggregate types.

  • copyin(list)

    copyin copies the master thread's copy of a threadprivate variable to all other threads at the beginning of the parallel region. This clause can only be used with the parallel directive.
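As a minimal sketch of how several of these clauses combine on one parallel pragma, the following function counts the threads that execute a region. The function name count_team_members and its parameter are hypothetical; the code uses only scalar local variables, in line with the NOTE above.

```c
/* Hypothetical helper: counts how many threads execute the parallel region. */
int count_team_members(int want_parallel)
{
    int members = 0;   /* reduction variable: each thread sums into its own copy */
    int scratch;       /* private: each thread gets its own uninitialized copy */

    #pragma omp parallel if(want_parallel) private(scratch) reduction(+:members)
    {
        scratch = 1;          /* written only to this thread's copy */
        members += scratch;   /* per-thread results are added together at the end */
    }
    return members;
}
```

When want_parallel is zero, the if clause makes the region run sequentially and the function returns 1. A build without OpenMP ignores the pragma and also returns 1.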

for construct

The for pragma identifies a construct that specifies a region in which the iterations of the associated loop should be executed in parallel. The iterations of the loop are distributed across threads that already exist.

Syntax

#pragma omp for [clause1, clause2, ...] new-line
for-loop

where [clause1, clause2, ...] indicates that the clauses are optional. There can be zero or more clauses, where clause may be one of the following:

  • lastprivate(list)

  • reduction(operator:list)

  • ordered

  • schedule(kind[,chunksize])

    NOTE: chunksize should be an integer constant. Expressions in place of chunksize are not supported. kind is one of static, dynamic, guided, or runtime.

  • firstprivate

  • private

  • nowait

    The nowait clause removes the implicit barrier synchronization at the end of a for or sections construct. Its syntax is:

    #pragma omp for nowait
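A minimal sketch of a for pragma inside an enclosing parallel region follows. The function name sum_to is hypothetical, and the chunk size of 4 is an arbitrary integer constant chosen for illustration.

```c
/* Hypothetical helper: sum of 1..n computed with a work-shared loop. */
long sum_to(int n)
{
    long total = 0;
    int i;

    #pragma omp parallel
    {
        /* The iterations are divided among the threads that already exist;
           chunks of 4 iterations are assigned statically. */
        #pragma omp for schedule(static, 4) reduction(+:total)
        for (i = 1; i <= n; i++)
            total += i;
    }
    return total;
}
```

The loop variable i is made private to each thread by the for construct, and the reduction clause combines the per-thread partial sums into total.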

section/sections construct

The section/sections pragmas identify a construct that specifies a set of constructs to be divided among threads in a team. Each section is executed by one of the threads in the team.

Syntax

#pragma omp sections [clause1, clause2, ...] new-line
{
#pragma omp section new-line
structured-block

#pragma omp section new-line
structured-block

.
}

where [clause1, clause2, ...] indicates that the clauses are optional. There can be zero or more clauses, where clause may be one of the following:

  • lastprivate

  • reduction(operator:list)

  • private

  • firstprivate

  • nowait
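The following sketch shows two independent pieces of work placed in separate sections. The function name squares_plus_cubes is hypothetical. Each shared variable is written by exactly one section, so no further synchronization is needed inside the construct.

```c
/* Hypothetical helper: two independent sums, one per section. */
int squares_plus_cubes(int n)
{
    int squares = 0, cubes = 0;   /* shared; each is written by one section only */

    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            {
                int i;   /* declared inside the block, so local to the executing thread */
                for (i = 1; i <= n; i++)
                    squares += i * i;
            }
            #pragma omp section
            {
                int i;
                for (i = 1; i <= n; i++)
                    cubes += i * i * i;
            }
        }   /* implicit barrier: both sums are complete here */
    }
    return squares + cubes;
}
```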

parallel for construct

The parallel for pragma for HP C is a shortcut for a parallel region that contains a single for pragma.

Syntax

#pragma omp parallel for [clause1, clause2, ...] new-line
for-loop

parallel for admits all the allowable clauses of the parallel pragma and the for pragma.
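A minimal sketch of the combined form, mixing a clause of the parallel pragma (if) with a clause of the for pragma (reduction). The function name sum_of_squares and the threshold of 1000 are assumptions made for illustration.

```c
/* Hypothetical helper: sum of i*i for i = 1..n. */
long sum_of_squares(int n)
{
    long total = 0;
    int i;

    /* Parallelize only when the loop is long enough to be worth it;
       the threshold is an arbitrary value chosen for this sketch. */
    #pragma omp parallel for if(n > 1000) reduction(+:total)
    for (i = 1; i <= n; i++)
        total += (long)i * i;

    return total;
}
```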

parallel sections construct

The parallel sections pragma for HP C is a shortcut for specifying a parallel region containing a single sections pragma.

Syntax

#pragma omp parallel sections [clause1, clause2, ...] new-line
{
[#pragma omp section new-line]
structured-block

[#pragma omp section new-line
structured-block

.]
}

parallel sections admits all the allowable clauses of the parallel pragma and the sections pragma. The private clause is supported.

single construct

The single directive identifies a construct that specifies that the associated structured block is executed by only one thread in the team (not necessarily the master thread).

Syntax

#pragma omp single [clause [clause] ...] new-line
structured-block

NOTE: Currently, the single directive does not support any clauses.

critical construct

The critical pragma identifies a construct that restricts the execution of the associated structured block to one thread at a time.

Syntax

#pragma omp critical [(name)] new-line
structured-block

The critical section name parameter is optional. All unnamed critical sections map to the same name which is provided by the HP C compiler.
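As a sketch of a named critical section, the following function finds a maximum while several threads update one shared variable. The names max_residue and update_best, and the formula that generates the values, are hypothetical and serve only to produce predictable data.

```c
/* Hypothetical helper: largest value of (i * 37) % 101 for i = 0..n-1. */
int max_residue(int n)
{
    int best = 0;   /* shared; updated only inside the critical section */
    int i;

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        int value = (i * 37) % 101;   /* local to the executing thread */

        /* Only one thread at a time may compare and update best. */
        #pragma omp critical (update_best)
        {
            if (value > best)
                best = value;
        }
    }
    return best;
}
```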

barrier construct

The barrier pragma synchronizes all the threads in a team. When encountered, each thread waits until all the threads in the team have reached that point.

Syntax

#pragma omp barrier new-line

The smallest statement to contain a barrier must be a block or a compound statement. barrier is valid only inside a parallel region and outside the scope of for, section/sections, critical, ordered, and master.

ordered construct

The ordered pragma indicates that the following structured block should be executed in the same order in which iterations will be executed in a sequential loop.

Syntax

#pragma omp ordered new-line
structured-block

An ordered directive must be within the dynamic extent of a for or a parallel for construct that has an ordered clause. When the ordered clause is used with schedule which has a chunksize, then the chunksize is ignored by the compiler.

atomic construct

The atomic directive ensures that a specific memory location is updated atomically, rather than exposing it to the possibility of multiple simultaneous writing threads.

Syntax

#pragma omp atomic new-line
expression-stmt

where expression-stmt must have one of the following forms:

  • x binop= expr

  • x++

  • ++x

  • x--

  • --x

where, in the above expressions:

  • x is an lvalue expression with scalar type.

  • expr is an expression with scalar type that does not reference the object designated by x.
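A minimal sketch of the x++ form, in which a shared counter is updated atomically from a parallel loop. The function name count_even is hypothetical.

```c
/* Hypothetical helper: number of even values in 0..n-1. */
int count_even(int n)
{
    int count = 0;   /* shared; every update is protected by atomic */
    int i;

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        if (i % 2 == 0) {
            #pragma omp atomic
            count++;
        }
    }
    return count;
}
```

For a simple counter like this one, a reduction clause would normally be the cheaper choice; atomic is used here only to show the construct.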

flush construct

The flush directive, whether explicit or implied, specifies a cross-thread sequence point at which the implementation is required to ensure that all the threads in a team have a consistent view of certain objects in the memory.

Syntax

#pragma omp flush [(list)] new-line

A flush directive without a (list) is implied for the following directives:

  • barrier

  • at entry to and exit from critical

  • at entry to and exit from ordered

  • at exit from parallel

  • at exit from for

  • at exit from sections

  • at exit from single

NOTE: The directive is not implied if a nowait clause is present.

master construct

The master pragma directs that the structured block following it should be executed by the master thread (thread 0) of the team.

Syntax

#pragma omp master new-line
structured-block

Other threads in the team do not execute the associated block.
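The following sketch combines master with barrier: the master thread publishes a value, and an explicit barrier makes the other threads wait before they read it. The function name all_threads_saw_master_value is hypothetical. The call to omp_get_num_threads is guarded by _OPENMP so that the sketch also builds without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns 1 if every thread saw the master's value. */
int all_threads_saw_master_value(void)
{
    int published = 0;   /* shared; written by the master thread only */
    int team = 1;        /* shared; team size recorded by the master thread */
    int seen = 0;        /* reduction: number of threads that saw the value */

    #pragma omp parallel reduction(+:seen)
    {
        #pragma omp master
        {
            published = 42;
#ifdef _OPENMP
            team = omp_get_num_threads();
#endif
        }

        /* The other threads skip the master block, so they must wait
           here until the master thread has written the value. */
        #pragma omp barrier

        if (published == 42)
            seen += 1;
    }
    return seen == team;
}
```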

threadprivate construct

The threadprivate directive is provided to make file-scope variables local to a thread. It makes the named file-scope or namespace-scope variables specified in the list private to a thread, while keeping them visible at file scope within that thread.

Syntax

#pragma omp threadprivate (list) new-line
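A minimal sketch of a threadprivate variable initialized in every thread with the copyin clause. The names counter and every_thread_counts_to are hypothetical, and the sketch follows the general OpenMP rules rather than any HP-specific behavior.

```c
/* File-scope variable; each thread works on its own copy. */
static int counter = 0;
#pragma omp threadprivate(counter)

/* Hypothetical helper: returns 1 if each thread's copy of counter reaches k. */
int every_thread_counts_to(int k)
{
    int mismatches = 0;

    counter = 0;   /* sets the master thread's copy */

    /* copyin gives every thread the master thread's value on entry. */
    #pragma omp parallel copyin(counter) reduction(+:mismatches)
    {
        int j;
        for (j = 0; j < k; j++)
            counter++;   /* touches only this thread's copy */

        if (counter != k)
            mismatches += 1;
    }
    return mismatches == 0;
}
```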

Caveats

Observe these known restrictions while using OpenMP pragmas:

  • Arrays in a firstprivate clause are treated as private.

  • The firstprivate, private, and nowait clauses are not supported with #pragma omp single.

Environment Variables in OpenMP

The OpenMP environment variables available in the HP C compiler control the execution of parallel code. The environment variable names are case sensitive and must be in uppercase.

The following environment variables are available in the HP C compiler:

  • OMP_SCHEDULE

  • OMP_NUM_THREADS

  • OMP_DYNAMIC

  • OMP_NESTED

OMP_SCHEDULE

This environment variable applies to for and parallel for directives that have the schedule type runtime. The schedule type and chunk size for all such loops can be set at run time by setting this environment variable to any of the recognized schedule types and, optionally, a chunk_size.

Syntax

setenv OMP_SCHEDULE "dynamic"

The default value of the environment variable is implementation dependent. If the optional chunk_size is set, the value must be positive. If chunk_size is not set, a value of 1 is assumed, except for a static schedule. For a static schedule, the default chunk_size is set to the loop iteration space divided by the number of threads applied to the loop.

NOTE: OMP_SCHEDULE is ignored for for and parallel for directives that have a schedule type other than runtime.
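A loop that takes its schedule from OMP_SCHEDULE might look like the following sketch. The function name count_multiples_of_three is hypothetical. The result does not depend on the schedule that is selected at run time; only the distribution of iterations does.

```c
/* Hypothetical helper: how many of 1..n are multiples of three. */
int count_multiples_of_three(int n)
{
    int count = 0;
    int i;

    /* schedule(runtime) defers the choice of schedule type and chunk size
       to the OMP_SCHEDULE environment variable. */
    #pragma omp parallel for schedule(runtime) reduction(+:count)
    for (i = 1; i <= n; i++) {
        if (i % 3 == 0)
            count++;
    }
    return count;
}
```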

OMP_NUM_THREADS

The value of OMP_NUM_THREADS must be positive. Its effect depends on whether dynamic adjustment of the number of threads is enabled. If dynamic adjustment is disabled, the value of this environment variable is the number of threads to use for each parallel region until that number is explicitly changed during execution.

Syntax

setenv OMP_NUM_THREADS 16

If dynamic adjustment of the number of threads is enabled, the value of the environment variable is interpreted as the maximum number of threads to use.

OMP_DYNAMIC

The OMP_DYNAMIC environment variable enables or disables dynamic adjustment of the number of threads available for execution of parallel regions. Its value must be TRUE or FALSE.

Syntax

setenv OMP_DYNAMIC TRUE

If the value is set to FALSE, dynamic adjustment is disabled.

If the value is set to TRUE, the number of threads that are used for executing parallel regions may be adjusted by the runtime environment to best utilize system resources.

OMP_NESTED

The OMP_NESTED environment variable enables or disables nested parallelism. Its value must be TRUE or FALSE.

Syntax

setenv OMP_NESTED FALSE

If the value is set to TRUE, nested parallelism is enabled and if the value is set to FALSE, the nested parallelism is disabled. The default value is set to FALSE.

Runtime Library Functions

This section describes the OpenMP C run-time library functions. The header <omp.h> declares two types, several functions that can be used to control and query the parallel execution environment, and lock functions that can be used to synchronize access to data.

The type omp_lock_t is an object type capable of representing that a lock is available, or that a thread owns a lock. These locks are referred to as simple locks.

The type omp_nest_lock_t is an object type capable of representing either that a lock is available, or both the identity of the thread that owns the lock and a nesting count. These locks are referred to as nestable locks.

The library functions are external functions.

The descriptions of library functions are divided into the following topics:

  • Execution environment functions

  • Lock functions

Execution environment functions

The functions described in this section affect and monitor threads, processors, and the parallel environment:

  • omp_set_num_threads

  • omp_get_num_threads

  • omp_get_max_threads

  • omp_get_thread_num

  • omp_get_num_procs

  • omp_in_parallel

  • omp_set_dynamic

  • omp_get_dynamic

  • omp_set_nested

  • omp_get_nested

omp_set_num_threads

The omp_set_num_threads function sets the number of threads to use for subsequent parallel regions. The format is as follows:

#include <omp.h>
void omp_set_num_threads(int num_threads);

The value of the parameter num_threads must be positive. Its effect depends upon whether dynamic adjustment of the number of threads is enabled. If dynamic adjustment is disabled, the value is used as the number of threads for all subsequent parallel regions prior to the next call to this function; otherwise, the value is the maximum number of threads that will be used. This function has effect only when called from serial portions of the program. If it is called from a portion of the program where the omp_in_parallel function returns non-zero, the behavior of this function is undefined. For more information on this subject, see the omp_set_dynamic and omp_get_dynamic functions. This call has precedence over the OMP_NUM_THREADS environment variable.

omp_get_num_threads

The omp_get_num_threads function returns the number of threads currently in the team executing the parallel region from which it is called. The format is as follows:

#include <omp.h>
int omp_get_num_threads(void);

The omp_set_num_threads function and the OMP_NUM_THREADS environment variable control the number of threads in a team.

If the number of threads has not been explicitly set by the user, the default is implementation dependent. This function binds to the closest enclosing parallel directive. If called from a serial portion of a program, or from a nested parallel region that is serialized, this function returns 1.

omp_get_max_threads

The omp_get_max_threads function returns the maximum value that can be returned by calls to omp_get_num_threads. The format is as follows:

#include <omp.h>
int omp_get_max_threads(void);

If omp_set_num_threads is used to change the number of threads, subsequent calls to this function will return the new value. A typical use of this function is to determine the size of an array for which all thread numbers are valid indices, even when omp_set_dynamic is set to non-zero.

This function returns the maximum value whether executing within a serial region or a parallel region.

omp_get_thread_num

The omp_get_thread_num function returns the thread number, within its team, of the thread executing the function. The thread number lies between 0 and omp_get_num_threads()-1, inclusive. The master thread of the team is thread 0. The format is as follows:

#include <omp.h>
int omp_get_thread_num(void);

If called from a serial region, omp_get_thread_num returns 0. If called from within a nested parallel region that is serialized, this function returns 0.

omp_get_num_procs

The omp_get_num_procs function returns the maximum number of processors that could be assigned to the program. The format is as follows:

#include <omp.h>
int omp_get_num_procs(void);

omp_in_parallel

The omp_in_parallel function returns non-zero if it is called within the dynamic extent of a parallel region executing in parallel; otherwise, it returns 0. The format is as follows:

#include <omp.h>
int omp_in_parallel(void);

This function returns non-zero from within a region executing in parallel, regardless of nested regions that are serialized.

omp_set_dynamic

The omp_set_dynamic function enables or disables dynamic adjustment of the number of threads available for execution of parallel regions. The format is as follows:

#include <omp.h>
void omp_set_dynamic(int dynamic_threads);

This function has effect only when called from serial portions of the program. If it is called from a portion of the program where the omp_in_parallel function returns non-zero, the behavior of the function is undefined. If dynamic_threads evaluates to non-zero, the number of threads that are used for executing subsequent parallel regions may be adjusted automatically by the run-time environment to best utilize system resources. As a consequence, the number of threads specified by the user is the maximum thread count. The number of threads always remains fixed over the duration of each parallel region and is reported by the omp_get_num_threads function.

If dynamic_threads evaluates to 0, dynamic adjustment is disabled.

A call to omp_set_dynamic has precedence over the OMP_DYNAMIC environment variable.

The default for the dynamic adjustment of threads is implementation dependent. As a result, user codes that depend on a specific number of threads for correct execution should explicitly disable dynamic threads. Implementations are not required to provide the ability to dynamically adjust the number of threads, but they are required to provide the interface in order to support portability across all platforms.

omp_get_dynamic

The omp_get_dynamic function returns non-zero if dynamic thread adjustment is enabled and returns 0 otherwise. For a description of dynamic thread adjustment, see omp_set_dynamic. The format is as follows:

#include <omp.h>
int omp_get_dynamic(void);

If the implementation does not implement dynamic adjustment of the number of threads, this function always returns 0.

omp_set_nested

The omp_set_nested function enables or disables nested parallelism. The format is as follows:

#include <omp.h>
void omp_set_nested(int nested);

If nested evaluates to 0, which is the default, nested parallelism is disabled, and nested parallel regions are serialized and executed by the current thread. If nested evaluates to non-zero, nested parallelism is enabled, and parallel regions that are nested may deploy additional threads to form the team.

This call has precedence over the OMP_NESTED environment variable.

When nested parallelism is enabled, the number of threads used to execute nested parallel regions is implementation dependent. As a result, OpenMP-compliant implementations are allowed to serialize nested parallel regions even when nested parallelism is enabled.

omp_get_nested

The omp_get_nested function returns non-zero if nested parallelism is enabled and 0 if it is disabled. The format is as follows:

#include <omp.h>
int omp_get_nested(void);

If an implementation does not implement nested parallelism, this function always returns 0.
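The following sketch shows how some of these functions might be combined: it disables dynamic adjustment, requests a team size from a serial portion of the program, and reports the size that was actually obtained. The function name observed_team_size is hypothetical. The library calls are guarded by _OPENMP so that the sketch also builds without OpenMP, in which case it reports a team of 1.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: request a team size and report the size obtained. */
int observed_team_size(int requested)
{
    int size = 1;   /* shared; written by the master thread only */

#ifdef _OPENMP
    omp_set_dynamic(0);               /* ask for a fixed number of threads */
    omp_set_num_threads(requested);   /* called from a serial portion */
#else
    (void)requested;                  /* unused in a build without OpenMP */
#endif

    #pragma omp parallel
    {
        #pragma omp master
        {
#ifdef _OPENMP
            size = omp_get_num_threads();
#endif
        }
    }
    return size;
}
```

The value returned can be smaller than the value requested if the system cannot supply that many threads.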

Lock functions

The functions described in this section manipulate locks used for synchronization.

For the following functions, the lock variable must have type omp_lock_t. This variable must only be accessed through these functions. All lock functions require an argument that is a pointer to type omp_lock_t.

  • omp_init_lock

  • omp_destroy_lock

  • omp_set_lock

  • omp_unset_lock

  • omp_test_lock

For the following functions, the lock variable must have type omp_nest_lock_t. This variable must only be accessed through these functions. All nestable lock functions require an argument that is a pointer to type omp_nest_lock_t.

  • omp_init_nest_lock

  • omp_destroy_nest_lock

  • omp_set_nest_lock

  • omp_unset_nest_lock

  • omp_test_nest_lock

omp_init_lock and omp_init_nest_lock Functions

These functions provide the only means of initializing a lock. Each function initializes the lock associated with the parameter lock for use in subsequent calls. The format is as follows:

#include <omp.h>
void omp_init_lock(omp_lock_t *lock);
void omp_init_nest_lock(omp_nest_lock_t *lock);

The initial state is unlocked (that is, no thread owns the lock). For a nestable lock, the initial nesting count is zero.

omp_destroy_lock and omp_destroy_nest_lock Functions

These functions ensure that the lock variable pointed to by lock is uninitialized. The format is as follows:

#include <omp.h>
void omp_destroy_lock(omp_lock_t *lock);
void omp_destroy_nest_lock(omp_nest_lock_t *lock);

The argument to these functions must point to an initialized lock variable that is unlocked.

omp_set_lock and omp_set_nest_lock Functions

Each of these functions blocks the thread executing the function until the specified lock is available and then sets the lock. A simple lock is available if it is unlocked. A nestable lock is available if it is unlocked or if it is already owned by the thread executing the function. The format is as follows:

#include <omp.h>
void omp_set_lock(omp_lock_t *lock);
void omp_set_nest_lock(omp_nest_lock_t *lock);

For a simple lock, the argument to the omp_set_lock function must point to an initialized lock variable. Ownership of the lock is granted to the thread executing the function.

For a nestable lock, the argument to the omp_set_nest_lock function must point to an initialized lock variable. The nesting count is incremented, and the thread is granted, or retains, ownership of the lock.

omp_unset_lock and omp_unset_nest_lock Functions

These functions provide the means of releasing ownership of a lock. The format is as follows:

#include <omp.h>
void omp_unset_lock(omp_lock_t *lock);
void omp_unset_nest_lock(omp_nest_lock_t *lock);

The argument to each of these functions must point to an initialized lock variable owned by the thread executing the function. The behavior is undefined if the thread does not own that lock.

For a simple lock, the omp_unset_lock function releases the thread executing the function from ownership of the lock.

For a nestable lock, the omp_unset_nest_lock function decrements the nesting count, and releases the thread executing the function from ownership of the lock if the resulting count is zero.

omp_test_lock and omp_test_nest_lock Functions

These functions attempt to set a lock but do not block execution of the thread. The format is as follows:

#include <omp.h>
int omp_test_lock(omp_lock_t *lock);
int omp_test_nest_lock(omp_nest_lock_t *lock);

The argument must point to an initialized lock variable. These functions attempt to set a lock in the same manner as omp_set_lock and omp_set_nest_lock, except that they do not block execution of the thread.

For a simple lock, the omp_test_lock function returns non-zero if the lock is successfully set; otherwise, it returns zero.

For a nestable lock, the omp_test_nest_lock function returns the new nesting count if the lock is successfully set; otherwise, it returns zero.
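A minimal sketch of the simple lock life cycle (initialize, set, unset, destroy) protecting a shared counter. The function name locked_count is hypothetical. The serial stand-ins in the #else branch are an assumption added so that the sketch also builds without OpenMP; they are not part of libomp.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins for illustration only; not part of the OpenMP library. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *lock)    { *lock = 0; }
static void omp_set_lock(omp_lock_t *lock)     { *lock = 1; }
static void omp_unset_lock(omp_lock_t *lock)   { *lock = 0; }
static void omp_destroy_lock(omp_lock_t *lock) { *lock = -1; }
#endif

/* Hypothetical helper: increments a shared counter n times under a lock. */
int locked_count(int n)
{
    omp_lock_t lock;
    int count = 0;   /* shared; modified only while the lock is held */
    int i;

    omp_init_lock(&lock);             /* initial state is unlocked */

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        omp_set_lock(&lock);          /* blocks until the lock is available */
        #pragma omp flush(count)      /* refresh this thread's view of count */
        count++;
        #pragma omp flush(count)      /* make the update visible to other threads */
        omp_unset_lock(&lock);        /* release ownership */
    }

    omp_destroy_lock(&lock);          /* the lock must be unlocked here */
    return count;
}
```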

© 2003 Hewlett-Packard Development Company, L.P.