Parallel Programming Guide for HP-UX Systems > Chapter 4 OpenMP

HP’s implementation of OpenMP

This section discusses HP’s implementation of OpenMP.

Command-line option

HP compilers accept OpenMP directives only when the +Oopenmp command-line option is given.

NOTE: +Oopenmp implies +Onodynsel, +Oparallel, and +Onoautopar.

Default

The default command-line option is +Onoopenmp. If +Oopenmp is not given, all OpenMP directives (c$omp) are ignored.
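The effect of the default can be seen with a very small test program. The sketch below is illustrative only: it assumes free-form source, where the directive sentinel is written !$omp rather than the fixed-form c$omp mentioned above, and the program name is hypothetical.

```fortran
! Minimal sketch (assumption: free-form source, so the sentinel is "!$omp").
program hello_openmp
  implicit none

  ! With +Oopenmp, every thread in the team executes the print statement.
  ! Without it, the two directive lines are plain comments and the
  ! statement runs exactly once.
  !$omp parallel
  print *, 'hello from the parallel region'
  !$omp end parallel
end program hello_openmp
```

Built without +Oopenmp, this prints a single line. Built with +Oopenmp, it should print one line per thread in the team.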

Optimization levels and parallelism

+Oopenmp is accepted at all optimization levels. However, the following differences exist between Itanium®-based and PA-RISC architectures.

Itanium®-based architectures

OpenMP is accepted at all optimization levels on Itanium®-based architectures. Parallelization directives and worksharing directives can be compiled at any optimization level from +O0 through +O4. OpenMP also interoperates with +objdebug, +noobjdebug, and -g, so source-level debugging of parallelized applications is supported, subject to the same restrictions that apply to debugging at +O2 or below.

PA-RISC architectures

The following limitations exist for OpenMP on PA-RISC architectures.

For parallel and work-shared directives (including the clauses for these directives), code is parallelized only at optimization levels +O3 or +O4. The parallel and work-shared directives are listed in Table 4-1 “Parallel and work-shared directives”.

Optimization levels +O0 through +O2

When using optimization levels +O0 through +O2:

  • All sync and run-time library directives are processed and honored.

  • Parallel and work-shared directives (including the clauses for these directives) are processed but not honored. The program still produces correct results, but the code is not parallelized: each thread runs a serial version of the code.

Optimization levels +O3 and +O4

When using optimization levels +O3 and +O4:

  • All sync and run-time library directives are processed and honored.

  • Parallel and work-shared directives (including the clauses for these directives) are processed and honored. The compiler generates the parallel and work-shared code needed for parallel execution.

Table 4-1 Parallel and work-shared directives

Parallel / work-shared directive   Opt level accepted   Opt level required to achieve parallelism
PARALLEL                           +O0, +O1, +O2        +O3, +O4
PARALLEL DO                        +O0, +O1, +O2        +O3, +O4
PARALLEL SECTIONS                  +O0, +O1, +O2        +O3, +O4
DO                                 +O0, +O1, +O2        +O3, +O4
SECTION                            +O0, +O1, +O2        +O3, +O4
SECTIONS                           +O0, +O1, +O2        +O3, +O4
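As an illustration of a directive from Table 4-1, here is a minimal sketch of a PARALLEL DO loop. It assumes free-form source (sentinel !$omp); the program and variable names are hypothetical.

```fortran
! Minimal sketch of a PARALLEL DO loop (names are illustrative).
program parallel_do_sum
  implicit none
  integer, parameter :: n = 1000
  integer :: i
  double precision :: total

  total = 0.0d0

  ! On PA-RISC, this loop is divided among threads only at +O3 or +O4.
  ! At +O0 through +O2 the directive is accepted but the loop runs serially.
  !$omp parallel do reduction(+:total)
  do i = 1, n
    total = total + dble(i)
  end do
  !$omp end parallel do

  print *, 'sum of 1 ..', n, '=', total
end program parallel_do_sum
```

The computed sum is 500500 whether or not the loop is actually parallelized, which matches the point above that lower optimization levels on PA-RISC still give correct answers without parallel execution.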

 

Parallelism

Nested parallelism is now supported dynamically. The NUM_THREADS clause on the PARALLEL directive gives finer-grained control over parallelism, and also allows parallelized code to use parallel versions of MLIB.

Additionally, statically-nested parallelization is enabled.
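A rough sketch of nested regions with the NUM_THREADS clause follows. It assumes the standard OpenMP run-time routines OMP_SET_NESTED and OMP_GET_THREAD_NUM are reachable through an omp_lib module; on some compilers the include file omp_lib.h is used instead. The thread counts and names are illustrative.

```fortran
! Minimal sketch of nested parallel regions (assumes an omp_lib module).
program nested_regions
  use omp_lib
  implicit none
  integer :: outer_id, inner_id

  ! Ask the run-time to let inner regions create their own teams.
  call omp_set_nested(.true.)

  ! Outer team: request two threads, each with its own outer_id.
  !$omp parallel num_threads(2) private(outer_id)
  outer_id = omp_get_thread_num()

  ! Inner team: each outer thread requests two more threads.
  ! outer_id is shared within each inner team and is only read there.
  !$omp parallel num_threads(2) private(inner_id)
  inner_id = omp_get_thread_num()
  print *, 'outer thread', outer_id, 'inner thread', inner_id
  !$omp end parallel

  !$omp end parallel
end program nested_regions
```

If nesting is honored, up to four lines are printed, in no guaranteed order. If the run-time serializes the inner regions, each outer thread prints one line with inner thread 0.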

Arrays

Reduction clauses accept arrays as well as scalar variables.
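For example, a whole array can be named in a REDUCTION clause so that each thread accumulates into its own copy. The following is a sketch with hypothetical names, assuming free-form source.

```fortran
! Minimal sketch of an array reduction (names are illustrative).
program array_reduction
  implicit none
  integer, parameter :: n = 1000, nbins = 4
  integer :: i, b
  integer :: counts(nbins)

  counts = 0

  ! Each thread gets a private, zero-initialized copy of counts.
  ! The copies are added element by element when the loop ends.
  !$omp parallel do private(b) reduction(+:counts)
  do i = 1, n
    b = mod(i, nbins) + 1
    counts(b) = counts(b) + 1
  end do
  !$omp end parallel do

  print *, 'bin counts:', counts
end program array_reduction
```

Each of the four bins ends up with a count of 250, independent of the number of threads.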

Portable timing routines

There are two portable timing routines:

DOUBLE PRECISION FUNCTION OMP_GET_WTIME()

DOUBLE PRECISION FUNCTION OMP_GET_WTICK()
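A typical use is to read the wall clock before and after a region and take the difference. The sketch below assumes the routines are declared through an omp_lib module (or the omp_lib.h include file); the loop being timed is arbitrary.

```fortran
! Minimal sketch of timing a loop with the portable timers
! (assumes an omp_lib module; the workload is arbitrary).
program time_loop
  use omp_lib
  implicit none
  integer, parameter :: n = 1000000
  integer :: i
  double precision :: t_start, t_end, total

  total = 0.0d0
  t_start = omp_get_wtime()      ! wall-clock time in seconds

  !$omp parallel do reduction(+:total)
  do i = 1, n
    total = total + sqrt(dble(i))
  end do
  !$omp end parallel do

  t_end = omp_get_wtime()

  print *, 'result            :', total
  print *, 'elapsed (seconds) :', t_end - t_start
  print *, 'timer resolution  :', omp_get_wtick()   ! seconds per clock tick
end program time_loop
```

Only the difference between two OMP_GET_WTIME values is meaningful. OMP_GET_WTICK reports the timer's resolution, which bounds how short an interval can be measured reliably.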

Nested lock routines

Nested lock routines are as follows:

SUBROUTINE OMP_INIT_NEST_LOCK (NLOCK)
SUBROUTINE OMP_DESTROY_NEST_LOCK (NLOCK)
SUBROUTINE OMP_SET_NEST_LOCK (NLOCK)
SUBROUTINE OMP_UNSET_NEST_LOCK (NLOCK)
INTEGER FUNCTION OMP_TEST_NEST_LOCK (NLOCK)
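A nested lock can be set again by the thread that already owns it, which is useful when one locked routine calls another. The following sketch assumes an omp_lib module that supplies the omp_nest_lock_kind constant for declaring NLOCK; the routine names add_one and add_two are hypothetical.

```fortran
! Minimal sketch of a nested lock (assumes omp_lib provides
! omp_nest_lock_kind; add_one and add_two are illustrative names).
program nest_lock_demo
  use omp_lib
  implicit none
  integer(kind=omp_nest_lock_kind) :: nlock
  integer :: counter, i

  counter = 0
  call omp_init_nest_lock(nlock)

  !$omp parallel do
  do i = 1, 100
    call add_two()
  end do
  !$omp end parallel do

  call omp_destroy_nest_lock(nlock)
  print *, 'counter =', counter

contains

  ! Takes the lock on its own so it is safe to call directly.
  subroutine add_one()
    call omp_set_nest_lock(nlock)
    counter = counter + 1
    call omp_unset_nest_lock(nlock)
  end subroutine add_one

  ! Holds the lock while calling add_one, which sets it again.
  ! A simple (non-nested) lock would deadlock here.
  subroutine add_two()
    call omp_set_nest_lock(nlock)
    call add_one()
    call add_one()
    call omp_unset_nest_lock(nlock)
  end subroutine add_two

end program nest_lock_demo
```

The lock is released only when the owning thread has called OMP_UNSET_NEST_LOCK as many times as it called OMP_SET_NEST_LOCK. The final counter value is 200.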

Additional features

  • The COPYIN clause now allows non-threadprivate objects in a parallel region.

  • Relaxed reprivatization rules now allow an inner directive to reprivatize a variable privatized in a containing directive.

  • Privatization of module data is now allowed, as well as privatization of deferred-shape and assumed-shape objects.
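The relaxed reprivatization rule can be illustrated with a variable that is private in a PARALLEL region and listed as private again on an enclosed DO directive. This is a sketch under the assumption of free-form source; the names are hypothetical.

```fortran
! Minimal sketch of reprivatization (names are illustrative).
program reprivatize
  implicit none
  integer :: i, tmp, total

  total = 0

  ! tmp is private to each thread in the parallel region ...
  !$omp parallel private(tmp)
  tmp = 0

  ! ... and is privatized again by the enclosed DO directive.
  !$omp do private(tmp) reduction(+:total)
  do i = 1, 100
    tmp = 2 * i
    total = total + tmp
  end do
  !$omp end do

  !$omp end parallel

  print *, 'total =', total
end program reprivatize
```

The result is 10100, twice the sum of the integers from 1 to 100.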

© Hewlett-Packard Development Company, L.P.