Using Threads

The HP aC++ run-time environment supports multi-threaded applications. The following HP aC++ libraries are thread-safe with the limitations cited below.

Rogue Wave Standard C++ Library 2.2.1
For both 32-bit and 64-bit libraries:
  • libstd_v2.so and libstd_v2.a
  • libCsup_v2.so and libCsup_v2.a

Rogue Wave Standard C++ Library 1.2.1 and Tools.h++ 7.0.6
For both 32-bit and 64-bit libraries:
  • libstd.so and libstd.a
  • librwtool.so and librwtool.a
  • libCsup.so and libCsup.a
  • libstream.so and libstream.a

Using Locks

To guarantee that your I/O results from one thread are not intermingled with I/O results from other threads, you must protect your I/O statements with locks. For example:

// create a mutex and initialize it
pthread_mutex_t the_mutex;
#ifdef _PTHREADS_DRAFT4     // for user threads
pthread_mutex_init(&the_mutex, pthread_mutexattr_default);
#else                       // for kernel threads
pthread_mutex_init(&the_mutex, (pthread_mutexattr_t *)NULL);
#endif

pthread_mutex_lock(&the_mutex);
cout << "something" /*...*/ ;
pthread_mutex_unlock(&the_mutex);

Note that conditional compilation may be necessary to accommodate both the user threads and the kernel threads interfaces, as in the above example. An alternative might be to compose a buffer with an ostrstream and output with one write. The following example could be used with the cfront compatible libstream.

ostrstream ostr;
ostr << "something" /*...*/ ;
ostr << " or another" /*...*/ << endl;
cout.write(ostr.str(), ostr.pcount());
ostr.rdbuf()->freeze(0);
Note that the above example also works with the new library, although ostrstream is deprecated there.

Or something similar can be done with the Rogue Wave Standard C++ Library 2.2.1 (libstd_v2) with standard ostringstream, as in the following example:

ostringstream ostr;
ostr << "something" /*...*/ ;
ostr << " or another" /*...*/ << endl;
cout.write(ostr.str().c_str(), ostr.str().length());
Note that cout.flush() may be needed if sharing the file with stdio.
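
The two techniques in this section can be combined: compose the text in a private buffer, then hold a lock only for the single write. The following is a minimal sketch, assuming the kernel (POSIX) threads interface and a statically initialized mutex. The names ScopedLock, io_mutex, and locked_print are hypothetical and are not part of any HP aC++ library.

```cpp
#include <pthread.h>
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: locks in the constructor, unlocks in the destructor,
// so the mutex is released even if the guarded code throws.
class ScopedLock {
public:
    explicit ScopedLock(pthread_mutex_t &m) : m_(m) {
        int rc = pthread_mutex_lock(&m_);
        assert(rc == 0);
        (void)rc;
    }
    ~ScopedLock() { pthread_mutex_unlock(&m_); }
private:
    ScopedLock(const ScopedLock &);            // not copyable
    ScopedLock &operator=(const ScopedLock &);
    pthread_mutex_t &m_;
};

// One mutex shared by every thread that writes to the same stream.
static pthread_mutex_t io_mutex = PTHREAD_MUTEX_INITIALIZER;

// Writes one complete line to `out` without interleaving from other callers.
void locked_print(std::ostream &out, const std::string &who, int value) {
    std::ostringstream line;                   // composed outside the lock
    line << who << " reports " << value << '\n';
    const std::string text = line.str();

    ScopedLock guard(io_mutex);                // held only for the write
    out.write(text.data(), static_cast<std::streamsize>(text.size()));
}
```

Each thread would call locked_print(cout, ...) instead of inserting into cout directly. Every writer of the stream has to use the same mutex for the protection to be effective.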

Required Command-line Options

To use the multi-thread safe capabilities of the Standard C++ Library, you need to specify the following options at both compile and link time. Note that the options differ depending on which set of libraries you are using.

Rogue Wave Standard C++ Library 2.2.1
For both 32-bit and 64-bit libraries:
  • -D_RWSTD_MULTI_THREAD
  • -D_REENTRANT
  • -lpthread (This option applies only to kernel threads.)
  • -mt

Rogue Wave Standard C++ Library 1.2.1 and Tools.h++ 7.0.6
For both 32-bit and 64-bit libraries:
  • -D__HPACC_THREAD_SAFE_RB_TREE (Code compiled with this option is binary incompatible with code that is not compiled with this option. Only HP aC++ version A.01.21 and subsequent versions incorporate this option.)
  • -DRWSTD_MULTI_THREAD
  • -DRW_MULTI_THREAD (needed only for the Tools.h++ Library)
  • -D_REENTRANT
  • -lcma (This option applies only to user threads.)
  • -lpthread (This option applies only to kernel threads.)
  • -D_THREAD_SAFE (Unlike the other options in this table, this option is not required. You can use it with the cout, cin, cerr, and clog objects, if you are not using locks.)
  • -mt

WARNING: If you do not specify these options as described in the above table, a run-time error will be generated or multi-thread behavior will be incorrect.


Limitations

In most cases, thread safety does not imply that the same object can be shared between threads. In particular, when objects have user visible state, it would not make sense to share them between threads. Consider the following:
void f(ostream &out, int x, int y) {
       out << setw(3) << x << setw(10) << y;
}
This function would not be thread safe if called from multiple threads with the same object, since the "width" in the shared object could be changed at any time. Therefore, such objects are not protected from interactions between multiple threads, and the result of sharing such an object between threads is undefined.

If the same object is shared between threads, a runtime crash, abort, or intermingled output may occur. With the Rogue Wave Standard C++ Library 2.2.1, output may be intermingled but no aborts will occur.
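
One way to keep formatting state such as the width out of a shared stream is to format into a stream that is local to the call, and to pass only the finished text to the shared object. The following is a minimal sketch of that idea. The names format_pair and print_pair are hypothetical. It removes the shared "width" state only, so concurrent writes to the same stream still need a lock if the output must not be intermingled.

```cpp
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>

// Formats x and y using a stream that only this call can see, so no other
// thread can change its width between the two insertions.
std::string format_pair(int x, int y) {
    std::ostringstream local;
    local << std::setw(3) << x << std::setw(10) << y;
    return local.str();
}

// Hands the already formatted text to the shared stream in one write.
// The shared stream's own formatting state is never modified.
void print_pair(std::ostream &out, int x, int y) {
    const std::string text = format_pair(x, y);
    out.write(text.data(), static_cast<std::streamsize>(text.size()));
}
```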

Using -D_THREAD_SAFE with the cfront Compatible libstream

There is an exception to the above rule for the cfront compatible libstream. For the frequently used objects cout, cin, cerr, and clog, you can specify the -D_THREAD_SAFE compile time flag for any file that includes <iostream.h>. In this case, a new instance of the object is transparently created for each thread that uses it. All instances share the same file descriptor. The f() function in the above example will now work, because it receives one new "out" object per thread. However, the results of two simultaneous executions of f() will be mixed in any order in the output.

Using -D_THREAD_SAFE with the global scope operator is not supported for cout, cin, cerr, and clog. For example, the following code would generate an error:

::cout << endl;

Note, if you use locks, you need not use the -D_THREAD_SAFE compile time flag since you are now responsible for ensuring thread safety.

Differences between Standard iostreams and cfront Compatible libstream

The cfront compatible libstream supports locking for each insertion. Rogue Wave Standard C++ Library 1.2.1 and Tools.h++ 7.0.6 do not support locking but do provide a thread private buffer.

Visible differences would be as follows. With the cfront compatible libstream, which locks each insertion, output is intermingled at the level of each component being inserted. With standard iostreams, which use a thread private buffer, complete buffers are intermingled (depending on when endl or flush is called).


Using -D__HPACC_THREAD_SAFE_RB_TREE

The Rogue Wave Standard C++ Library 1.2.1 (libstd) and Tools.h++ 7.0.6 (librwtool) are not thread safe if the underlying implementation class rb_tree is involved. In other words, if the tree header file (which includes tree.cc) under /opt/aCC/include/ is used, these libraries are not thread safe. Most likely, it is referenced indirectly by including the Standard C++ Library container class headers map or set, or by including a Rogue Wave Tools.h++ header such as tvset.h, tpmset.h, tvmset.h, tpmap.h, tpmmap.h, tvmap.h, or tvmmap.h.

Since changing the rb_tree implementation to make it thread safe would break binary compatibility, the preprocessing macro __HPACC_THREAD_SAFE_RB_TREE must be defined to select the thread safe implementation. The macro is automatically defined in the IPF environment. A new object file compiled with the macro defined should not be linked with older ones that were compiled without the macro defined. Library providers whose library is built with the macro defined may need to notify their users to also compile their source with the macro defined when the tree header is included.


Exception Handling

It is illegal to throw out of a thread.

The following example illustrates that you cannot catch an object which has been thrown in a different thread. To do so will result in a runtime abort since HP aC++ finds no available catch handler and terminate is called.

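#include <pthread.h>
void foo() {
   int i = 10;
   throw i;              // thrown in the new thread, which has no handler
}
int main() {
   pthread_t tid;
   int ret;
   try {
      ret = pthread_create(&tid, 0, (void*(*)(void*))foo, 0);
   }
   catch(int n) {}       // never reached: this handler belongs to the main thread
}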
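
A common way to stay within this rule is to catch every exception inside the thread's start routine and to report the failure to the creating thread by other means. The following is a minimal sketch, assuming the kernel (POSIX) threads interface. The names thread_main and run_worker, and the error codes -1 and -2, are hypothetical.

```cpp
#include <pthread.h>
#include <cstddef>

void foo() {
    throw 10;
}

// Start routine with the signature pthread_create expects. No exception is
// allowed to leave this function; the outcome is stored through `arg`.
extern "C" void *thread_main(void *arg) {
    int *result = static_cast<int *>(arg);
    try {
        foo();
        *result = 0;       // completed without throwing
    } catch (int n) {
        *result = n;       // record the thrown value
    } catch (...) {
        *result = -1;      // any other exception type
    }
    return NULL;
}

// Runs foo() in a new thread and returns what the thread recorded,
// or -2 if the thread could not be created.
int run_worker() {
    int result = 0;
    pthread_t tid;
    if (pthread_create(&tid, NULL, thread_main, &result) != 0) {
        return -2;
    }
    pthread_join(tid, NULL);   // result is safe to read after the join
    return result;
}
```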

Parallel Programming Using OpenMP

OpenMP is an industry-standard parallel programming model that implements a fork-join model of parallel execution. The HP C++ OpenMP pragmas are based on the OpenMP Standard for C/C++, version 2.0.

To view the details about the standard and details about usage, syntax and values, please go to http://www.openmp.org/specs. You can download an Adobe Acrobat (PDF) version of the C/C++ Version 2.0 OpenMP standard from this website.

+O[no]openmp Option

The +Oopenmp option is accepted at all optimization levels. It enables the recognition of OpenMP pragmas. With the +Onoopenmp option, all OpenMP directives are silently ignored.

For more information about OpenMP pragmas, see the Pragmas section.

OpenMP Header File

Every C++ program that contains OpenMP pragmas must be compiled for the current version of HP-UX and must include the header file <omp.h>. Otherwise, the OpenMP pragmas are ignored. The default path for <omp.h> is /usr/include.

OpenMP Library

The OpenMP APIs are defined in the library libomp.

_OPENMP macro

The _OPENMP macro name is defined by OpenMP-compliant implementations as the decimal constant 200203. This macro must not be the subject of a #define or #undef preprocessing directive.

Conditional Compilation

Example:

#ifdef _OPENMP
iam = omp_get_thread_num() + index;
#endif
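
The same guard can be applied to the header as well as to the call, so that one source file builds with or without OpenMP support. The following is a minimal sketch. The function name offset_thread_number is hypothetical, and the serial fallback of treating the only thread as thread 0 is an assumption of this sketch.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns the calling thread's number plus `index`. In a build without
// OpenMP the only thread is treated as thread 0.
int offset_thread_number(int index) {
#ifdef _OPENMP
    return omp_get_thread_num() + index;
#else
    return index;
#endif
}
```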

OpenMP Implementation

This section summarizes the behavior of OpenMP directives that the OpenMP v2.0 API describes as "implementation dependent". Each behavior is cross-referenced back to its description in the main OpenMP v2.0 specification. HP, in conformance with the OpenMP v2.0 API, defines and documents the following behavior.

Environment Variables in OpenMP

The OpenMP environment variables recognized by the HP aC++ compiler control the execution of parallel code. Note that the environment variable names are case sensitive and must be in uppercase.

The following environment variables are available in the HP aC++ compiler:

OMP_SCHEDULE

Syntax:

export OMP_SCHEDULE="kind[,chunk_size]"
setenv OMP_SCHEDULE "kind[,chunk_size]"
where kind is one of static, dynamic, or guided.

Description:

This environment variable applies to for and parallel for directives that have the schedule type as runtime. The schedule type and chunk size for all such loops can be set at run-time by setting this environment variable to any of the recognized schedule types and to an optional chunk_size.

The default value of the environment variable is a static schedule with a chunk_size of 1. If the optional chunk_size is set, the value must be positive. If chunk_size is not set, a value of 1 is assumed, except for a static schedule. For a static schedule, the default chunk_size is the loop iteration space divided by the number of threads applied to the loop.

NOTE: OMP_SCHEDULE is ignored for for and parallel for directives that have a schedule type other than runtime.
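
For illustration, the following is a minimal sketch of a loop whose schedule is left to OMP_SCHEDULE by using the runtime schedule type. The function name sum_squares is hypothetical. When OpenMP pragmas are not enabled, the pragma is ignored and the loop runs serially with the same result.

```cpp
// Sums i*i for i in [0, n). Because the schedule type is runtime, the
// schedule kind and chunk size are taken from OMP_SCHEDULE at run time.
long sum_squares(int n) {
    long total = 0;
#pragma omp parallel for schedule(runtime) reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += static_cast<long>(i) * i;
    }
    return total;
}
```

Setting, for example, OMP_SCHEDULE to "dynamic,4" before starting the program changes how iterations are handed out to threads without recompiling.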

OMP_NUM_THREADS

Syntax:

export OMP_NUM_THREADS=value
setenv OMP_NUM_THREADS value

Description:

The OMP_NUM_THREADS environment variable sets the default number of threads to use during execution. The value of OMP_NUM_THREADS must be a positive integer. Its effect depends on whether dynamic adjustment of the number of threads is enabled, and its interaction with the omp_set_num_threads library routine and any num_threads clause on a parallel directive.

The default value is the number of physical processors on the system.

OMP_DYNAMIC

Syntax:

export OMP_DYNAMIC=value
setenv OMP_DYNAMIC value

Description:

The OMP_DYNAMIC environment variable enables or disables dynamic adjustment of the number of threads available for execution of parallel regions. The value must be either TRUE or FALSE. The default value is FALSE.

If the value is set to FALSE, dynamic adjustment is disabled. If the value is set to TRUE, the number of threads that are used for executing parallel regions may be adjusted by the runtime environment to best utilize system resources.

OMP_NESTED

Syntax:

export OMP_NESTED=value
setenv OMP_NESTED value

Description:

The OMP_NESTED environment variable enables or disables nested parallelism. Its value must be TRUE or FALSE.

If the value is set to TRUE, nested parallelism is enabled and if the value is set to FALSE, the nested parallelism is disabled. The default value is FALSE.

Runtime Library Functions in OpenMP

The OpenMP library functions are external functions. The header <omp.h> declares three types of functions, and the descriptions of the library functions are divided into the corresponding topics:
  • Execution Environment Functions
  • Lock Functions
  • Timing Routines

Execution Environment Functions

The execution environment functions affect and monitor threads, processors, and the parallel environment:

omp_set_num_threads

Syntax:

#include <omp.h>
void omp_set_num_threads(int num_threads);

Description:

The omp_set_num_threads function sets the number of threads to use for subsequent parallel regions. The value of the parameter num_threads must be positive. Its effect depends upon whether dynamic adjustment of the number of threads is enabled. If dynamic adjustment is disabled, the value is used as the number of threads for all subsequent parallel regions prior to the next call to this function; otherwise, the value is the maximum number of threads that will be used. This function has effect only when called from serial portions of the program. If it is called from a portion of the program where the omp_in_parallel function returns non-zero, the behavior of this function is undefined.

For more information on this subject, see the omp_set_dynamic and omp_get_dynamic functions. This call has precedence over the OMP_NUM_THREADS environment variable.
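
The following is a minimal sketch of requesting a team size from a serial region and then checking what was actually granted. The function name team_size_with is hypothetical. The sketch disables dynamic adjustment first, but the implementation may still provide fewer threads than requested, so the result is best treated as an upper-bounded observation.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Requests `requested` threads and returns the size of the team that the
// next parallel region actually received.
int team_size_with(int requested) {
    int observed = 1;                  // a serial build has a team of one
#ifdef _OPENMP
    omp_set_dynamic(0);                // disable dynamic adjustment
    omp_set_num_threads(requested);    // called from a serial portion
#pragma omp parallel
    {
#pragma omp master
        observed = omp_get_num_threads();   // written by thread 0 only
    }
#else
    (void)requested;
#endif
    return observed;
}
```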

omp_get_num_threads

Syntax:

#include <omp.h>
int omp_get_num_threads(void);

Description:

The omp_get_num_threads function returns the number of threads currently in the team executing the parallel region from which it is called. The omp_set_num_threads function and the OMP_NUM_THREADS environment variable control the number of threads in a team. If the number of threads has not been explicitly set by the user, the default is implementation dependent. This function binds to the closest enclosing parallel directive. If called from a serial portion of a program, or from a nested parallel region that is serialized, this function returns 1.

omp_get_max_threads

Syntax:

#include <omp.h>
int omp_get_max_threads(void);

Description:

The omp_get_max_threads function returns an integer that is guaranteed to be at least as large as the number of threads that would be used to form a team if a parallel region without a num_threads clause were to be encountered at that point in the code.

omp_get_thread_num

Syntax:

#include <omp.h>
int omp_get_thread_num(void);

Description:

The omp_get_thread_num function returns the thread number, within its team, of the thread executing the function. The thread number lies between 0 and omp_get_num_threads()-1, inclusive. The master thread of the team is thread 0. If called from a serial region, omp_get_thread_num returns 0. If called from within a nested parallel region that is serialized, this function returns 0.
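
Because thread numbers are dense and start at 0, they are often used to index per-thread storage. The following is a minimal sketch that counts how many loop iterations each thread executed, sizing the storage with omp_get_max_threads. The function name iterations_per_thread and the serial stand-in functions in the #else branch are assumptions of this sketch.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial stand-ins so the sketch also builds without OpenMP support.
inline int omp_get_max_threads() { return 1; }
inline int omp_get_thread_num() { return 0; }
#endif

// Returns one counter per possible thread; counter t holds the number of
// iterations that thread t executed.
std::vector<int> iterations_per_thread(int n) {
    // At least as large as the team formed by the parallel region below.
    std::vector<int> counts(omp_get_max_threads(), 0);
#pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        // Each thread touches only its own element, so no lock is needed.
        ++counts[omp_get_thread_num()];
    }
    return counts;
}
```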

omp_get_num_procs

Syntax:

#include <omp.h>
int omp_get_num_procs(void);

Description:

The omp_get_num_procs function returns the number of processors that are available to the program at the time the function is called.

omp_in_parallel

Syntax:

#include <omp.h>
int omp_in_parallel(void);

Description:

The omp_in_parallel function returns non-zero if it is called within the dynamic extent of a parallel region executing in parallel; otherwise, it returns 0. This function returns non-zero from within a region executing in parallel, including nested regions that are serialized.

omp_set_dynamic

Syntax:

#include <omp.h>
void omp_set_dynamic(int dynamic_threads);

Description:

The omp_set_dynamic function enables or disables dynamic adjustment of the number of threads available for execution of parallel regions. This function has effect only when called from serial portions of the program. If it is called from a portion of the program where the omp_in_parallel function returns non-zero, the behavior of the function is undefined. If dynamic_threads evaluates to non-zero, the number of threads that are used for executing subsequent parallel regions may be adjusted automatically by the run-time environment to best utilize system resources. As a consequence, the number of threads specified by the user is the maximum thread count. The number of threads always remains fixed over the duration of each parallel region and is reported by the omp_get_num_threads function. If dynamic_threads evaluates to 0, dynamic adjustment is disabled. A call to omp_set_dynamic has precedence over the OMP_DYNAMIC environment variable.

By default, dynamic adjustment of threads is disabled (0).

omp_get_dynamic

Syntax:

#include <omp.h>
int omp_get_dynamic(void);

Description:

The omp_get_dynamic function returns non-zero if dynamic thread adjustment is enabled and returns 0 otherwise.

omp_set_nested

Syntax:

#include <omp.h>
void omp_set_nested(int nested);

Description:

The omp_set_nested function enables or disables nested parallelism. If nested evaluates to 0, which is the default, nested parallelism is disabled, and nested parallel regions are serialized and executed by the current thread. If nested evaluates to non-zero, nested parallelism is enabled, and parallel regions that are nested may deploy additional threads to form the team. This call has precedence over the OMP_NESTED environment variable.

omp_get_nested

Syntax:

#include <omp.h>
int omp_get_nested(void);

Description:

The omp_get_nested function returns non-zero if nested parallelism is enabled and 0 if it is disabled.

Lock Functions

The functions described in this section manipulate locks used for synchronization. For the simple lock functions (omp_init_lock, omp_destroy_lock, omp_set_lock, omp_unset_lock, and omp_test_lock), the lock variable must have type omp_lock_t. This variable must only be accessed through these functions. All simple lock functions require an argument that is a pointer to omp_lock_t.

For the nestable lock functions (omp_init_nest_lock, omp_destroy_nest_lock, omp_set_nest_lock, omp_unset_nest_lock, and omp_test_nest_lock), the lock variable must have type omp_nest_lock_t. This variable must only be accessed through these functions. All nestable lock functions require an argument that is a pointer to omp_nest_lock_t.

omp_init_lock & omp_init_nest_lock Functions

Format:

#include <omp.h>
void omp_init_lock(omp_lock_t *lock);
void omp_init_nest_lock(omp_nest_lock_t *lock);

Description:

These functions provide the only means of initializing a lock. Each function initializes the lock associated with the parameter lock for use in subsequent calls. The initial state is unlocked (that is, no thread owns the lock). For a nestable lock, the initial nesting count is zero.

omp_destroy_lock & omp_destroy_nest_lock Functions

Format:

#include <omp.h>
void omp_destroy_lock(omp_lock_t *lock);
void omp_destroy_nest_lock(omp_nest_lock_t *lock);

Description:

These functions ensure that the lock variable pointed to by lock is uninitialized. The argument to these functions must point to an initialized lock variable that is unlocked.

omp_set_lock & omp_set_nest_lock Functions

Format:

#include <omp.h>
void omp_set_lock(omp_lock_t *lock);
void omp_set_nest_lock(omp_nest_lock_t *lock);

Description:

Each of these functions blocks the thread executing the function until the specified lock is available and then sets the lock. A simple lock is available if it is unlocked. A nestable lock is available if it is unlocked or if it is already owned by the thread executing the function.
For a simple lock, the argument to the omp_set_lock function must point to an initialized lock variable. Ownership of the lock is granted to the thread executing the function.
For a nestable lock, the argument to the omp_set_nest_lock function must point to an initialized lock variable. The nesting count is incremented, and the thread is granted, or retains, ownership of the lock.

omp_unset_lock & omp_unset_nest_lock Functions

Format:

#include <omp.h>
void omp_unset_lock(omp_lock_t *lock);
void omp_unset_nest_lock(omp_nest_lock_t *lock);

Description:

These functions provide the means of releasing ownership of a lock. The argument to each of these functions must point to an initialized lock variable owned by the thread executing the function. The behavior is undefined if the thread does not own that lock.
For a simple lock, the omp_unset_lock function releases the thread executing the function from ownership of the lock.
For a nestable lock, the omp_unset_nest_lock function decrements the nesting count, and releases the thread executing the function from ownership of the lock if the resulting count is zero.
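
The simple lock functions described so far are typically used together in an initialize, set, unset, destroy sequence. The following is a minimal sketch that protects a shared counter. The function name count_with_lock and the empty serial stand-ins in the #else branch are assumptions of this sketch.

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
// Serial stand-ins so the sketch also builds without OpenMP support.
typedef int omp_lock_t;
inline void omp_init_lock(omp_lock_t *) {}
inline void omp_destroy_lock(omp_lock_t *) {}
inline void omp_set_lock(omp_lock_t *) {}
inline void omp_unset_lock(omp_lock_t *) {}
#endif

// Increments a shared counter n times, one increment per loop iteration.
int count_with_lock(int n) {
    int counter = 0;
    omp_lock_t lock;
    omp_init_lock(&lock);            // initial state is unlocked
#pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        omp_set_lock(&lock);         // blocks until the lock is available
        ++counter;                   // only the owning thread gets here
        omp_unset_lock(&lock);       // released by the owner
    }
    omp_destroy_lock(&lock);         // the lock is unlocked at this point
    return counter;
}
```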

omp_test_lock & omp_test_nest_lock Functions

Format:

#include <omp.h>
int omp_test_lock(omp_lock_t *lock);
int omp_test_nest_lock(omp_nest_lock_t *lock);

Description:

These functions attempt to set a lock but do not block execution of the thread. The argument must point to an initialized lock variable. These functions attempt to set a lock in the same manner as omp_set_lock and omp_set_nest_lock, except that they do not block execution of the thread.
For a simple lock, the omp_test_lock function returns non-zero if the lock is successfully set; otherwise, it returns zero.
For a nestable lock, the omp_test_nest_lock function returns the new nesting count if the lock is successfully set; otherwise, it returns zero.
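
The nesting count can be observed through the return value of omp_test_nest_lock. The following is a minimal sketch in which a single thread acquires a nestable lock twice and then releases it twice. The function name nesting_count_after_reacquire and the counting stand-ins in the #else branch are assumptions of this sketch.

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
// Serial stand-ins that only track the nesting count.
struct omp_nest_lock_t { int count; };
inline void omp_init_nest_lock(omp_nest_lock_t *l) { l->count = 0; }
inline void omp_destroy_nest_lock(omp_nest_lock_t *) {}
inline void omp_set_nest_lock(omp_nest_lock_t *l) { ++l->count; }
inline void omp_unset_nest_lock(omp_nest_lock_t *l) { --l->count; }
inline int omp_test_nest_lock(omp_nest_lock_t *l) { return ++l->count; }
#endif

// Returns the nesting count reported when the owning thread acquires the
// lock a second time.
int nesting_count_after_reacquire() {
    omp_nest_lock_t lock;
    omp_init_nest_lock(&lock);                    // nesting count is zero
    omp_set_nest_lock(&lock);                     // count becomes 1
    const int count = omp_test_nest_lock(&lock);  // same owner: count becomes 2
    omp_unset_nest_lock(&lock);                   // count back to 1
    omp_unset_nest_lock(&lock);                   // count 0, ownership released
    omp_destroy_nest_lock(&lock);
    return count;
}
```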

Timing Routines

The functions described in this section support a portable wall-clock timer:

omp_get_wtime Function

Format:

#include <omp.h>
double omp_get_wtime(void);

Description:

The omp_get_wtime function returns a double-precision floating point value equal to the elapsed wall clock time in seconds since some "time in the past". The actual "time in the past" is arbitrary, but it is guaranteed not to change during the execution of the application program.

The function may be used to measure elapsed times as shown in the following example:

double start;
double end;
start = omp_get_wtime();
/* ... work to be timed ... */
end = omp_get_wtime();
printf("Work took %f sec. time.\n", end-start);

The times returned are "per-thread times". They are not required to be globally consistent across all the threads participating in an application.

omp_get_wtick Function

Format:

#include <omp.h>
double omp_get_wtick(void);

Description:

The omp_get_wtick function returns a double-precision floating point value equal to the number of seconds between successive clock ticks.
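
The two timing routines can be used together to report an elapsed time along with the timer resolution. The following is a minimal sketch. The names Timing, time_call, and spin are hypothetical. The #else branch is only a rough stand-in based on std::clock, which measures processor time rather than wall-clock time.

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
#include <ctime>
// Rough serial stand-ins so the sketch also builds without OpenMP support.
inline double omp_get_wtime() {
    return static_cast<double>(std::clock()) / CLOCKS_PER_SEC;
}
inline double omp_get_wtick() { return 1.0 / CLOCKS_PER_SEC; }
#endif

struct Timing {
    double seconds;      // elapsed time of the call
    double resolution;   // seconds between successive clock ticks
};

// Times one call of `work` on the calling thread.
Timing time_call(void (*work)()) {
    const double start = omp_get_wtime();
    work();
    const double end = omp_get_wtime();
    Timing t;
    t.seconds = end - start;
    t.resolution = omp_get_wtick();
    return t;
}

// Placeholder workload for demonstration purposes.
void spin() {
    volatile long sink = 0;
    for (long i = 0; i < 100000; ++i) {
        sink = sink + i;
    }
}
```

An elapsed time smaller than the reported resolution is not meaningful and usually indicates that the measured work was too short.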