If you attempt to spawn thread-parallelism from within a thread-parallel
construct and the two constructs are in the same routine, the compiler
will ignore your directives on the inner thread-parallel construct.
Consequently, the inner parallel construct will simply run serially.
Calling a thread-parallel routine from another thread-parallel routine
is an error, but it is not caught at compile time.
Thread ID assignments
Chapter 3, "Compiler optimizations,"
discusses how programs are initiated as a collection of threads,
one per available processor, and how all but thread 0 are idle until
parallelism is encountered. We will now discuss the details of how
threads are spawned and assigned IDs.
When a process begins, the threads created to run it have unique
kernel thread IDs. Thread 0, which runs all the serial code
in the program, has kernel thread ID 0; the rest of the threads
have unique but unspecified kernel thread IDs at this point. The
num_threads() intrinsic will return
the number of threads created, regardless of how many are active
when it is called.
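The start-up state described above can be sketched as a small Python model. This is an illustration under stated assumptions, not the compiler runtime: the `Process` class and its `num_threads()` method are hypothetical stand-ins that only mirror the intrinsic's name, and random integers stand in for the "unique but unspecified" kernel thread IDs.

```python
import random


class Process:
    """Toy model of the threads created at program start (hypothetical, not the real runtime)."""

    def __init__(self, n_processors, seed=None):
        rng = random.Random(seed)
        # Thread 0 runs all the serial code and always has kernel thread ID 0.
        # The other IDs are unique but unspecified; random values stand in for them.
        others = rng.sample(range(1, 1_000_000), n_processors - 1)
        self.kernel_ids = [0] + others
        # Only thread 0 is active until parallelism is encountered.
        self.active = {0}

    def num_threads(self):
        # Mirrors num_threads(): counts threads created, not threads currently active.
        return len(self.kernel_ids)
```

For example, `Process(8, seed=42).num_threads()` returns 8 even though only one thread is active at that point.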
When thread 0 encounters parallelism, it spawns
some or all of the threads created at program start: these threads
go from idle to active and begin working on their share of the parallel
code. By default, all available threads are spawned, but you can change
this with various compiler directives.
If the parallel structure is thread-parallel, then num_threads()
threads will be spawned, subject to user-specified limits. At this
point, kernel thread 0 becomes spawn thread 0, and the spawned threads
are assigned spawn thread IDs in the range 0..num_threads()-1.
If you manually limit the number of spawned threads, these IDs range
from 0 to one less than your limit. If you attempt to spawn
thread-parallelism within an already thread-parallel structure, the
thread that attempts the spawn acquires spawn thread ID 0. If all
threads attempt to spawn thread-parallelism in this manner, they all
become spawn thread 0, each in a unique context.
If the parallel structure is node-parallel, then
num_nodes() threads will be spawned,
one per available hypernode, subject to user-specified limits. Again,
kernel thread 0 becomes spawn thread 0, and in this case, the spawn
thread IDs range from 0..num_nodes()-1,
subject to user limits as described above.
If thread-parallelism is then encountered within this node-parallelism,
num_node_threads() threads are spawned on each hypernode that
encounters the thread-parallelism. These threads receive spawn thread
IDs that are local to their hypernode, ranging from 0..num_node_threads()-1,
with spawn thread ID 0 belonging to the thread that executes the spawn.
When called from node-parallel code, num_node_threads() may return a
different value on each hypernode.
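Two-level ID assignment (node-parallel, then thread-parallel inside it) can be sketched as follows. The function `nested_spawn` and its parameters are hypothetical; the per-node thread counts are example values standing in for whatever num_node_threads() reports on each hypernode.

```python
def nested_spawn(node_threads, encountering=None):
    """Spawn IDs for thread-parallelism nested in node-parallelism (toy model).

    node_threads[i]: what num_node_threads() returns on the hypernode running
                     node-parallel spawn thread i (values may differ per node)
    encountering:    node-parallel spawn IDs that reach the thread-parallel
                     construct (default: all of them)
    """
    if encountering is None:
        encountering = range(len(node_threads))  # 0..num_nodes()-1
    # Inner IDs are local to each hypernode and restart at 0; ID 0 is the
    # node-parallel thread that executed the spawn.
    return {node_id: list(range(node_threads[node_id])) for node_id in encountering}


print(nested_spawn([4, 2, 3]))
# {0: [0, 1, 2, 3], 1: [0, 1], 2: [0, 1, 2]}
print(nested_spawn([4, 2, 3], encountering=[1]))
# {1: [0, 1]}
```

Because inner IDs repeat across hypernodes, in this model only the pair (node-level ID, thread-level ID) identifies a thread uniquely.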
Note that, with nested parallelism, a node-parallel thread
that encounters a thread-parallel construct becomes spawn thread
0 on that hypernode regardless of its previous spawn thread ID.
When this thread exits the thread-parallel construct, it reverts
to its previous spawn thread ID. The my_thread() intrinsic function
returns the caller's spawn thread ID, so its result depends on the
level of parallelism at which it is called.
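The save-and-restore behavior of a node-parallel thread's spawn ID behaves like a stack. The Python sketch below models it under that assumption; `SpawnView` and its `thread_parallel()` context manager are hypothetical, and the method named `my_thread()` only mimics what the intrinsic would return for the spawning thread. It does not model the other threads of the inner team.

```python
from contextlib import contextmanager


class SpawnView:
    """Toy model of one node-parallel thread's spawn-ID history (hypothetical)."""

    def __init__(self, node_spawn_id):
        self._ids = [node_spawn_id]

    def my_thread(self):
        # Mimics my_thread(): the ID at the current level of parallelism.
        return self._ids[-1]

    @contextmanager
    def thread_parallel(self):
        # Entering a thread-parallel construct: this thread becomes spawn thread 0.
        self._ids.append(0)
        try:
            yield self
        finally:
            # Exiting: the previous spawn thread ID is restored.
            self._ids.pop()


t = SpawnView(3)
print(t.my_thread())       # 3
with t.thread_parallel():
    print(t.my_thread())   # 0
print(t.my_thread())       # 3
```

Using a context manager ties the restore step to leaving the construct, so the previous ID comes back even if the body raises.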