Parallel Programming Guide for HP-UX Systems: K-Class and V-Class Servers

HP Part Number: B3909-90003

Edition: Second Edition

Published: March 2000

Revision History
Revision Second: B3909-90003
March 2000. Added OpenMP appendix.
Revision First: B6056-96006
June 1998. Initial release.

Table of Contents

Preface
Scope
Notational conventions
Command syntax
Associated documents
1 Introduction
HP SMP architectures
Bus-based systems
Hyperplane Interconnect systems
Parallel programming model
The shared-memory paradigm
The message-passing paradigm
Overview of HP optimizations
Basic scalar optimizations
Advanced scalar optimizations
Parallelization
2 Architecture overview
System architectures
Data caches
Cache thrashing
Memory Systems
Physical memory
Virtual memory
Interleaving
Variable-sized pages on HP-UX
3 Optimization levels
HP optimization levels and features
Cumulative Options
Using the Optimizer
General guidelines
4 Standard optimization features
Machine instruction level optimizations (+O0)
Constant folding
Partial evaluation of test conditions
Simple register assignment
Data alignment on natural boundaries
Block level optimizations (+O1)
Branch optimization
Dead code elimination
Faster register allocation
Instruction scheduling
Peephole optimizations
Routine level optimizations (+O2)
Advanced constant folding and propagation
Common subexpression elimination
Global register allocation (GRA)
Loop-invariant code motion
Loop unrolling
Register reassociation
Software pipelining
Strength reduction of induction variables and constants
Store and copy optimization
Unused definition elimination
5 Loop and cross-module optimization features
Strip mining
Inlining within a single source file
Cloning within a single source file
Data localization
Conditions that inhibit data localization
Loop blocking
Data reuse
Loop distribution
Loop fusion
Loop interchange
Loop unroll and jam
Preventing loop reordering
Test promotion
Cross-module cloning
Global and static variable optimizations
Inlining across multiple source files
6 Parallel optimization features
Levels of parallelism
Loop-level parallelism
Threads
Loop transformations
Idle thread states
Determining idle thread states
Parallel optimizations
Dynamic selection
Inhibiting parallelization
Loop-carried dependences (LCDs)
Reductions
Preventing parallelization
Parallelism in the aC++ compiler
Cloning across multiple source files
7 Controlling optimization
Command-line optimization options
Invoking command-line options
+O[no]aggressive
+O[no]all
+O[no]autopar
+O[no]conservative
+O[no]dataprefetch
+O[no]dynsel
+O[no]entrysched
+O[no]fail_safe
+O[no]fastaccess
+O[no]fltacc
+O[no]global_ptrs_unique[=namelist]
+O[no]info
+O[no]initcheck
+O[no]inline[=namelist]
+Oinline_budget=n
+O[no]libcalls
+O[no]limit
+O[no]loop_block
+O[no]loop_transform
+O[no]loop_unroll[=unroll factor]
+O[no]loop_unroll_jam
+O[no]moveflops
+O[no]multiprocessor
+O[no]parallel
+O[no]parmsoverlap
+O[no]pipeline
+O[no]procelim
+O[no]ptrs_ansi
+O[no]ptrs_strongly_typed
+O[no]ptrs_to_globals[=namelist]
+O[no]regreassoc
+O[no]report[=report_type]
+O[no]sharedgra
+O[no]signedpointers
+O[no]size
+O[no]static_prediction
+O[no]vectorize
+O[no]volatile
+O[no]whole_program_mode
+tm target
C aliasing options
Optimization directives and pragmas
Rules for usage
block_loop[(block_factor=n)]
dynsel[(trip_count=n)]
no_block_loop
no_distribute
no_dynsel
no_loop_dependence(namelist)
no_loop_transform
no_parallel
no_side_effects(funclist)
unroll_and_jam[(unroll_factor=n)]
8 Optimization Report
Optimization Report contents
Loop Report
Supplemental tables
9 Parallel programming techniques
Parallelizing directives and pragmas
Parallelizing loops
prefer_parallel
loop_parallel
prefer_parallel, loop_parallel attributes
Combining the attributes
Comparing prefer_parallel, loop_parallel
Stride-based parallelism
critical_section, end_critical_section
Disabling automatic loop thread-parallelization
Parallelizing tasks
Parallelizing regions
Reentrant compilation
Setting thread default stack size
Modifying thread stack size
Collecting parallel information
Number of processors
Number of threads
Thread ID
Stack memory type
10 Data privatization
Directives and pragmas for data privatization
Privatizing loop variables
loop_private
save_last[(list)]
Privatizing task variables
task_private
Privatizing region variables
parallel_private
11 Memory classes
Porting multinode applications to single-node servers
Private versus shared memory
thread_private
node_private
Memory class assignments
C and C++ data objects
Static assignments
12 Parallel synchronization
Thread-parallelism
Thread ID assignments
Synchronization tools
Using gates and barriers
Synchronization functions
sync_routine
loop_parallel(ordered)
Critical sections
Ordered sections
Synchronizing code
Using critical sections
Using ordered sections
Manual synchronization
13 Troubleshooting
Aliasing
ANSI algorithm
Type-safe algorithm
Specifying aliasing modes
Iteration and stop values
Global variables
False cache line sharing
Aligning data to avoid false sharing
Distributing iterations on cache line boundaries
Thread-specific array elements
Scalars sharing a cache line
Working with unaligned arrays
Working with dependences
Floating-point imprecision
Enabling sudden underflow
Invalid subscripts
Misused directives and pragmas
Loop-carried dependences
Reductions
Nondeterminism of parallel execution
Triangular loops
Parallelizing the outer loop
Parallelizing the inner loop
Examining the code
Compiler assumptions
Incrementing by zero
Trip counts that may overflow
A Porting CPSlib functions to pthreads
Introduction
Accessing pthreads
Mapping CPSlib functions to pthreads
Environment variables
Using pthreads
Symmetric parallelism
Asymmetric parallelism
Synchronization using high-level functions
Synchronization using low-level functions
B OpenMP Parallel Programming Model
What is OpenMP?
HP's Implementation of OpenMP
OpenMP Command-line Options
OpenMP Directives
OpenMP Data Scope Clauses
Other Supported OpenMP Clauses
From HP Programming Model to OpenMP
Syntax
HP Programming Model Directives
More Information on OpenMP
Glossary

List of Tables

3-1 Locations of HP compilers
3-2 Optimization levels and features
5-1 Loop transformations affecting data localization
5-2 Form of no_loop_dependence directive and pragma
5-3 Computation sequence of A(I,J): original loop
5-4 Computation sequence of A(I,J): interchanged loop
5-5 Forms of block_loop, no_block_loop directives and pragmas
5-6 Form of no_distribute directive and pragma
5-7 Forms of unroll_and_jam, no_unroll_and_jam directives and pragmas
5-8 Form of no_loop_transform directive and pragma
6-1 Form of MP_IDLE_THREADS_WAIT environment variable
6-2 Form of dynsel directive and pragma
6-3 Form of reduction directive and pragma
6-4 Form of no_parallel directive and pragma
7-1 Command-line optimization options
7-2 +O[no]fltacc and floating-point optimizations
7-3 Optimization Report contents
7-4 +tm target and +DA/+DS
7-5 Directive-based optimization options
7-6 Form of optimization directives and pragmas
8-1 Optimization Report contents
8-2 Loop Report column definitions
8-3 Reordering transformation values in the Loop Report
8-4 Optimizing/special transformations values in the Loop Report
8-5 Analysis Table column definitions
8-6 Privatization Table column definitions
8-7 Variable Name Footnote Table column definitions
9-1 Parallel directives and pragmas
9-2 Forms of prefer_parallel and loop_parallel directives and pragmas
9-3 Attributes for loop_parallel, prefer_parallel
9-4 Comparison of loop_parallel and prefer_parallel
9-5 Iteration distribution using chunk_size = 1
9-6 Iteration distribution using chunk_size = 5
9-7 Forms of critical_section/end_critical_section directives and pragmas
9-8 Forms of task parallelization directives and pragmas
9-9 Attributes for task parallelization
9-10 Forms of region parallelization directives and pragmas
9-11 Attributes for region parallelization
9-12 Forms of CPS_STACK_SIZE environment variable
9-13 Number of processors functions
9-14 Number of threads functions
9-15 Thread ID functions
9-16 Stack memory type functions
10-1 Data Privatization Directives and Pragmas
10-2 Form of loop_private directive and pragma
10-3 Form of save_last directive and pragma
10-4 Form of task_private directive and pragma
10-5 Form of parallel_private directive and pragma
11-1 Form of memory class directives and variable declarations
12-1 Forms of gate and barriers variable declarations
12-2 Forms of allocation functions
12-3 Forms of deallocation functions
12-4 Forms of locking functions
12-5 Form of unlocking functions
12-6 Form of wait functions
12-7 Form of sync_routine directive and pragma
12-8 Forms of critical_section, end_critical_section directives and pragmas
12-9 Forms of ordered_section, end_ordered_section directives and pragmas
13-1 Initial mapping of array to cache lines
13-2 Default distribution of the I loop
A-1 CPSlib library functions to pthreads mapping
A-2 CPSlib environment variables
B-1 OpenMP Directives and Required Opt Levels
B-2 OpenMP and HPPM Directives/Clauses