Parallel Programming Guide for HP-UX Systems: K-Class and V-Class Servers

Glossary


A

absolute address 

An address that does not undergo virtual-to-physical address translation when used to reference memory or the I/O register area.


accumulator 

A variable used to accumulate a value. Accumulators are typically assigned a function of themselves, which can create dependences when done in loops.


actual argument 

In Fortran, a value that is passed by a call to a procedure (function or subroutine). The actual argument appears in the source of the calling procedure; the argument that appears in the source of the called procedure is a dummy argument. C and C++ conventions refer to actual arguments as actual parameters.


actual parameter 

In C and C++, a value that is passed by a call to a procedure (function). The actual parameter appears in the source of the calling procedure; the parameter that appears in the source of the called procedure is a formal parameter. Fortran conventions refer to actual parameters as actual arguments.


address 

A number used by the operating system to identify a storage location.


address space 

Memory space, either physical or virtual, available to a process.


alias 

An alternative name for some object, especially an alternative variable name that refers to a memory location. Aliases can cause data dependences, which prevent the compiler from parallelizing parts of a program.


alignment 

A condition in which the address, in memory, of a given data item is integrally divisible by a particular integer value, often the size of the data item itself. Alignment simplifies the addressing of such data items.
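
The divisibility test in this definition can be sketched in a few lines. The example below uses Python purely for illustration (the guide itself covers Fortran, C, and C++); the helper names `is_aligned` and `align_up` are hypothetical, and `align_up` assumes the alignment is a power of two.

```python
def is_aligned(address: int, alignment: int) -> bool:
    """Return True if address is integrally divisible by alignment."""
    return address % alignment == 0


def align_up(address: int, alignment: int) -> int:
    """Round address up to the next multiple of alignment.

    Assumes alignment is a power of two, so the low bits can be masked off.
    """
    return (address + alignment - 1) & ~(alignment - 1)
```

For an 8-byte data item, address 4096 is aligned while 4100 is not; rounding 4100 up gives 4104.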


allocatable array 

In Fortran 90, a named array whose rank is specified at compile time, but whose bounds are determined at run time.


allocate 

An action performed by a program at runtime in which memory is reserved to hold data of a given type. In Fortran 90, this is done through the creation of allocatable arrays. In C, it is done through the dynamic creation of memory blocks using malloc. In C++, it is done through the dynamic creation of memory blocks using malloc or new.


ALU 

Arithmetic logic unit. A basic element of the central processing unit (CPU) where arithmetic and logical operations are performed.


Amdahl's law 

A statement that the ultimate performance of a computer system is limited by the slowest component. In the context of HP servers this is interpreted to mean that the serial component of the application code will restrict the maximum speed-up that is achievable.
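
The limit described here is usually written as speed-up = 1 / (s + (1 - s) / n), where s is the serial fraction of the run time and n is the number of processors. The following is a minimal sketch of that formula in Python, used only for illustration; the function name `amdahl_speedup` is hypothetical.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Upper bound on speed-up when serial_fraction of the work cannot be parallelized."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)
```

With 10 percent serial code, 16 processors give a speed-up of at most 6.4, and no processor count can push the speed-up past 10.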


American National Standards Institute (ANSI) 

A repository and coordinating agency for standards implemented in the U.S. Its activities include the production of Federal Information Processing (FIPS) standards for the Department of Defense (DoD).


ANSI 

See American National Standards Institute.


apparent recurrence 

A condition or construct that fails to provide the compiler with sufficient information to determine whether or not a recurrence exists. Also called a potential recurrence.


argument 

In Fortran, either a variable declared in the argument list of a procedure (function or subroutine) that receives a value when the procedure is called (dummy argument) or the variable or constant that is passed by a call to a procedure (actual argument). C and C++ conventions refer to arguments as parameters.


arithmetic logic unit (ALU) 

A basic element of the central processing unit (CPU) where arithmetic and logical operations are performed.


array 

An ordered structure of operands of the same data type. The structure of an array is defined by its rank, shape, and data type.


array section 

A Fortran 90 construct that defines a subset of an array by providing starting and ending elements and strides for each dimension. For an array A(4,4), A(2:4:2,2:4:2) is an array section containing only the evenly indexed elements A(2,2), A(4,2), A(2,4), and A(4,4).
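
As a rough illustration, the subscript triplets in A(2:4:2,2:4:2) can be expanded by hand. The sketch below is Python rather than Fortran, and the helper name `subscript_triplet` is hypothetical; it uses 1-based subscripts with an inclusive upper bound, and lists the selected elements with the first subscript varying fastest.

```python
def subscript_triplet(start: int, stop: int, stride: int) -> list:
    """Expand a Fortran-style start:stop:stride triplet (1-based, stop inclusive)."""
    return list(range(start, stop + 1, stride))


rows = subscript_triplet(2, 4, 2)
cols = subscript_triplet(2, 4, 2)

# First subscript varies fastest, matching Fortran array element order.
elements = [(i, j) for j in cols for i in rows]
```

Here `elements` holds the subscript pairs (2,2), (4,2), (2,4), and (4,4), the same four elements named in the definition.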


array-valued argument 

In Fortran 90, an array section that is an actual argument to a subprogram.


ASCII 

American Standard Code for Information Interchange. This encodes printable and non-printable characters into a range of integers.


assembler 

A program that converts assembly language programs into executable machine code.


assembly language 

A programming language whose executable statements can each be translated directly into a corresponding machine instruction of a particular computer system.


automatic array 

In Fortran, an array of explicit rank that is not a dummy argument and is declared in a subprogram.


B

bandwidth 

A measure of the rate at which data can be moved through a device or circuit. Bandwidth is usually measured in millions of bytes per second (Mbytes/sec) or millions of bits per second (Mbits/sec).


bank conflict 

An attempt to access a particular memory bank before a previous access to the bank is complete, or when the bank is not yet finished recycling (i.e., refreshing).


barrier 

A structure used by the compiler in barrier synchronization. Also sometimes used to refer to the construct used to implement barrier synchronization. See also barrier synchronization.


barrier synchronization 

A control mechanism used in parallel programming that ensures all threads have completed an operation before continuing execution past the barrier in sequential mode. On HP servers, barrier synchronization can be automated by certain CPSlib routines and compiler directives. See also barrier.
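
The idea can be sketched with a general-purpose threading library. The example below uses Python's `threading.Barrier` as a stand-in; it is not CPSlib or an HP compiler directive, and the function name `run_two_phase` is hypothetical. Each thread writes its own slot, waits at the barrier, and only then reads the slots written by the others.

```python
import threading


def run_two_phase(num_threads: int) -> list:
    """Run a two-phase computation separated by a barrier."""
    barrier = threading.Barrier(num_threads)
    partial = [0] * num_threads
    totals = [0] * num_threads

    def worker(tid: int) -> None:
        partial[tid] = (tid + 1) * 10   # phase 1: each thread fills its own slot
        barrier.wait()                  # no thread continues until all have arrived
        totals[tid] = sum(partial)      # phase 2: every slot is now safe to read

    threads = [threading.Thread(target=worker, args=(tid,)) for tid in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals
```

Without the barrier, a fast thread could sum `partial` before slower threads had written their slots.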


basic block 

A linear sequence of machine instructions with a single entry and a single exit.


bit 

A binary digit.


blocking factor 

Integer representing the stride of the outer strip of a pair of loops created by blocking.


branch 

A class of instructions which change the value of the program counter to a value other than that of the next sequential instruction.


byte  

A group of contiguous bits starting on an addressable boundary. A byte is 8 bits in length.


C

cache 

A small, high-speed buffer memory used in modern computer systems to hold temporarily those portions of the contents of the memory that are, or are believed to be, currently in use. Cache memory is physically separate from main memory and can be accessed with substantially less latency. HP servers employ separate data and instruction cache memories.


cache hit 

A cache hit occurs if data to be loaded is residing in the cache.


cache line 

A chunk of contiguous data that is copied into a cache in one operation. On V2250 servers, processor cache lines are 32 bytes.


cache memory 

A small, high-speed buffer memory used in modern computer systems to hold temporarily those portions of the contents of the memory that are, or are believed to be, currently in use. Cache memory is physically separate from main memory and can be accessed with substantially less latency. V2250 servers employ separate data and instruction caches.


cache miss 

A cache miss occurs if data to be loaded is not residing in the cache.


cache purge 

The act of invalidating or removing entries in a cache memory.


cache thrashing 

Cache thrashing occurs when two or more data items that are frequently needed by the program map to the same cache address. In this case, each time one of the items is encached it overwrites another needed item, causing constant cache misses and impairing data reuse. Cache thrashing also occurs when two or more threads are simultaneously writing to the same cache line.


cache, direct mapped 

A form of cache memory that addresses encached data by a function of the data's virtual address. On V2250 servers, the processor cache address is identical to the least-significant 21 bits of the data's virtual address. This means cache thrashing can occur when the virtual addresses of two data items are an exact multiple of 2 Mbyte (21 bits) apart.
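
The mapping described above can be checked with simple bit arithmetic. This is a minimal Python sketch for illustration only; the names `cache_address` and `may_thrash` are hypothetical, and the model ignores cache lines and looks only at the low 21 address bits.

```python
CACHE_BITS = 21
CACHE_SIZE = 1 << CACHE_BITS   # 2 Mbytes


def cache_address(virtual_address: int) -> int:
    """Keep the least-significant 21 bits of the virtual address."""
    return virtual_address & (CACHE_SIZE - 1)


def may_thrash(addr_a: int, addr_b: int) -> bool:
    """True if two distinct addresses map to the same cache address."""
    return addr_a != addr_b and cache_address(addr_a) == cache_address(addr_b)
```

Two items whose addresses differ by a multiple of 2 Mbytes collide; shifting one of them by a few bytes (for example, by padding an array) removes the collision in this model.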


central processing unit (CPU) 

The central processing unit (CPU) is that portion of a computer that recognizes and executes the instruction set.


clock cycle 

The duration of the square wave pulse sent throughout a computer system to synchronize operations.


clone 

A compiler-generated copy of a loop or procedure. When the HP compilers generate code for a parallelizable loop, they generate two versions: a serial clone and a parallel clone. See also dynamic selection.


code 

A computer program, either in source form or in the form of an executable image on a machine.


coherency 

A term frequently applied to caches. If a data item is referenced by a particular processor on a multiprocessor system, the data is copied into that processor's cache and is updated there if the processor modifies the data. If another processor references the data while a copy is still in the first processor's cache, a mechanism is needed to ensure that the second processor does not use an outdated copy of the data from memory. The state that is achieved when both processors' caches always have the latest value for the data is called cache coherency. On multiprocessor servers an item of data may reside concurrently in several processors' caches.


column-major order 

Memory representation of an array such that the columns are stored contiguously. For example, given a two-dimensional array A(3,4), the array element A(3,1) immediately precedes element A(1,2) in memory. This is the default storage method for arrays in Fortran.
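
The A(3,4) example can be verified by computing element offsets directly. The sketch below is Python, used only for illustration, and the function name `column_major_offset` is hypothetical; subscripts are 1-based as in Fortran.

```python
def column_major_offset(i: int, j: int, rows: int) -> int:
    """Zero-based element offset of A(i, j) in an array with `rows` rows."""
    return (i - 1) + (j - 1) * rows


# Element order in memory for A(3,4): walk down each column in turn.
memory_order = sorted(
    ((i, j) for i in range(1, 4) for j in range(1, 5)),
    key=lambda ij: column_major_offset(ij[0], ij[1], rows=3),
)
```

A(3,1) lands at offset 2 and A(1,2) at offset 3, so the two elements are adjacent in memory, as the definition states.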


compiler 

A computer program that translates computer code written in a high-level programming language, such as Fortran, into equivalent machine language.


concurrent  

In parallel processing, threads that can execute at the same time are called concurrent threads.


conditional induction variable 

A loop induction variable that is not necessarily incremented on every iteration.


constant folding 

Replacement of an operation on constant operands with the result of the operation.


constant propagation 

The automatic compile-time replacement of variable references with a constant value previously assigned to that variable. Constant propagation is performed within a single procedure by conventional compilers.


conventional compiler 

A compiler that cannot perform interprocedural optimization.


counter 

A variable that is used to count the number of times an operation occurs.


CPA 

CPU Agent. The gate array on V2250 servers that provides a high-speed interface between pairs of PA-RISC processors and the crossbar. Also called the agent.


CPU 

Central processing unit. The central processing unit (CPU) is that portion of a computer that recognizes and executes the instruction set.


CPU Agent 

The gate array on V2250 servers that provides a high-speed interface between pairs of PA-RISC processors and the crossbar.


CPU time 

The amount of time the CPU requires to execute a program. Because programs share access to a CPU, the wall-clock time of a program may not be the same as its CPU time. If a program can use multiple processors, the CPU time may be greater than the wall-clock time. (See wall-clock time.)


CPU-private memory 

Data that is accessible by a single thread only (not shared among the threads constituting a process). A thread-private data object has a unique virtual address which maps to a unique physical address. Threads access the physical copies of thread-private data residing on their own hypernode when they access thread-private virtual addresses.


critical section 

A portion of a parallel program that can be executed by only one thread at a time.


crossbar 

A switching device that connects the CPUs, banks of memory, and I/O controller on a single hypernode of a V2250 server. Because the crossbar is nonblocking, all ports can run at full bandwidth simultaneously, provided there is not contention for a particular port.


CSR 

Control/Status Register. A CSR is a software-addressable hardware register used to hold control information or state.


D

data cache (Dcache) 

A small cache memory with a fast access time. This cache holds prefetched and current data. On V2250 servers, processors have 2-Mbyte off-chip caches. See also cache, direct mapped.


data dependence 

A relationship between two statements in a program, such that one statement must precede the other to produce the intended result. (See also loop-carried dependence (LCD) and loop-independent dependence (LID).)


data localization 

Optimizations designed to keep frequently used data in the processor data cache, thus eliminating the need for more costly memory accesses.


data type 

A property of a data item that determines how its bits are grouped and interpreted. For processor instructions, the data type identifies the size of the operand and the significance of the bits in the operand. Some example data types include INTEGER, int, REAL, and float.


Dcache 

Data cache. A small cache memory with a one clock cycle access time under pipelined conditions. This cache holds prefetched and current data. On V2250 servers, this cache is 2 Mbytes.


deadlock 

A condition in which a thread waits indefinitely for some condition or action that cannot, or will not, occur.


direct memory access (DMA) 

A method for gaining direct access to memory and achieving data transfers without involving the CPU.


distributed memory 

A memory architecture used in multi-CPU systems, in which the system's memory is physically divided among the processors. In most distributed-memory architectures, memory is accessible from the single processor that owns it. Sharing of data requires explicit message passing.


distributed part 

A loop generated by the compiler in the process of loop distribution.


DMA 

Direct memory access. A method for gaining direct access to memory and achieving data transfers without involving the CPU.


double  

A double-precision floating-point number that is stored in 64 bits in C and C++.


doubleword 

A primitive data operand which is 8 bytes (64 bits) in length. Also called a longword. See also word.


dummy argument 

In Fortran, a variable declared in the argument list of a procedure (function or subroutine) that receives a value when the procedure is called. The dummy argument appears in the source of the called procedure; the argument that appears in the source of the calling procedure is an actual argument. C and C++ conventions refer to dummy arguments as formal parameters.


dynamic selection 

The process by which the compiler chooses the appropriate runtime clone of a loop. See also clone.


E

encache 

To copy data or instructions into a cache.


exception 

A hardware-detected event that interrupts the running of a program, process, or system. See also fault.


execution stream 

A series of instructions executed by a CPU.


F

fault 

A type of interruption caused by an instruction requesting a legitimate action that cannot be carried out immediately due to a system problem.


floating-point 

A numerical representation of a real number. On V2250 servers, a floating point operand has a sign (positive or negative) part, an exponent part, and a fraction part. The fraction is a fractional representation. The exponent is the value used to produce a power of two scale factor (or portion) that is subsequently used to multiply the fraction to produce an unsigned value.


FLOPS 

Floating-point operations per second. A standard measure of computer processing power in the scientific community.


formal parameter 

In C and C++, a variable declared in the parameter list of a procedure (function) that receives a value when the procedure is called. The formal parameter appears in the source of the called procedure; the parameter that appears in the source of the calling procedure is an actual parameter. Fortran conventions refer to formal parameters as dummy arguments.


Fortran 

A high-level software language used mainly for scientific applications.


Fortran 90 

The international standard for Fortran adopted in 1991.


function 

A procedure whose call can be embedded within another statement, such as an assignment or test. A function is any procedure in C or C++, or a procedure defined as a FUNCTION in Fortran.


functional unit (FU) 

A part of a CPU that performs a set of operations on quantities stored in registers.


G

gate 

A construct that restricts execution of a block of code to a single thread. A thread locks a gate on entering the gated block of code and unlocks the gate on exiting the block. When the gate is locked, no other threads can enter. Compiler directives can be used to automate gate constructs; gates can also be implemented using semaphores.
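
The lock-on-entry, unlock-on-exit behavior can be sketched with an ordinary lock. The example below uses Python's `threading.Lock` as a stand-in for a gate; it is not an HP compiler directive, and the function name `run_gated_counter` is hypothetical.

```python
import threading


def run_gated_counter(num_threads: int, increments: int) -> int:
    """Have several threads update a shared counter inside a gated block."""
    gate = threading.Lock()
    state = {"count": 0}

    def worker() -> None:
        for _ in range(increments):
            with gate:               # lock on entry, unlock on exit
                state["count"] += 1  # only one thread at a time runs this line

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]
```

Because the read-modify-write of the counter sits inside the gated block, no update is lost and the final count equals the number of threads times the number of increments.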


Gbyte  

See gigabyte.


gigabyte 

1073741824 (2^30) bytes.


global optimization 

A restructuring of program statements that is not confined to a single basic block. Global optimization, unlike interprocedural optimization, is confined to a single procedure. Global optimization is done by HP compilers at optimization level +O2 and above.


global register allocation (GRA) 

A method by which the compiler attempts to store commonly-referenced scalar variables in registers throughout the code in which they are most frequently accessed.


global variable 

A variable whose scope is greater than a single procedure. In C and C++ programs, a global variable is a variable that is defined outside of any one procedure. Fortran has no global variables per se, but COMMON blocks can be used to make certain memory locations globally accessible.


granularity 

In the context of parallelism, a measure of the relative size of the computation done by a thread or parallel construct. Performance is generally an increasing function of the granularity. In higher-level language programs, possible sizes are routine, loop, block, statement, and expression. Fine granularity can be exhibited by parallel loops, tasks, and expressions. Coarse granularity can be exhibited by parallel processes.


H

hand-rolled loop 

A loop, more common in Fortran than C or C++, that is constructed using IF tests and GOTO statements rather than a language-provided loop structure such as DO.


hidden alias 

An alias that, because of the structure of a program or the standards of the language, goes undetected by the compiler. Hidden aliases can result in undetected data dependences, which may result in wrong answers.


High Performance Fortran (HPF) 

An ad-hoc language extension of Fortran 90 that provides user-directed data distribution and alignment. HPF is not a standard, but rather a set of features desirable for parallel programming.


hoist 

An optimization process that moves a memory load operation from within a loop to the basic block preceding the loop.


HP 

Hewlett-Packard, the manufacturer of the PA-RISC chips used as processors in V2250 servers.


HP-UX 

Hewlett-Packard's Unix-based operating system for its PA-RISC workstations and servers.


hypercube 

A topology used in some massively parallel processing systems. Each processor is connected to its binary neighbors. The number of processors in the system is always a power of two; that power is referred to as the dimension of the hypercube. For example, a 10-dimensional hypercube has 2^10, or 1,024 processors.
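
"Binary neighbors" here means nodes whose numbers differ in exactly one bit. A minimal Python sketch, for illustration only and with the hypothetical function name `hypercube_neighbors`, flips each bit of a node number in turn:

```python
def hypercube_neighbors(node: int, dimension: int) -> list:
    """Return the nodes whose binary labels differ from `node` in exactly one bit."""
    return [node ^ (1 << bit) for bit in range(dimension)]
```

In a 3-dimensional hypercube, node 5 (binary 101) is connected to nodes 4, 7, and 1. In a 10-dimensional hypercube each of the 1,024 nodes has 10 neighbors.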


hypernode 

A set of processors and physical memory organized as a symmetric multiprocessor (SMP) running a single image of the operating system. Nonscalable servers and V2250 servers consist of one hypernode. When discussing multidimensional parallelism or memory classes, hypernodes are generally called nodes.


I

Icache 

Instruction cache. This cache holds prefetched instructions and permits the simultaneous decoding of one instruction with the execution of a previous instruction. On V2250 servers, this cache is 2 Mbytes.


IEEE 

Institute of Electrical and Electronics Engineers. An international professional organization and a member of ANSI and ISO.


induction variable 

A variable that changes linearly within the loop, that is, whose value is incremented by a constant amount on every iteration. For example, in the following Fortran loop, I, J and K are induction variables, but L is not.

DO I = 1, N
  J = J + 2
  K = K + N
  L = L + I
ENDDO
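
The distinction can be checked by tracing the loop and looking at how much each variable changes per iteration. The sketch below re-creates the loop in Python for illustration only; the names `trace_loop` and `changes_linearly` are hypothetical, and J, K, and L are assumed to start at zero.

```python
def trace_loop(n: int) -> dict:
    """Record the value of I, J, K, and L at the end of every iteration."""
    jv = kv = lv = 0   # assumed initial values
    history = {"I": [], "J": [], "K": [], "L": []}
    for iv in range(1, n + 1):
        jv = jv + 2    # constant step of 2
        kv = kv + n    # constant step of N (N is a loop constant)
        lv = lv + iv   # step grows with I, so not constant
        for name, value in zip("IJKL", (iv, jv, kv, lv)):
            history[name].append(value)
    return history


def changes_linearly(values: list) -> bool:
    """True if consecutive values always differ by the same amount."""
    steps = {later - earlier for earlier, later in zip(values, values[1:])}
    return len(steps) == 1
```

For N = 5, the traces of I, J, and K each have a single step size, while L takes the values 1, 3, 6, 10, 15 and so fails the test.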


inlining 

The replacement of a procedure (function or subroutine) call, within the source of a calling procedure, by a copy of the called procedure's code.


Institute of Electrical and Electronics Engineers (IEEE)

An international professional organization and a member of ANSI and ISO.


instruction 

One of the basic operations performed by a CPU.


instruction cache (Icache) 

This cache holds prefetched instructions and permits the simultaneous decoding of one instruction with the execution of a previous instruction. On V2250 servers, this cache is 2 Mbytes.


instruction mnemonic 

A symbolic name for a machine instruction.


integral division 

Division that results in a whole number solution with no remainder. For example, 10 is integrally divisible by 2, but not by 3.


interface  

A logical path between any two modules or systems.


interleaved memory 

Memory that is divided into multiple banks to permit concurrent memory accesses. The number of separate memory banks is referred to as the memory stride.


interprocedural optimization 

Automatic analysis of relationships and interfaces between all subroutines and data structures within a program. Traditional compilers analyze only the relationships within the procedure being compiled.


interprocessor communication 

The process of moving or sharing data, and synchronizing operations between processors on a multiprocessor system.


intrinsic 

A function or subroutine that is an inherent part of a computer language. For example, SIN is a Fortran intrinsic.


J

job scheduler 

That portion of the operating system that schedules and manages the execution of all processes.


join 

The synchronized termination of parallel execution by spawned tasks or threads.


jump 

Departure from normal one-step incrementing of the program counter.


K

kbyte 

See kilobyte.


kernel 

The core of the operating system where basic system facilities, such as file access and memory management functions, are performed.


kernel thread identifier (ktid) 

A unique integer identifier (not necessarily sequential) assigned when a thread is created.


kilobyte 

1024 (2^10) bytes.


L

latency 

The time delay between the issuing of an instruction and the completion of the operation. A common benchmark used for comparing systems is the latency of coherent memory access instructions. This particular latency measurement is believed to be a good indication of the scalability of a system; low latency equates to low system overhead as system size increases.


linker 

A software tool that combines separate object code modules into a single object code module or executable program.


load 

An instruction used to move the contents of a memory location into a register.


local optimization 

Restructuring of program statements within the scope of a basic block. Local optimization is done by HP compilers at optimization level +O1 and above.


locality of reference 

An attribute of a memory reference pattern that refers to the likelihood of an address of a memory reference being physically close to the CPU making the reference.


localization 

Data localization. Optimizations designed to keep frequently used data in the processor data cache, thus eliminating the need for more costly memory accesses.


logical address 

The address as seen by the application program.


logical memory 

Virtual memory. The memory space as seen by the program, which may be larger than the available physical memory. The virtual memory of a V2250 server can be up to 16 Tbytes. HP-UX can map this virtual memory to a smaller set of physical memory, using disk space to make up the difference if necessary.


longword (l) 

Doubleword. A primitive data operand which is 8 bytes (64 bits) in length. See also word.


loop blocking 

A loop transformation that strip mines and interchanges a loop to provide optimal reuse of the encachable loop data.
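
The two steps, strip mining and interchange, can be seen in the iteration order of a small loop nest. The following Python sketch is for illustration only and does not show what the HP compilers actually emit; the function names and the choice of which loop to strip-mine are assumptions.

```python
def original_order(n: int) -> list:
    """Iteration order of a plain I/J loop nest."""
    return [(i, j) for i in range(n) for j in range(n)]


def blocked_order(n: int, block: int) -> list:
    """Iteration order after strip-mining J and moving the strip loop outermost."""
    order = []
    for jj in range(0, n, block):                    # outer strip loop; stride is the blocking factor
        for i in range(n):                           # I loop, now inside the strip loop
            for j in range(jj, min(jj + block, n)):  # inner strip of at most `block` iterations
                order.append((i, j))
    return order
```

Both versions visit the same iterations, but the blocked version finishes all rows for one narrow strip of columns before moving on, so the data for that strip can be reused while it is still encached.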


loop constant 

A constant or expression whose value does not change within a loop.


loop distribution 

The restructuring of a loop nest to create simple loop nests. Loop distribution creates two or more loops, called distributed parts, which can serve to make parallelization more efficient by increasing the opportunities for loop interchange and isolating code that must run serially from parallelizable code. It can also improve data localization and other optimizations.


loop induction variable 

See induction variable.


loop interchange 

The reordering of nested loops. Loop interchange is generally done to increase the granularity of the parallelizable loop(s) present or to allow more efficient access to loop data.


loop invariant 

Loop constant. A constant or expression whose value does not change within a loop.


loop invariant computation 

An operation that yields the same result on every iteration of a loop.


loop replication 

The process of transforming one loop into more than one loop to facilitate an optimization. The optimizations that replicate loops are IF-DO and if-for optimizations, dynamic selection, loop unrolling, and loop blocking.


loop-carried dependence (LCD) 

A dependence between two operations executed on different iterations of a given loop and on the same iteration of all enclosing loops. A loop carries a dependence from an indexed assignment to an indexed use if, for some iteration of the loop, the assignment stores into an address that is referred to on a different iteration of the loop.
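
One way to see whether a loop carries a dependence is to run its iterations in a different order and compare results. The sketch below is Python, for illustration only; the function names are hypothetical, and the two loop bodies are invented examples rather than code from this guide.

```python
def run(body, n: int, order) -> list:
    """Execute `body` once per index in `order` on fresh arrays and return A."""
    a = [0] * n
    b = list(range(n))
    for i in order:
        body(a, b, i)
    return a


def carried(a: list, b: list, i: int) -> None:
    a[i] = a[i - 1] + b[i]   # reads the element stored by the previous iteration


def independent(a: list, b: list, i: int) -> None:
    a[i] = b[i] * 2          # touches only data belonging to iteration i


n = 5
forward = list(range(1, n))
backward = list(reversed(forward))
```

Running `carried` forward gives [0, 1, 3, 6, 10], but running it backward gives [0, 1, 2, 3, 4], so its iterations cannot be freely reordered or run in parallel. `independent` gives the same answer in either order.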


loop-independent dependence (LID) 

A dependence between two operations executed on the same iteration of all enclosing loops such that one operation must precede the other to produce correct results.


M

machine exception 

A fatal error in the system that cannot be handled by the operating system. See also exception.


main memory 

Physical memory other than the processor caches.


main procedure 

A procedure invoked by the operating system when an application program starts up. The main procedure is the main program in Fortran; in C and C++, it is the function main().


main program 

In a Fortran program, the program section invoked by the operating system when the program starts up.


Mbyte 

See megabyte (Mbyte).


megabyte (Mbyte) 

1048576 (2^20) bytes.


megaflops (MFLOPS) 

One million floating-point operations per second.


memory bank conflict 

An attempt to access a particular memory bank before a previous access to the bank is complete, or when the bank is not yet finished recycling (i.e., refreshing).


memory management 

The hardware and software that control memory page mapping and memory protection.


message 

Data copied from one process to another (or the same) process. The copy is initiated by the sending process, which specifies the receiving process. The sending and receiving processes need not share a common address space. (Note: depending on the context, a process may be a thread.)


message passing 

A type of programming in which program modules (often running on different processors or different hosts) communicate with each other by means of system library calls that package, transmit, and receive data. All message-passing library calls must be explicitly coded by the programmer.


Message-Passing Interface (MPI) 

A message-passing and process control library. For information on the Hewlett-Packard implementation of MPI, refer to the HP MPI User's Guide (B6011-90001).


MIMD (multiple instruction stream multiple data stream) 

A computer architecture that uses multiple processors, each processing its own set of instructions simultaneously and independently of others. MIMD also describes when processes are performing different operations on different data. Compare with SIMD.


multiprocessing 

The creation and scheduling of processes on any subset of CPUs in a system configuration.


mutex 

A variable used to construct an area (region of code) of mutual exclusion. When a mutex is locked, entry to the area is prohibited; when the mutex is free, entry is allowed.


mutual exclusion 

A protocol that prevents access to a given resource by more than one thread at a time.


N

negate 

An instruction that changes the sign of a number.


network 

A system of interconnected computers that enables machines and their users to exchange information and share resources.


node 

On HP scalable and nonscalable servers, a node is equivalent to a hypernode. The term "node" is generally used in place of hypernode.


non-uniform memory access (NUMA) 

This term describes memory access times in systems in which accessing different types of memory (for example, memory local to the current hypernode or memory remote to the current hypernode) results in non-uniform access times.


nonblocking crossbar 

A switching device that connects the CPUs, banks of memory, and I/O controller on a single hypernode. Because the crossbar is nonblocking, all ports can run at full bandwidth simultaneously provided there is not contention for a particular port.


NUMA 

Non-uniform memory access. This term describes memory access times in systems in which accessing different types of memory (for example, memory local to the current hypernode or memory remote to the current hypernode) results in non-uniform access times.


O

offset 

In the context of a process address space, an integer value that is added to a base address to calculate a memory address. Offsets in V2250 servers are 64-bit values, and must keep address values within a single 16-Tbyte memory space.


opcode 

A predefined sequence of bits in an instruction that specifies the operation to be performed.


operating system 

The program that manages the resources of a computer system. V2250 servers use the HP-UX operating system.


optimization 

The refining of application software programs to minimize processing time. Optimization takes maximum advantage of a computer's hardware features and minimizes idle processor time.


optimization level 

The degree to which source code is optimized by the compiler. The HP compilers offer five levels of optimization: level +O0, +O1, +O2, +O3, and +O4. The +O4 option is not available in Fortran 90.


oversubscript 

An array reference that falls outside declared bounds.


oversubscription 

In the context of parallel threads, a process attribute that permits the creation of more threads within a process than the number of processors available to the process.


P

PA-RISC 

The Hewlett-Packard Precision Architecture reduced instruction set.


packet 

A group of related items. A packet may refer to the arguments of a subroutine or to a group of bytes that is transmitted over a network.


page 

A page is the unit of virtual or physical memory controlled by the memory management hardware and software. On HP-UX servers, the default page size is 4 K (4,096) contiguous bytes. Valid page sizes are: 4 K, 16 K, 64 K, 256 K, 1 Mbyte, 4 Mbytes, 16 Mbytes, 64 Mbytes, and 256 Mbytes. See also virtual memory.
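
As a simple illustration of how a page size partitions addresses, the Python sketch below splits an address into a page number and an offset within that page. It is not HP-UX code; the function name `split_address` is hypothetical, and the model ignores everything about address translation except the division by page size.

```python
PAGE_SIZE = 4096   # default page size: 4 K bytes


def split_address(address: int, page_size: int = PAGE_SIZE) -> tuple:
    """Return (page number, byte offset within the page) for an address."""
    return divmod(address, page_size)
```

With 4 K pages, address 0x12345 falls in page 0x12 at offset 0x345. With 16 K pages, the same address falls in page 4 at offset 0x2345.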


page fault 

A page fault occurs when a process requests data that is not currently in memory. This requires the operating system to retrieve the page containing the requested data from disk.


page frame 

A page frame is the unit of physical memory in which pages are placed. Referenced and modified bits associated with each page frame aid in memory management.


parallel optimization 

The transformation of source code into parallel code (parallelization) and restructuring of code to enhance parallel performance.


parallelization 

The process of transforming serial code to a form of code that can run simultaneously on multiple CPUs while preserving semantics. When +O3 +Oparallel is specified, the HP compilers automatically parallelize loops in your program and recognize compiler directives and pragmas with which you can manually specify parallelization of loops, tasks, and regions.


parallelization, loop 

The process of splitting a loop into several smaller loops, each of which operates on a subset of the data of the original loop, and generating code to run these loops on separate processors in parallel.


parallelization, ordered 

The process of splitting a loop into several smaller loops, each of which iterates over a subset of the original data with a stride equal to the number of loops created, and generating code to run these loops on separate processors. Each iteration in an ordered parallel loop begins execution in the original iteration order, allowing dependences within the loop to be synchronized to yield correct results via gate constructs.


parallelization, stride-based 

The process of splitting up a loop into several smaller loops, each of which iterates over several discontiguous chunks of data, and generating code to run these loops on separate processors in parallel. Stride-based parallelism can only be achieved manually by using compiler directives.
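
The idea of one thread covering several discontiguous chunks can be sketched in C as follows. This is an illustration only: it assumes that chunks of chunk_size iterations are dealt out to threads round-robin, and the names chunk_owner and sum_owned are hypothetical. The assignment actually made by the HP runtime may differ.

```c
#include <assert.h>

/* Hypothetical helper: which thread runs iteration i when chunks of
   chunk_size iterations are dealt out round-robin to nthreads threads. */
int chunk_owner(long i, long chunk_size, int nthreads)
{
    assert(i >= 0 && chunk_size > 0 && nthreads > 0);
    return (int)((i / chunk_size) % nthreads);
}

/* Work done by thread tid: it visits only the chunks it owns, so its
   iterations form several discontiguous ranges of the original loop. */
long sum_owned(const long *a, long n, long chunk_size, int nthreads, int tid)
{
    assert(chunk_size > 0 && nthreads > 0 && tid >= 0 && tid < nthreads);
    long sum = 0;
    for (long start = tid * chunk_size; start < n; start += chunk_size * nthreads) {
        long end = start + chunk_size < n ? start + chunk_size : n;
        for (long i = start; i < end; i++)
            sum += a[i];
    }
    return sum;
}
```

For ten iterations, a chunk size of 2, and two threads, thread 0 covers iterations 0-1, 4-5, and 8-9, while thread 1 covers 2-3 and 6-7.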


parallelization, strip-based 

The process of splitting up a loop into several smaller loops, each of which iterates over a single contiguous subset of the data of the original loop, and generating code to run these loops on separate processors in parallel. Strip-based parallelism is the default for automatic parallelism and for directive-initiated loop parallelism in the absence of the chunk_size = n or ordered attributes.
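
A minimal C sketch of how a loop's iterations might be divided into one contiguous strip per thread is shown below. The function strip_bounds is hypothetical, and the even-split rule it uses is an assumption made for illustration, not a description of how the HP compilers choose strip boundaries.

```c
#include <assert.h>

/* Hypothetical helper: compute the half-open range [*lo, *hi) of
   iterations given to thread tid when n iterations are split into
   nthreads contiguous strips of nearly equal size. */
void strip_bounds(long n, int nthreads, int tid, long *lo, long *hi)
{
    assert(n >= 0 && nthreads > 0 && tid >= 0 && tid < nthreads);
    long base = n / nthreads;   /* iterations every thread receives */
    long extra = n % nthreads;  /* the first `extra` threads get one more */
    *lo = tid * base + (tid < extra ? tid : extra);
    *hi = *lo + base + (tid < extra ? 1 : 0);
}
```

Splitting ten iterations among four threads this way yields the strips [0,3), [3,6), [6,8), and [8,10).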


parallelization, task 

The process of splitting up source code into independent sections which can safely be run in parallel on available processors. HP programming languages provide compiler directives and pragmas that allow you to identify parallel tasks in source code.


parameter 

In C and C++, either a variable declared in the parameter list of a procedure (function) that receives a value when the procedure is called (formal parameter) or the variable or constant that is passed by a call to a procedure (actual parameter). In Fortran, a symbolic name for a constant.


path 

An environment variable that you set within your shell that allows you to access commands in various directories without having to specify a complete path name.


physical address 

A unique identifier that selects a particular location in the computer's memory. Because HP-UX supports virtual memory, programs address data by its virtual address; HP-UX then maps this address to the appropriate physical address. See also virtual address.


physical address space 

The set of possible addresses for a particular physical memory.


physical memory 

Computer hardware that stores data. V2250 servers can contain up to 16 Gbytes of physical memory on a 16-processor hypernode.


pipeline 

An overlapping operating cycle function that is used to increase the speed of computers. Pipelining provides a means by which multiple operations occur concurrently by beginning one instruction sequence before another has completed. Maximum efficiency is achieved when the pipeline is "full," that is, when all stages are operating on separate instructions.


pipelining 

Issuing instructions in an order that best uses the pipeline.


procedure 

A unit of program code. In Fortran, a function, subroutine, or main program; in C and C++, a function.


process 

A collection of one or more execution streams within a single logical address space; an executable program. A process is made up of one or more threads.


process memory 

The portion of system memory that is used by an executing process.


program unit 

A procedure or main section of a program.


programming model 

A description of the features available to efficiently program a certain computer architecture.


Q

queue 

A data structure in which entries are made at one end and deletions at the other. Often referred to as first-in, first-out (FIFO).
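
The first-in, first-out behavior can be sketched in C with a small fixed-size ring buffer. The type and function names (struct queue, queue_put, queue_get) and the capacity of 8 are choices made for this example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 8   /* arbitrary size chosen for the sketch */

struct queue {
    int items[QUEUE_CAPACITY];
    int head;    /* index of the oldest entry */
    int count;   /* number of entries currently stored */
};

/* Add value at the tail; returns false if the queue is full. */
bool queue_put(struct queue *q, int value)
{
    assert(q != NULL);
    if (q->count == QUEUE_CAPACITY)
        return false;
    q->items[(q->head + q->count) % QUEUE_CAPACITY] = value;
    q->count++;
    return true;
}

/* Remove the oldest entry into *value; returns false if the queue is empty. */
bool queue_get(struct queue *q, int *value)
{
    assert(q != NULL && value != NULL);
    if (q->count == 0)
        return false;
    *value = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return true;
}
```

A queue initialized with `struct queue q = {0};` returns entries in the order in which they were put.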


R

rank 

The number of dimensions of an array.


read 

A memory operation in which the contents of a memory location are copied and passed to another part of the system.


recurrence 

A cycle of dependences among the operations within a loop in which an operation in one iteration depends on the result of a following operation that executes in a previous iteration.
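
A running sum is a simple loop that contains a recurrence. In the C sketch below (the function name running_sum is hypothetical), each iteration reads the element written by the previous iteration, so the iterations cannot be executed independently of one another.

```c
#include <assert.h>
#include <stddef.h>

/* Replace each element with the sum of itself and all earlier elements.
   Iteration i reads a[i - 1], which iteration i - 1 has just written:
   a dependence carried from one iteration to the next. */
void running_sum(long *a, long n)
{
    assert(a != NULL || n <= 0);
    for (long i = 1; i < n; i++)
        a[i] = a[i] + a[i - 1];
}
```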


recursion 

An operation that is defined, at least in part, by a repeated application of itself.


recursive call 

A condition in which the sequence of instructions in a procedure causes the procedure itself to be invoked again. Such a procedure must be compiled for reentrancy.


reduced instruction set computer (RISC) 

An architectural concept that applies to the definition of the instruction set of a processor. A RISC instruction set is an orthogonal instruction set that is easy to decode in hardware and for which a compiler can generate highly optimized code. The PA-RISC processor used in V2250 servers employs a RISC architecture.


reduction 

An arithmetic operation that performs a transformation on an array to produce a scalar result.
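
Summing the elements of an array is a common reduction. The C sketch below uses a hypothetical function, partial_sum, that reduces one range of an array to a scalar; the variable sum acts as the accumulator. Because integer addition is associative, partial results computed over disjoint ranges can be added together to give the same total, which is the property that makes a sum reduction a candidate for parallel execution.

```c
#include <assert.h>
#include <stddef.h>

/* Reduce elements a[lo] through a[hi - 1] to a single scalar sum. */
long partial_sum(const long *a, long lo, long hi)
{
    assert(a != NULL && lo >= 0 && lo <= hi);
    long sum = 0;               /* accumulator */
    for (long i = lo; i < hi; i++)
        sum += a[i];
    return sum;
}
```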


reentrancy 

The ability of a program unit to be executed by multiple threads at the same time. Each invocation maintains a private copy of its local data and a private stack to store compiler-generated temporary variables. Procedures must be compiled for reentrancy in order to be invoked in parallel or to be used for recursive calls. HP compilers compile for reentrancy by default.


reference 

Any operation that requires a cache line to be encached; this includes load as well as store operations, because writing to any element in a cache line requires the entire cache line to be encached.


register 

A hardware entity that contains an address, operand, or instruction status information.


reuse, data 

In the context of a loop, the ability to use data fetched for one loop operation in another operation. In the context of a cache, reusing data that was encached for a previous operation; because data is fetched as part of a cache line, if any of the other items in the cache line are used before the line is flushed to memory, reuse has occurred.


reuse, spatial 

Reusing data that resides in the cache as a result of the fetching of another piece of data from memory. Typically, this involves using array elements that are contiguous to (and therefore part of the cache line of) an element that has already been used, and therefore is already encached.


reuse, temporal 

Reusing a data item that has been used previously.


RISC 

Reduced instruction set computer. An architectural concept that applies to the definition of the instruction set of a processor. A RISC instruction set is an orthogonal instruction set that is easy to decode in hardware and for which a compiler can generate highly optimized code. The PA-RISC processor used in V2250 servers employs a RISC architecture.


rounding 

A method of obtaining a lower-precision representation of a number by using the closest number that is representable in the lower-precision system.


row-major order 

Memory representation of an array such that the rows of an array are stored contiguously. For example, given a two-dimensional array A[3][4], array element A[0][3] immediately precedes A[1][0] in memory. This is the default storage method for arrays in C.
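
The position of an element in a row-major array can be computed directly from its indices. The following C sketch uses a hypothetical helper, row_major_offset, to show the arithmetic for the A[3][4] example above.

```c
#include <assert.h>
#include <stddef.h>

/* Offset, in elements from the start of the array, of the element at
   [row][col] in a row-major array that has ncols columns per row. */
size_t row_major_offset(size_t row, size_t col, size_t ncols)
{
    assert(col < ncols);
    return row * ncols + col;
}
```

For four columns, A[0][3] is at offset 3 and A[1][0] is at offset 4, so the two elements are adjacent in memory.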


S

scope 

The domain in which a variable is visible in source code. The rules that determine scope are different for Fortran and C/C++.


semaphore 

An integer variable assigned one of two values: one value to indicate that it is "locked," and another to indicate that it is "free." Semaphores can be used to synchronize parallel threads. Pthreads provides a set of manipulation functions to facilitate this.


shape 

The number of elements in each dimension of an array.


shared virtual memory 

A memory architecture in which memory can be accessed by all processors in the system. This architecture can also support virtual memory.


shell 

An interactive command interpreter that is the interface between the user and the Unix operating system.


SIMD (single instruction stream multiple data stream) 

A computer architecture that performs one operation on multiple sets of data. A processor (separate from the SMP array) is used for the control logic, and the processors in the SMP array perform the instruction on the data. Compare with MIMD (multiple instruction stream multiple data stream).


single 

A single-precision floating-point number stored in 32 bits. See also double.


SMP 

Symmetric multiprocessor. A multiprocessor computer in which all the processors have equal access to all machine resources. Symmetric multiprocessors have no manager or worker processors; the operating system runs on any or all of the processors.


socket 

An endpoint used for interprocess communication.


socket pair 

Bidirectional pipes that enable application programs to set up two-way communication between processes that share a common ancestor.


source code 

The uncompiled version of a program, written in a high-level language such as Fortran or C.


source file 

A file that contains program source code.


space 

A contiguous range of virtual addresses within the system-wide virtual address space. Spaces are 16 Tbytes in the V2250 servers.


spatial reference 

An attribute of a memory reference pattern that pertains to the likelihood of a subsequent memory reference address being numerically close to a previously referenced address.


spawn 

To activate existing threads.


spawn context 

A parallel loop, task list, or region that initiates the spawning of threads and defines the structure within which the threads' spawn thread IDs are valid.


spawn thread identifier (stid) 

A sequential integer identifier associated with a particular thread that has been spawned. stids are only assigned to spawned threads, and they are assigned within a spawn context; therefore, duplicate stids may be present amongst the threads of a program, but stids are always unique within the scope of their spawn context. stids are assigned sequentially and run from 0 to one less than the number of threads spawned in a particular spawn context.


SPMD  

Single program multiple data. A single program executing simultaneously on several processors. This is usually taken to mean that there is redundant execution of sequential scalar code on all processors.


stack 

A data structure in which the last item entered is the first to be removed. Also referred to as last-in, first-out (LIFO). HP-UX provides every thread with a stack which is used to pass arguments to functions and subroutines and for local variable storage.
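
The last-in, first-out behavior can be sketched in C with a small fixed-size array. This illustrates the general data structure only, not the per-thread stack that HP-UX maintains; the names struct stack, stack_push, and stack_pop and the capacity of 8 are choices made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STACK_CAPACITY 8   /* arbitrary size chosen for the sketch */

struct stack {
    int items[STACK_CAPACITY];
    int top;     /* number of items currently stored */
};

/* Place value on top of the stack; returns false if the stack is full. */
bool stack_push(struct stack *s, int value)
{
    assert(s != NULL);
    if (s->top == STACK_CAPACITY)
        return false;
    s->items[s->top++] = value;
    return true;
}

/* Remove the most recently pushed item into *value;
   returns false if the stack is empty. */
bool stack_pop(struct stack *s, int *value)
{
    assert(s != NULL && value != NULL);
    if (s->top == 0)
        return false;
    *value = s->items[--s->top];
    return true;
}
```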


store 

An instruction used to move the contents of a register to memory.


strip length, parallel 

In strip-based parallelism, the amount by which the induction variable of a parallel inner loop is advanced on each iteration of the (conceptual) controlling outer loop.


strip mining 

The transformation of a single loop into two nested loops. Conceptually, this is how parallel loops are created by default. A conceptual outer loop advances the initial value of the inner loop's induction variable by the parallel strip length. The parallel strip length is based on the trip count of the loop and the amount of code in the loop body. Strip mining is also used by the data localization optimization.
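
The transformation can be sketched in C as follows. Both functions are hypothetical examples that scale an array in place and produce the same result; in the strip-mined form the strip length is simply passed in as a parameter, whereas the compiler derives it as described above.

```c
#include <assert.h>

/* Original single loop. */
void scale(long *a, long n, long k)
{
    for (long i = 0; i < n; i++)
        a[i] *= k;
}

/* Strip-mined form: the outer loop advances the starting index by the
   strip length, and the inner loop covers one strip of iterations. */
void scale_strip_mined(long *a, long n, long k, long strip)
{
    assert(strip > 0);
    for (long start = 0; start < n; start += strip) {
        long end = start + strip < n ? start + strip : n;
        for (long i = start; i < end; i++)
            a[i] *= k;
    }
}
```

Each pass of the outer loop defines an independent block of inner-loop iterations, which is what allows the strips to be handed to separate threads.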


subroutine 

A software module that can be invoked from anywhere in a program.


superscalar 

A class of RISC processors that allow multiple instructions to be issued in each clock period.


Symmetric Multiprocessor (SMP) 

A multiprocessor computer in which all the processors have equal access to all machine resources. Symmetric multiprocessors have no manager or worker processors; the operating system runs on any or all of the processors.


synchronization 

A method of coordinating the actions of multiple threads so that operations occur in the right sequence. When manually optimizing code, you can synchronize programs using compiler directives, calls to library routines, or assembly-language instructions. You do so, however, at the cost of additional overhead; synchronization may cause at least one CPU to wait for another.


system administrator (sysadmin) 

The person responsible for managing the administration of a system.


system manager 

The person responsible for the management and operation of a computer system. Also called the system administrator and the sysadmin.


T

Tbyte 

See terabyte (Tbyte).


terabyte (Tbyte) 

1,099,511,627,776 (2^40) bytes.


term 

A constant or symbolic name that is part of an expression.


thread 

An independent execution stream that is executed by a CPU. One or more threads, each of which can execute on a different CPU, make up each process. Memory, files, signals, and other process attributes are generally shared among threads in a given process, enabling the threads to cooperate in solving a common problem. Threads are created and terminated by instructions that can be automatically generated by HP compilers, inserted by adding compiler directives to source code, or coded explicitly using library calls or assembly language.


thread create 

To activate existing threads.


thread identifier 

An integer identifier associated with a particular thread. See thread identifier, kernel (ktid) and thread identifier, spawn (stid).


thread identifier, kernel (ktid) 

A unique integer identifier (not necessarily sequential) assigned when a thread is created.


thread identifier, spawn (stid) 

A sequential integer identifier associated with a particular thread that has been spawned. stids are only assigned to spawned threads, and they are assigned within a spawn context; therefore, duplicate stids may be present amongst the threads of a program, but stids are always unique within the scope of their spawn context. stids are assigned sequentially and run from 0 to one less than the number of threads spawned in a particular spawn context.


thread-private memory 

Data that is accessible by a single thread only (not shared among the threads constituting a process).


TLB  

See translation lookaside buffer.


translation lookaside buffer 

A hardware entity that contains information necessary to translate a virtual memory reference to the corresponding physical page and to validate memory accesses.


trip count 

The number of iterations a loop executes.


U

unsigned 

A value that is never negative (zero or positive).


user interface 

The portion of a computer program that processes input entered by a human and provides output for human users.


utility 

A software tool designed to perform a frequently used support function.


V

vector 

An ordered list of items in a computer's memory, contained within an array. A simple vector is defined as having a starting address, a length, and a stride. An indirect address vector is defined as having a relative base address and a vector of values to be applied as offsets to the base.
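
The two kinds of vector described above can be sketched in C as two access patterns over the same array. The function names sum_strided and sum_indirect are hypothetical; strides and offsets are counted in elements rather than bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Simple vector: `length` elements beginning at `start`, with
   consecutive elements `stride` array elements apart. */
long sum_strided(const long *start, long length, long stride)
{
    assert(start != NULL && length >= 0);
    long sum = 0;
    for (long i = 0; i < length; i++)
        sum += start[i * stride];
    return sum;
}

/* Indirect address vector: each entry of `offsets` is applied to
   `base` to locate one element. */
long sum_indirect(const long *base, const long *offsets, long length)
{
    assert(base != NULL && offsets != NULL && length >= 0);
    long sum = 0;
    for (long i = 0; i < length; i++)
        sum += base[offsets[i]];
    return sum;
}
```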


vector processor 

A processor whose instruction set includes instructions that perform operations on a vector of data (such as a row or column of an array) in an optimized fashion.


virtual address 

The address by which programs access their data. HP-UX maps this address to the appropriate physical memory address. See also space.


virtual aliases 

Two different virtual addresses that map to the same physical memory address.


virtual machine 

A collection of computing resources configured so that a user or process can access any of the resources, regardless of their physical location or operating system, from a single interface.


virtual memory 

The memory space as seen by the program, which is typically larger than the available physical memory. The virtual memory of a V2250 server can be up to 16 Tbytes. The operating system maps this virtual memory to a smaller set of physical memory, using disk space to make up the difference if necessary. Also called logical memory.


W

wall-clock time 

The chronological time an application requires to complete its processing. If an application starts running at 1:00 p.m. and finishes at 5:00 a.m. the following morning, its wall-clock time is sixteen hours. Compare with CPU time.
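
The arithmetic in the example above can be checked with a short C sketch. The function elapsed_hours is hypothetical; it works in whole hours on a 24-hour clock and assumes that the run lasts less than 24 hours.

```c
#include <assert.h>

/* Hours from start_hour to end_hour on a 24-hour clock, assuming the
   run lasts less than 24 hours and crosses midnight at most once. */
int elapsed_hours(int start_hour, int end_hour)
{
    assert(start_hour >= 0 && start_hour < 24);
    assert(end_hour >= 0 && end_hour < 24);
    return (end_hour - start_hour + 24) % 24;
}
```

A run from 1:00 p.m. (hour 13) to 5:00 a.m. (hour 5) gives (5 - 13 + 24) % 24 = 16 hours.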


word 

A contiguous group of bytes that make up a primitive data operand and start on an addressable boundary. In V2250 servers a word is four bytes (32 bits) in length. See also doubleword.


workstation 

A stand-alone computer that has its own processor, memory, and possibly a disk drive and can typically sit on a user's desk.


write 

A memory operation in which a memory location is updated with new data.


Z

zero 

In floating-point number representations, zero is represented by a sign bit of zero, an exponent of zero, and a fraction of zero.


© Hewlett-Packard Development Company, L.P.