A |
|---|
| absolute address | | An address that does not undergo virtual-to-physical
address translation when used to reference memory or the I/O
register area.
|
|---|
| accumulator | | A variable used to accumulate a value. Accumulators
are typically assigned a function of themselves, which can create
dependences when done in loops.
|
|---|
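As a minimal sketch in C (the function name `sum_array` is illustrative and does not come from this guide), the variable `sum` below is an accumulator. Each iteration reads the value written by the previous one, which is the dependence the entry describes.

```c
#include <stddef.h>

/* Returns the sum of the n elements of x. */
double sum_array(const double *x, size_t n)
{
    double sum = 0.0;                 /* the accumulator */
    for (size_t i = 0; i < n; i++) {
        sum = sum + x[i];             /* assigned a function of itself */
    }
    return sum;
}
```

Calling `sum_array` on the three values 1.0, 2.0, and 3.5 returns 6.5.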
| actual argument | | In Fortran, a value that is passed by a call to
a procedure (function or subroutine). The actual argument appears
in the source of the calling procedure; the argument that appears
in the source of the called procedure is a dummy argument.
C and C++ conventions refer to actual arguments
as actual parameters.
|
|---|
| actual parameter | | In C and C++, a value that is
passed by a call to a procedure (function). The actual parameter
appears in the source of the calling procedure; the parameter that
appears in the source of the called procedure is a formal
parameter. Fortran conventions refer to actual parameters
as actual arguments.
|
|---|
| address | | A number used by the operating system to identify
a storage location.
|
|---|
| address space | | Memory space, either physical or virtual, available
to a process.
|
|---|
| alias | | An alternative name for some object, especially
an alternative variable name that refers to a memory location. Aliases
can cause data dependences, which prevent the compiler from parallelizing
parts of a program.
|
|---|
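A small C sketch can show why aliases matter. The function `add_twice` is a hypothetical example, not part of any HP library. When `dst` and `src` name different locations, the two statements are independent of each other's result; when they alias, the second statement depends on the first.

```c
/* Adds the value at src to the value at dst, twice. */
void add_twice(int *dst, const int *src)
{
    *dst += *src;   /* if dst == src, this also changes *src ... */
    *dst += *src;   /* ... so this statement sees the updated value */
}
```

With distinct variables holding 1 and 10, the destination ends up as 21. If both pointers refer to one variable holding 1, it ends up as 4, because the first addition changes the value the second one reads.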
| alignment | | A condition in which the address, in memory, of
a given data item is integrally divisible by a particular integer
value, often the size of the data item itself. Alignment simplifies
the addressing of such data items.
|
|---|
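The divisibility condition can be checked directly. The helper below is a sketch with an assumed name (`is_aligned`); it tests whether an address is integrally divisible by a given alignment value.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the address p is a multiple of alignment, 0 otherwise.
   alignment must be nonzero. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

For example, the address of a `double` object passes `is_aligned(p, _Alignof(double))`, while the address one byte past it does not.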
| allocatable array | | In Fortran 90, a named array whose rank is specified
at compile time, but whose bounds are determined at run time.
|
|---|
| allocate | | An action performed by a program at runtime in which
memory is reserved to hold data of a given type. In Fortran 90,
this is done through the creation of allocatable arrays.
In C, it is done through the dynamic creation of memory blocks using
malloc. In C++,
it is done through the dynamic creation of memory blocks using malloc
or new.
|
|---|
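For the C case, a minimal sketch of allocation with `malloc` might look like the following. The function name `alloc_vector` is an assumption made for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Reserves memory for n doubles at run time and sets each to 0.0.
   Returns NULL if the memory cannot be reserved.
   The caller releases the block with free(). */
double *alloc_vector(size_t n)
{
    double *v = malloc(n * sizeof *v);
    if (v == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        v[i] = 0.0;
    }
    return v;
}
```

A caller would typically write `double *v = alloc_vector(100);`, check the result against `NULL`, and call `free(v)` when finished.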
| ALU | | Arithmetic logic unit. A basic element of the central
processing unit (CPU) where arithmetic and logical operations are
performed.
|
|---|
| Amdahl's law | | A statement that the ultimate performance of a computer
system is limited by the slowest component. In the context of HP
servers this is interpreted to mean that the serial component of
the application code will restrict the maximum speed-up that is
achievable.
|
|---|
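The entry states the law qualitatively. Its usual quantitative form, which is not given in this glossary, is speedup = 1 / (s + (1 - s) / p), where s is the serial fraction of the run time and p is the number of processors. The sketch below evaluates that formula; the function and parameter names are illustrative.

```c
/* Maximum speed-up predicted by Amdahl's law.
   serial_fraction: share of the run time that cannot be parallelized (0..1)
   nproc:           number of processors applied to the parallel share */
double amdahl_speedup(double serial_fraction, int nproc)
{
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nproc);
}
```

With a serial fraction of 0.1, the result stays below 10 no matter how many processors are used, which is the restriction the entry refers to.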
| American National Standards Institute (ANSI) | | A repository and coordinating agency for standards
implemented in the U.S. Its activities include the production of
Federal Information Processing Standards (FIPS) for the Department
of Defense (DoD).
|
|---|
| ANSI | | See American National Standards Institute.
|
|---|
| apparent recurrence | | A condition or construct that fails to provide the
compiler with sufficient information to determine whether or not
a recurrence exists. Also called a potential recurrence.
|
|---|
| argument | | In Fortran, either a variable declared in the argument
list of a procedure (function or subroutine) that receives a value
when the procedure is called (dummy argument)
or the variable or constant that is passed by a call to a procedure
(actual argument). C and C++
conventions refer to arguments as parameters.
|
|---|
| arithmetic logic unit (ALU) | | A basic element of the central processing unit (CPU)
where arithmetic and logical operations are performed.
|
|---|
| array | | An ordered structure of operands of the same data
type. The structure of an array is defined by its rank, shape, and
data type.
|
|---|
| array section | | A Fortran 90 construct that defines a subset of
an array by providing starting and ending elements and strides for
each dimension. For an array A(4,4),
A(2:4:2,2:4:2) is an array section
containing only the evenly indexed elements A(2,2),
A(4,2), A(2,4),
and A(4,4).
|
|---|
| array-valued argument | | In Fortran 90, an array section
that is an actual argument to a subprogram.
|
|---|
| ASCII | | American Standard Code for Information Interchange.
This encodes printable and non-printable characters into a range
of integers.
|
|---|
| assembler | | A program that converts assembly language programs
into executable machine code.
|
|---|
| assembly language | | A programming language whose executable statements
can each be translated directly into a corresponding machine instruction
of a particular computer system.
|
|---|
| automatic array | | In Fortran, an array of explicit rank that is not
a dummy argument and is declared in a subprogram.
|
|---|
B |
|---|
| bandwidth | | A measure of the rate at which data can be moved
through a device or circuit. Bandwidth is usually measured in millions
of bytes per second (Mbytes/sec) or millions of bits per
second (Mbits/sec).
|
|---|
| bank conflict | | An attempt to access a particular memory bank before
a previous access to the bank is complete, or when the bank is not
yet finished recycling (i.e., refreshing).
|
|---|
| barrier | | A structure used by the compiler in barrier synchronization.
Also sometimes used to refer to the construct used to implement
barrier synchronization. See also barrier synchronization.
|
|---|
| barrier synchronization | | A control mechanism used in parallel programming
that ensures all threads have completed an operation before continuing
execution past the barrier in sequential mode. On HP servers, barrier
synchronization can be automated by certain CPSlib routines and
compiler directives. See also barrier.
|
|---|
| basic block | | A linear sequence of machine instructions with a
single entry and a single exit.
|
|---|
| bit | | A binary digit.
|
|---|
| blocking factor | | Integer representing the stride of the outer strip
of a pair of loops created by blocking.
|
|---|
| branch | | A class of instructions which change the value of
the program counter to a value other than that of the next sequential
instruction.
|
|---|
| byte | | A group of contiguous bits starting on an addressable
boundary. A byte is 8 bits in length.
|
|---|
C |
|---|
| cache | | A small, high-speed buffer memory used in modern
computer systems to hold temporarily those portions of the contents
of the memory that are, or are believed to be, currently in use.
Cache memory is physically separate from main memory and can be
accessed with substantially less latency. HP servers employ separate
data and instruction cache memories.
|
|---|
| cache hit | | A cache hit occurs when the data
to be loaded already resides in the cache.
|
|---|
| cache line | | A chunk of contiguous data that is copied into a
cache in one operation. On V2250 servers, processor cache lines
are 32 bytes.
|
|---|
| cache memory | | A small, high-speed buffer memory used in modern
computer systems to hold temporarily those portions of the contents
of the memory that are, or are believed to be, currently in use.
Cache memory is physically separate from main memory and can be
accessed with substantially less latency. V2250 servers employ separate
data and instruction caches.
|
|---|
| cache miss | | A cache miss occurs when the data
to be loaded does not reside in the cache.
|
|---|
| cache purge | | The act of invalidating or removing entries in a
cache memory.
|
|---|
| cache thrashing | | Cache thrashing occurs when
two or more data items that are frequently needed by the program
map to the same cache address. In this case, each time one of the
items is encached it overwrites another needed item, causing constant
cache misses and impairing data reuse. Cache thrashing also occurs
when two or more threads are simultaneously writing to the same
cache line.
|
|---|
| cache, direct mapped | | A form of cache memory that addresses encached data
by a function of the data's virtual address. On V2250 servers,
the processor cache address is identical to the least-significant
21 bits of the data's virtual address. This means cache
thrashing can occur when the virtual addresses of two data items
are an exact multiple of 2 Mbytes (2^21 bytes) apart.
|
|---|
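The mapping described in this entry can be sketched in C. The 21-bit width is taken from the entry; the names `cache_address` and `may_thrash` are assumptions made for illustration.

```c
#include <stdint.h>

/* From the entry: the cache address is the least-significant
   21 bits of the virtual address. */
enum { CACHE_ADDRESS_BITS = 21 };

/* Returns the cache address for a virtual address. */
uint64_t cache_address(uint64_t vaddr)
{
    return vaddr & ((UINT64_C(1) << CACHE_ADDRESS_BITS) - 1);
}

/* Returns 1 if two distinct virtual addresses map to the same
   cache address, 0 otherwise. */
int may_thrash(uint64_t a, uint64_t b)
{
    return a != b && cache_address(a) == cache_address(b);
}
```

Two addresses that are 2 Mbytes apart, or any exact multiple of that, share a cache address, so `may_thrash` reports 1 for them.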
| central processing unit (CPU) | | The central processing unit (CPU) is that portion
of a computer that recognizes and executes the instruction set.
|
|---|
| clock cycle | | The duration of the square wave pulse sent throughout
a computer system to synchronize operations.
|
|---|
| clone | | A compiler-generated copy of a loop or procedure.
When the HP compilers generate code for a parallelizable loop, they
generate two versions: a serial clone and a parallel clone. See
also dynamic selection.
|
|---|
| code | | A computer program, either in source form or in
the form of an executable image on a machine.
|
|---|
| coherency | | A term frequently applied to caches. If a data item
is referenced by a particular processor on a multiprocessor system,
the data is copied into that processor's cache and is updated
there if the processor modifies the data. If another processor references
the data while a copy is still in the first processor's
cache, a mechanism is needed to ensure that the second processor
does not use an outdated copy of the data from memory. The state
that is achieved when both processors' caches always have
the latest value for the data is called cache coherency. On multiprocessor
servers an item of data may reside concurrently in several
processors' caches.
|
|---|
| column-major order | | Memory representation of an array such that the
columns are stored contiguously. For example, given a two-dimensional
array A(3,4), the array element
A(3,1) immediately precedes element
A(1,2) in memory. This is the default
storage method for arrays in Fortran.
|
|---|
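The storage rule can be written as an offset calculation. In this sketch (the function name is illustrative), the indices are 1-based as in Fortran and the returned offset is zero-based.

```c
#include <stddef.h>

/* Zero-based position in memory of Fortran element A(i,j) for an
   array with nrows rows stored in column-major order. */
size_t column_major_offset(size_t i, size_t j, size_t nrows)
{
    return (i - 1) + (j - 1) * nrows;
}
```

For A(3,4), element A(3,1) is at offset 2 and A(1,2) is at offset 3, matching the example in the entry.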
| compiler | | A computer program that translates computer code
written in a high-level programming language, such as Fortran, into
equivalent machine language.
|
|---|
| concurrent | | In parallel processing, threads that can execute
at the same time are called concurrent threads.
|
|---|
| conditional induction variable | | A loop induction variable that is not necessarily
incremented on every iteration.
|
|---|
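As an illustration in C (the function `pack_positive` is a made-up example), `k` below is a conditional induction variable: it advances only on iterations where the test succeeds, whereas `i` advances on every iteration.

```c
#include <stddef.h>

/* Copies the positive elements of src to the front of dst.
   Returns the number of elements copied. */
size_t pack_positive(const double *src, size_t n, double *dst)
{
    size_t k = 0;                       /* conditional induction variable */
    for (size_t i = 0; i < n; i++) {    /* ordinary induction variable */
        if (src[i] > 0.0) {
            dst[k] = src[i];
            k++;                        /* not incremented on every iteration */
        }
    }
    return k;
}
```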
| constant folding | | Replacement of an operation on constant operands
with the result of the operation.
|
|---|
| constant propagation | | The automatic compile-time replacement of variable
references with a constant value previously assigned to that variable.
Constant propagation is performed within a single procedure by conventional
compilers.
|
|---|
| conventional compiler | | A compiler that cannot perform interprocedural optimization.
|
|---|
| counter | | A variable that is used to count the number of times
an operation occurs.
|
|---|
| CPA | | CPU Agent. The gate array on V2250 servers that
provides a high-speed interface between pairs of PA-RISC processors
and the crossbar. Also called the agent.
|
|---|
| CPU | | Central processing unit. The central processing
unit (CPU) is that portion of a computer that recognizes and executes
the instruction set.
|
|---|
| CPU Agent | | The gate array on V2250 servers that provides a
high-speed interface between pairs of PA-RISC processors and the
crossbar.
|
|---|
| CPU time | | The amount of time the CPU requires to execute a
program. Because programs share access to a CPU, the wall-clock
time of a program may not be the same as its CPU time. If a program
can use multiple processors, the CPU time may be greater than the
wall-clock time. (See wall-clock time.)
|
|---|
| CPU-private memory | | Data that is accessible by a single thread only
(not shared among the threads constituting a process). A thread-private
data object has a unique virtual address which maps to a unique
physical address. Threads access the physical copies of thread-private
data residing on their own hypernode when they access thread-private
virtual addresses.
|
|---|
| critical section | | A portion of a parallel program that can be executed
by only one thread at a time.
|
|---|
| crossbar | | A switching device that connects the CPUs, banks
of memory, and I/O controller on a single hypernode of
a V2250 server. Because the crossbar is nonblocking, all ports can
run at full bandwidth simultaneously, provided there is no contention
for a particular port.
|
|---|
| CSR | | Control/Status Register. A CSR is a software-addressable
hardware register used to hold control information or state.
|
|---|
D |
|---|
| data cache (Dcache) | | A small cache memory with a fast access time. This
cache holds prefetched and current data. On V2250 servers, processors
have 2-Mbyte off-chip caches. See also cache, direct mapped.
|
|---|
| data dependence | | A relationship between two statements in a program,
such that one statement must precede the other to produce the intended
result. (See also loop-carried dependence (LCD)
and loop-independent dependence (LID).)
|
|---|
| data localization | | Optimizations designed to keep frequently used data
in the processor data cache, thus eliminating the need for more
costly memory accesses.
|
|---|
| data type | | A property of a data item that determines how its
bits are grouped and interpreted. For processor instructions, the
data type identifies the size of the operand and the significance
of the bits in the operand. Some example data types include INTEGER,
int, REAL,
and float.
|
|---|
| Dcache | | Data cache. A small cache memory with a one clock
cycle access time under pipelined conditions. This cache holds prefetched
and current data. On V2250 servers, this cache is 2 Mbytes.
|
|---|
| deadlock | | A condition in which a thread waits indefinitely
for some condition or action that cannot, or will not, occur.
|
|---|
| direct memory access (DMA) | | A method for gaining direct access to memory and
achieving data transfers without involving the CPU.
|
|---|
| distributed memory | | A memory architecture used in multi-CPU systems,
in which the system's memory is physically divided among
the processors. In most distributed-memory architectures, memory
is accessible from the single processor that owns it. Sharing of
data requires explicit message passing.
|
|---|
| distributed part | | A loop generated by the compiler in the process
of loop distribution.
|
|---|
| DMA | | Direct memory access. A method for gaining direct
access to memory and achieving data transfers without involving
the CPU.
|
|---|
| double | | A double-precision floating-point number that is
stored in 64 bits in C and C++.
|
|---|
| doubleword | | A primitive data operand which is 8 bytes (64 bits)
in length. Also called a longword. See also
word.
|
|---|
| dummy argument | | In Fortran, a variable declared in the argument
list of a procedure (function or subroutine) that receives a value
when the procedure is called. The dummy argument appears in the
source of the called procedure; the parameter that appears in the
source of the calling procedure is an actual argument.
C and C++ conventions refer to dummy arguments
as formal parameters.
|
|---|
| dynamic selection | | The process by which the compiler chooses the appropriate
runtime clone of a loop. See also clone.
|
|---|
E |
|---|
| encache | | To copy data or instructions into a cache.
|
|---|
| exception | | A hardware-detected event that interrupts the running
of a program, process, or system. See also fault.
|
|---|
| execution stream | | A series of instructions executed by a CPU.
|
|---|
F |
|---|
| fault | | A type of interruption caused
by an instruction requesting a legitimate action that cannot be
carried out immediately due to a system problem.
|
|---|
| floating-point | | A numerical representation of a real number. On
V2250 servers, a floating point operand has a sign (positive or
negative) part, an exponent part, and a fraction part. The fraction
is a fractional representation. The exponent is the value used to
produce a power of two scale factor (or portion) that is subsequently
used to multiply the fraction to produce an unsigned value.
|
|---|
| FLOPS | | Floating-point operations per second. A standard
measure of computer processing power in the scientific community.
|
|---|
| formal parameter | | In C and C++, a variable declared
in the parameter list of a procedure (function) that receives a
value when the procedure is called. The formal parameter appears
in the source of the called procedure; the parameter that appears
in the source of the calling procedure is an actual parameter. Fortran
conventions refer to formal parameters as dummy arguments.
|
|---|
| Fortran | | A high-level software language used mainly for scientific
applications.
|
|---|
| Fortran 90 | | The international standard for Fortran adopted in
1991.
|
|---|
| function | | A procedure whose call can be embedded within another
statement, such as an assignment or test. In C and C++, any procedure
is a function; in Fortran, a function is a procedure defined as a FUNCTION.
|
|---|
| functional unit (FU) | | A part of a CPU that performs a set of operations
on quantities stored in registers.
|
|---|
G |
|---|
| gate | | A construct that restricts execution of a block
of code to a single thread. A thread locks a gate on entering the
gated block of code and unlocks the gate on exiting the block. When
the gate is locked, no other threads can enter. Compiler directives
can be used to automate gate constructs; gates can also be implemented
using semaphores.
|
|---|
| Gbyte | | See gigabyte.
|
|---|
| gigabyte | | 1073741824 (2^30) bytes.
|
|---|
| global optimization | | A restructuring of program statements that is not
confined to a single basic block. Global optimization, unlike interprocedural
optimization, is confined to a single procedure. Global optimization
is done by HP compilers at optimization level +O2
and above.
|
|---|
| global register allocation (GRA) | | A method by which the compiler attempts to store
commonly-referenced scalar variables in registers throughout the
code in which they are most frequently accessed.
|
|---|
| global variable | | A variable whose scope is greater than a single
procedure. In C and C++ programs, a global variable
is a variable that is defined outside of any one procedure. Fortran
has no global variables per se, but COMMON
blocks can be used to make certain memory locations globally accessible.
|
|---|
| granularity | | In the context of parallelism, a measure of the
relative size of the computation done by a thread or parallel construct.
Performance is generally an increasing function of the granularity.
In higher-level language programs, possible sizes are routine, loop,
block, statement, and expression. Fine granularity can be exhibited
by parallel loops, tasks, and expressions. Coarse granularity can
be exhibited by parallel processes.
|
|---|
H |
|---|
| hand-rolled loop | | A loop, more common in Fortran than C or C++,
that is constructed using IF tests
and GOTO statements rather than
a language-provided loop structure such as DO.
|
|---|
| hidden alias | | An alias that, because of the structure of a program
or the standards of the language, goes undetected by the compiler.
Hidden aliases can result in undetected data dependences,
which may result in wrong answers.
|
|---|
| High Performance Fortran (HPF) | | An ad-hoc language extension of Fortran 90 that
provides user-directed data distribution and alignment. HPF is not
a standard, but rather a set of features desirable for parallel
programming.
|
|---|
| hoist | | An optimization process that moves a memory load
operation from within a loop to the basic block preceding the loop.
|
|---|
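The effect of hoisting can be shown by writing the transformation by hand in C. Both functions below are illustrative. The second is what the first might look like after the load of `*scale` is moved out of the loop; the two are equivalent only under the assumption that `scale` does not alias any element of `x`.

```c
#include <stddef.h>

/* As written: *scale is loaded inside the loop. */
void scale_in_loop(double *x, size_t n, const double *scale)
{
    for (size_t i = 0; i < n; i++) {
        x[i] = x[i] * *scale;
    }
}

/* After hoisting: the load happens once, before the loop. */
void scale_hoisted(double *x, size_t n, const double *scale)
{
    const double s = *scale;   /* hoisted load */
    for (size_t i = 0; i < n; i++) {
        x[i] = x[i] * s;
    }
}
```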
| HP | | Hewlett-Packard, the manufacturer of the PA-RISC
chips used as processors in V2250 servers.
|
|---|
| HP-UX | | Hewlett-Packard's Unix-based operating
system for its PA-RISC workstations and servers.
|
|---|
| hypercube | | A topology used in some massively parallel processing
systems. Each processor is connected to its binary neighbors. The
number of processors in the system is always a power of two; that
power is referred to as the dimension of the hypercube. For example,
a 10-dimensional hypercube has 2^10, or
1,024, processors.
|
|---|
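One common reading of "binary neighbors", assumed here, is that processors are numbered from 0 and two processors are connected when their numbers differ in exactly one bit. Under that assumption, a sketch in C with illustrative function names is:

```c
/* Number of processors in a hypercube of the given dimension. */
unsigned hypercube_size(unsigned dimension)
{
    return 1u << dimension;
}

/* Neighbor of processor `node` along axis `axis`
   (0 <= axis < dimension): flip that one bit of the node number. */
unsigned hypercube_neighbor(unsigned node, unsigned axis)
{
    return node ^ (1u << axis);
}
```

Each processor in a hypercube of dimension d therefore has d neighbors, one per axis.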
| hypernode | | A set of processors and physical memory organized
as a symmetric multiprocessor (SMP) running a single image of the
operating system. Nonscalable servers and V2250 servers consist
of one hypernode. When discussing multidimensional parallelism or
memory classes, hypernodes are generally called nodes.
|
|---|
I |
|---|
| Icache | | Instruction cache. This cache holds prefetched instructions
and permits the simultaneous decoding of one instruction with the
execution of a previous instruction. On V2250 servers, this cache
is 2 Mbytes.
|
|---|
| IEEE | | Institute of Electrical and Electronics Engineers.
An international professional organization and a member of ANSI
and ISO.
|
|---|
| induction variable | | A variable that changes linearly within the loop,
that is, whose value is incremented by a constant amount on every
iteration. For example, in the following Fortran loop, I,
J, and K are induction variables, but L is not:
    DO I = 1, N
      J = J + 2
      K = K + N
      L = L + I
    ENDDO
|
|---|
| inlining | | The replacement of a procedure (function or subroutine)
call, within the source of a calling procedure, by a copy of the
called procedure's code.
|
|---|
| Institute of Electrical and Electronics Engineers (IEEE) | | An international professional organization and a
member of ANSI and ISO.
|
|---|
| instruction | | One of the basic operations performed by a CPU.
|
|---|
| instruction cache (Icache) | | This cache holds prefetched instructions and permits
the simultaneous decoding of one instruction with the execution
of a previous instruction. On V2250 servers, this cache is 2 Mbytes.
|
|---|
| instruction mnemonic | | A symbolic name for a machine instruction.
|
|---|
| integral division | | Division that results in a whole number solution
with no remainder. For example, 10 is integrally divisible by 2,
but not by 3.
|
|---|
| interface | | A logical path between any two modules or systems.
|
|---|
| interleaved memory | | Memory that is divided into multiple banks to permit
concurrent memory accesses. The number of separate memory banks
is referred to as the memory stride.
|
|---|
| interprocedural optimization | | Automatic analysis of relationships and interfaces
between all subroutines and data structures within a program. Traditional
compilers analyze only the relationships within the procedure being
compiled.
|
|---|
| interprocessor communication | | The process of moving or sharing data, and synchronizing
operations between processors on a multiprocessor system.
|
|---|
| intrinsic | | A function or subroutine that is an inherent part
of a computer language. For example, SIN
is a Fortran intrinsic.
|
|---|
J |
|---|
| job scheduler | | That portion of the operating system that schedules
and manages the execution of all processes.
|
|---|
| join | | The synchronized termination of parallel execution
by spawned tasks or threads.
|
|---|
| jump | | Departure from normal one-step incrementing of the
program counter.
|
|---|
K |
|---|
| kbyte | | See kilobyte.
|
|---|
| kernel | | The core of the operating system where basic system
facilities, such as file access and memory management functions,
are performed.
|
|---|
| kernel thread identifier (ktid) | | A unique integer identifier (not necessarily sequential)
assigned when a thread is created.
|
|---|
| kilobyte | | 1024 (2^10) bytes.
|
|---|
L |
|---|
| latency | | The time delay between the issuing of an instruction
and the completion of the operation. A common benchmark used for
comparing systems is the latency of coherent memory access instructions.
This particular latency measurement is believed to be a good indication
of the scalability of a system; low latency
equates to low system overhead as system size increases.
|
|---|
| linker | | A software tool that combines separate object code
modules into a single object code module or executable program.
|
|---|
| load | | An instruction used to move the contents of a memory
location into a register.
|
|---|
| local optimization | | Restructuring of program statements within the scope
of a basic block. Local optimization is done by HP compilers at
optimization level +O1 and above.
|
|---|
| locality of reference | | An attribute of a memory reference pattern that
refers to the likelihood of an address of a memory reference being
physically close to the CPU making the reference.
|
|---|
| localization | | Data localization. Optimizations designed to keep
frequently used data in the processor data cache, thus eliminating
the need for more costly memory accesses.
|
|---|
| logical address | | An address as seen by
the application program.
|
|---|
| logical memory | | Virtual memory. The memory space as seen by the
program, which may be larger than the available physical memory.
The virtual memory of a V2250 server can be up to 16 Tbytes. HP-UX
can map this virtual memory to a smaller set of physical memory,
using disk space to make up the difference if necessary.
|
|---|
| longword (l) | | Doubleword. A primitive data operand which is 8
bytes (64 bits) in length. See also word.
|
|---|
| loop blocking | | A loop transformation that strip mines and interchanges
a loop to provide optimal reuse of the encachable loop data.
|
|---|
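A hand-written example of the transformation, assuming a matrix transpose as the loop body and a blocking factor of 4 (both chosen only for illustration), might look like this in C. Each original loop is strip mined, and the strip loops are interchanged to the outside so that one small tile of both arrays is worked on at a time.

```c
#include <stddef.h>

enum { BLOCK = 4 };   /* illustrative blocking factor */

/* Transposes the n-by-n row-major matrix a into b, tile by tile. */
void transpose_blocked(const double *a, double *b, size_t n)
{
    for (size_t ii = 0; ii < n; ii += BLOCK) {        /* outer strip of i */
        for (size_t jj = 0; jj < n; jj += BLOCK) {    /* outer strip of j */
            size_t i_end = ii + BLOCK < n ? ii + BLOCK : n;
            size_t j_end = jj + BLOCK < n ? jj + BLOCK : n;
            for (size_t i = ii; i < i_end; i++) {
                for (size_t j = jj; j < j_end; j++) {
                    b[j * n + i] = a[i * n + j];
                }
            }
        }
    }
}
```

The `i_end` and `j_end` bounds handle matrices whose size is not a multiple of the blocking factor.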
| loop constant | | A constant or expression whose value does not change
within a loop.
|
|---|
| loop distribution | | The restructuring of a loop nest to create simple
loop nests. Loop distribution creates two or more loops, called
distributed parts, which can serve to make parallelization more
efficient by increasing the opportunities for loop interchange and
isolating code that must run serially from parallelizable code.
It can also improve data localization and other optimizations.
|
|---|
| loop induction variable | | See induction variable.
|
|---|
| loop interchange | | The reordering of nested loops. Loop interchange
is generally done to increase the granularity of the parallelizable
loop(s) present or to allow more efficient access to loop data.
|
|---|
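The following C sketch shows an interchange done by hand to improve access to loop data. The array dimensions and function names are assumptions for the example. C stores arrays in row-major order, so the second version walks memory contiguously in its inner loop.

```c
enum { ROWS = 3, COLS = 4 };

/* Before: the inner loop steps down a column, COLS elements apart. */
double sum_column_inner(double a[ROWS][COLS])
{
    double sum = 0.0;
    for (int j = 0; j < COLS; j++) {
        for (int i = 0; i < ROWS; i++) {
            sum += a[i][j];
        }
    }
    return sum;
}

/* After interchange: the inner loop steps along a row, unit stride. */
double sum_row_inner(double a[ROWS][COLS])
{
    double sum = 0.0;
    for (int i = 0; i < ROWS; i++) {
        for (int j = 0; j < COLS; j++) {
            sum += a[i][j];
        }
    }
    return sum;
}
```

Both versions visit the same elements; only the order of the visits changes.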
| loop invariant | | Loop constant. A constant or expression whose value
does not change within a loop.
|
|---|
| loop invariant computation | | An operation that yields the same result on every
iteration of a loop.
|
|---|
| loop replication | | The process of transforming one loop into more than
one loop to facilitate an optimization. The optimizations that replicate
loops are IF-DO and if-for
optimizations, dynamic selection, loop unrolling, and loop blocking.
|
|---|
| loop-carried dependence (LCD) | | A dependence between two operations executed on
different iterations of a given loop and on the same iteration of
all enclosing loops. A loop carries a dependence from an indexed
assignment to an indexed use if, for some iteration of the loop,
the assignment stores into an address that is referred to on a different
iteration of the loop.
|
|---|
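To make the definition concrete, here is a C sketch with two illustrative loops. In `prefix_sum`, iteration i reads `a[i - 1]`, which iteration i - 1 stored, so the loop carries a dependence. In `add_arrays`, no iteration reads a location that another iteration writes, provided `c` does not overlap `a` or `b`.

```c
#include <stddef.h>

/* Loop-carried dependence: each iteration uses the previous result. */
void prefix_sum(double *a, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        a[i] = a[i] + a[i - 1];
    }
}

/* No loop-carried dependence (assuming c does not overlap a or b). */
void add_arrays(double *c, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}
```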
| loop-independent dependence (LID) | | A dependence between two operations executed on
the same iteration of all enclosing loops such that one operation
must precede the other to produce correct results.
|
|---|
M |
|---|
| machine exception | | A fatal error in the system that cannot be handled
by the operating system. See also exception.
|
|---|
| main memory | | Physical memory other than the processor caches.
|
|---|
| main procedure | | A procedure invoked by the operating system when
an application program starts up. The main procedure is the main
program in Fortran; in C and C++, it is the function
main().
|
|---|
| main program | | In a Fortran program, the program section invoked
by the operating system when the program starts up.
|
|---|
| Mbyte | | See megabyte (Mbyte).
|
|---|
| megabyte (Mbyte) | | 1048576 (2^20) bytes.
|
|---|
| megaflops (MFLOPS) | | One million floating-point operations per second.
|
|---|
| memory bank conflict | | An attempt to access a particular memory bank before
a previous access to the bank is complete, or when the bank is not
yet finished recycling (i.e., refreshing).
|
|---|
| memory management | | The hardware and software that control memory page
mapping and memory protection.
|
|---|
| message | | Data copied from one process to another (or the
same) process. The copy is initiated by the sending process, which
specifies the receiving process. The sending and receiving processes
need not share a common address space. (Note: depending on the context,
a process may be a thread.)
|
|---|
| message passing | | A type of programming in which program modules (often
running on different processors or different hosts) communicate
with each other by means of system library calls that package, transmit,
and receive data. All message-passing library calls must be explicitly
coded by the programmer.
|
|---|
| Message-Passing Interface (MPI) | | A message-passing and process control library. For
information on the Hewlett-Packard implementation of MPI, refer
to the HP MPI User's Guide (B6011-90001).
|
|---|
| MIMD (multiple instruction stream multiple data stream) | | A computer architecture that uses multiple processors,
each processing its own set of instructions simultaneously and independently
of others. MIMD also describes when processes are performing different
operations on different data. Compare with SIMD.
|
|---|
| multiprocessing | | The creation and scheduling of processes on any
subset of CPUs in a system configuration.
|
|---|
| mutex | | A variable used to construct an area (region of
code) of mutual exclusion. When a mutex is
locked, entry to the area is prohibited; when the mutex is free,
entry is allowed.
|
|---|
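As one possible illustration, the sketch below uses a POSIX threads mutex. This is an assumption made for the example; it is not the CPSlib or compiler-directive mechanism described elsewhere in this glossary, and the names `counter_lock` and `increment_counter` are hypothetical.

```c
#include <pthread.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

/* Increments the shared counter inside a region of mutual exclusion
   and returns the new value. */
long increment_counter(void)
{
    pthread_mutex_lock(&counter_lock);     /* entry prohibited to others */
    long value = ++counter;
    pthread_mutex_unlock(&counter_lock);   /* entry allowed again */
    return value;
}
```

Any number of threads can call `increment_counter`; the mutex ensures that only one of them updates `counter` at a time.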
| mutual exclusion | | A protocol that prevents access to a given resource
by more than one thread at a time.
|
|---|
N |
|---|
| negate | | An instruction that changes the sign of a number.
|
|---|
| network | | A system of interconnected computers that enables
machines and their users to exchange information and share resources.
|
|---|
| node | | On HP scalable and nonscalable servers, a node is
equivalent to a hypernode. The term "node"
is generally used in place of hypernode.
|
|---|
| non-uniform memory access (NUMA) | | This term describes memory access times in systems
in which accessing different types of memory (for example, memory
local to the current hypernode or memory remote to the current hypernode)
results in non-uniform access times.
|
|---|
| nonblocking crossbar | | A switching device that connects the CPUs, banks
of memory, and I/O controller on a single hypernode. Because
the crossbar is nonblocking, all ports can run at full bandwidth
simultaneously, provided there is no contention for a particular
port.
|
|---|
| NUMA | | Non-uniform memory access. This term describes memory
access times in systems in which accessing different types of memory
(for example, memory local to the current hypernode or memory remote
to the current hypernode) results in non-uniform access times.
|
|---|
O |
|---|
| offset | | In the context of a process address space, an integer
value that is added to a base address to calculate a memory address.
Offsets in V2250 servers are 64-bit values, and must keep address
values within a single 16-Tbyte memory space.
|
|---|
| opcode | | A predefined sequence of bits in an instruction
that specifies the operation to be performed.
|
|---|
| operating system | | The program that manages the resources of a computer
system. V2250 servers use the HP-UX operating system.
|
|---|
| optimization | | The refining of application software programs to
minimize processing time. Optimization takes maximum advantage of
a computer's hardware features and minimizes idle processor
time.
|
|---|
| optimization level | | The degree to which source code is optimized by
the compiler. The HP compilers offer five levels of optimization:
+O0, +O1, +O2, +O3, and +O4. The +O4 option is not available
in Fortran 90.
|
|---|
| oversubscript | | An array reference that falls outside declared bounds.
|
|---|
| oversubscription | | In the context of parallel threads, a process attribute
that permits the creation of more threads within a process than
the number of processors available to the process.
|
|---|
P |
|---|
| PA-RISC | | The Hewlett-Packard Precision Architecture reduced
instruction set.
|
|---|
| packet | | A group of related items. A packet may refer to
the arguments of a subroutine or to a group of bytes that is transmitted
over a network.
|
|---|
| page | | A page is the unit of virtual or physical memory
controlled by the memory management hardware and software. On HP-UX
servers, the default page size is 4 K (4,096) contiguous bytes.
Valid page sizes are: 4 K, 16 K, 64 K, 256 K,
1 Mbyte, 4 Mbytes, 16 Mbytes, 64 Mbytes, and 256 Mbytes.
See also virtual memory.
|
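
As a rough illustration of how an address relates to a page, the sketch below splits an address into a page number and an offset within that page. The helper names `page_number` and `page_offset` are hypothetical, and the code assumes only that the page size is a power of two, as every size listed in this entry is.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: index of the page that contains addr.
   page_size must be a nonzero power of two (4 K, 16 K, ...). */
uint64_t page_number(uint64_t addr, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr / page_size;
}

/* Hypothetical helper: byte offset of addr within its page. */
uint64_t page_offset(uint64_t addr, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr % page_size;
}
```

With the default 4,096-byte page, address 4097 falls in page 1 at offset 1.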
|---|
| page fault | | A page fault occurs when a process requests data
that is not currently in memory. This requires the operating system
to retrieve the page containing the requested data from disk.
|
|---|
| page frame | | A page frame is the unit of physical memory in which
pages are placed. Referenced and modified bits associated with each
page frame aid in memory management.
|
|---|
| parallel optimization | | The transformation of source code into parallel
code (parallelization) and restructuring of code to enhance parallel
performance.
|
|---|
| parallelization | | The process of transforming serial code to a form
of code that can run simultaneously on multiple CPUs while preserving
semantics. When +O3 +Oparallel
is specified, the HP compilers automatically parallelize loops in
your program and recognize compiler directives and pragmas with
which you can manually specify parallelization of loops, tasks,
and regions.
|
|---|
| parallelization, loop | | The process of splitting a loop into several smaller
loops, each of which operates on a subset of the data of the original
loop, and generating code to run these loops on separate processors
in parallel.
|
|---|
| parallelization, ordered | | The process of splitting a loop into several smaller
loops, each of which iterates over a subset of the original data
with a stride equal to the number of loops created, and generating
code to run these loops on separate processors. Each iteration in
an ordered parallel loop begins execution in the original iteration
order, allowing dependences within the loop to be synchronized to
yield correct results via gate constructs.
|
|---|
| parallelization, stride-based | | The process of splitting up a loop into several
smaller loops, each of which iterates over several discontiguous
chunks of data, and generating code to run these loops on separate
processors in parallel. Stride-based parallelism can only be achieved
manually by using compiler directives.
|
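
One way to picture the discontiguous chunks is to ask which thread owns a given iteration. The sketch below assumes chunks of `chunk_size` iterations are dealt to threads in round-robin order; that policy and the helper name `chunk_owner` are illustrative assumptions, not a description of what the HP compilers generate.

```c
#include <assert.h>

/* Assumption: chunk 0 goes to thread 0, chunk 1 to thread 1, and so on,
   wrapping around after the last thread. Returns the owning thread. */
int chunk_owner(int iteration, int chunk_size, int num_threads)
{
    assert(iteration >= 0 && chunk_size > 0 && num_threads > 0);
    return (iteration / chunk_size) % num_threads;
}
```

With a chunk size of 2 and 3 threads, thread 0 would run iterations 0-1 and 6-7, thread 1 iterations 2-3 and 8-9, and so on.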
|---|
| parallelization, strip-based | | The process of splitting up a loop into several
smaller loops, each of which iterates over a single contiguous subset
of the data of the original loop, and generating code to run these
loops on separate processors in parallel. Strip-based parallelism
is the default for automatic parallelism and for directive-initiated
loop parallelism in the absence of the chunk_size = n
or ordered attributes.
|
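
The contiguous subsets can be sketched as a pair of bounds per thread. The division below, which gives the first few threads one extra iteration when the trip count does not divide evenly, is one plausible scheme chosen for illustration; the helper names are hypothetical and the actual compiler policy may differ.

```c
#include <assert.h>

/* First iteration of the strip given to thread t when n iterations
   are divided among num_threads threads (t == num_threads yields n). */
int strip_start(int n, int num_threads, int t)
{
    assert(n >= 0 && num_threads > 0 && t >= 0 && t <= num_threads);
    int base = n / num_threads;   /* iterations every thread receives */
    int extra = n % num_threads;  /* leftover iterations, one per early thread */
    return t * base + (t < extra ? t : extra);
}

/* One past the last iteration of thread t's strip. */
int strip_end(int n, int num_threads, int t)
{
    assert(t >= 0 && t < num_threads);
    return strip_start(n, num_threads, t + 1);
}
```

For 10 iterations on 4 threads this yields the strips [0,3), [3,6), [6,8), and [8,10).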
|---|
| parallelization, task | | The process of splitting up source code into independent
sections which can safely be run in parallel on available processors.
HP programming languages provide compiler directives and pragmas
that allow you to identify parallel tasks in source code.
|
|---|
| parameter | | In C and C++, either a variable
declared in the parameter list of a procedure (function) that receives
a value when the procedure is called (formal parameter)
or the variable or constant that is passed by a call to a procedure
(actual parameter). In Fortran, a symbolic
name for a constant.
|
|---|
| path | | An environment variable that you set within your
shell that allows you to access commands in various directories
without having to specify a complete path name.
|
|---|
| physical address | | A unique identifier that selects a particular location
in the computer's memory. Because HP-UX supports virtual
memory, programs address data by its virtual address; HP-UX then
maps this address to the appropriate physical address. See also
virtual address.
|
|---|
| physical address space | | The set of possible addresses for a particular physical
memory.
|
|---|
| physical memory | | Computer hardware that stores data. V2250 servers
can contain up to 16 Gbytes of physical memory on a 16-processor
hypernode.
|
|---|
| pipeline | | A technique for increasing processor speed by overlapping
the stages of instruction execution. Pipelining allows multiple
operations to proceed concurrently by beginning one instruction
before the previous one has completed. Maximum efficiency
is achieved when the pipeline is "full," that
is, when all stages are operating on separate instructions.
|
|---|
| pipelining | | Issuing instructions in an order that best uses
the pipeline.
|
|---|
| procedure | | A unit of program code. In Fortran, a function,
subroutine, or main program; in C and C++, a function.
|
|---|
| process | | A collection of one or more execution streams within
a single logical address space; an executable program. A process
is made up of one or more threads.
|
|---|
| process memory | | The portion of system memory that is used by an
executing process.
|
|---|
| program unit | | A procedure or main section of a program.
|
|---|
| programming model | | A description of the features available to efficiently
program a certain computer architecture.
|
|---|
Q |
|---|
| queue | | A data structure in which entries are made at one
end and deletions at the other. Often referred to as first-in, first-out
(FIFO).
|
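
A minimal sketch of a FIFO queue, written as a fixed-size ring buffer. The type and function names, and the capacity of 8, are illustrative choices only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 8

typedef struct {
    int items[QUEUE_CAPACITY];
    int head;   /* index of the oldest entry (next to be removed) */
    int count;  /* number of entries currently stored */
} Queue;

void queue_init(Queue *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
}

/* Add an entry at the tail; returns false if the queue is full. */
bool queue_put(Queue *q, int value)
{
    assert(q != NULL);
    if (q->count == QUEUE_CAPACITY)
        return false;
    q->items[(q->head + q->count) % QUEUE_CAPACITY] = value;
    q->count++;
    return true;
}

/* Remove the oldest entry from the head; returns false if empty. */
bool queue_get(Queue *q, int *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0)
        return false;
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return true;
}
```

Entries come out in the order they went in: putting 10 and then 20 returns 10 first.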
|---|
R |
|---|
| rank | | The number of dimensions of an array.
|
|---|
| read | | A memory operation in which the contents of a memory
location are copied and passed to another part of the system.
|
|---|
| recurrence | | A cycle of dependences among the operations within
a loop in which an operation in one iteration depends on the result
of an operation that appears later in the loop body but executed
in a previous iteration.
|
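
A running sum is a common small example of a recurrence: each iteration needs a value produced by the iteration before it, so the iterations cannot simply be run independently. The function name `running_sum` is illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* a[i] = b[0] + ... + b[i]. The read of a[i - 1] uses the value stored
   by the previous iteration, which is the loop-carried dependence. */
void running_sum(double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    if (n == 0)
        return;
    a[0] = b[0];
    for (size_t i = 1; i < n; i++)
        a[i] = a[i - 1] + b[i];
}
```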
|---|
| recursion | | An operation that is defined, at least in part,
by a repeated application of itself.
|
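
The factorial function is a familiar instance: n! is defined as n times (n - 1)!, with 0! equal to 1. The sketch below limits its input so that the result fits in 32 bits, the smallest width an `unsigned long` can have.

```c
#include <assert.h>

/* Recursive definition: factorial(n) = n * factorial(n - 1), factorial(0) = 1. */
unsigned long factorial(unsigned int n)
{
    assert(n <= 12);  /* 12! = 479001600 still fits in 32 bits */
    return (n == 0) ? 1UL : n * factorial(n - 1);
}
```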
|---|
| recursive call | | A condition in which the sequence of instructions
in a procedure causes the procedure itself to be invoked again.
Such a procedure must be compiled for reentrancy.
|
|---|
| reduced instruction set computer (RISC) | | An architectural concept that applies to the definition
of the instruction set of a processor. A RISC instruction set is
an orthogonal instruction set that is easy to decode in hardware
and for which a compiler can generate highly optimized code. The
PA-RISC processor used in V2250 servers employs a RISC architecture.
|
|---|
| reduction | | An arithmetic operation that performs a transformation
on an array to produce a scalar result.
|
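
Summing an array is perhaps the simplest reduction: many elements go in and one scalar comes out. The function name `sum_reduce` is illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Reduce the n elements of a to a single scalar by addition. */
double sum_reduce(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double total = 0.0;  /* accumulator carried across iterations */
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```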
|---|
| reentrancy | | The ability of a program unit to be executed by
multiple threads at the same time. Each invocation maintains a private
copy of its local data and a private stack to store compiler-generated
temporary variables. Procedures must be compiled for reentrancy
in order to be invoked in parallel or to be used for recursive calls.
HP compilers compile for reentrancy by default.
|
|---|
| reference | | Any operation that requires a cache line to be encached;
this includes load as well as store operations, because writing
to any element in a cache line requires the entire cache line to
be encached.
|
|---|
| register | | A hardware entity that contains an address, operand,
or instruction status information.
|
|---|
| reuse, data | | In the context of a loop, the ability to use data
fetched for one loop operation in another operation. In the context
of a cache, reusing data that was encached for a previous operation;
because data is fetched as part of a cache line, if any of the other
items in the cache line are used before the line is flushed to memory,
reuse has occurred.
|
|---|
| reuse, spatial | | Reusing data that resides in the cache as a result
of the fetching of another piece of data from memory. Typically,
this involves using array elements that are contiguous to (and therefore
part of the cache line of) an element that has already been used,
and therefore is already encached.
|
|---|
| reuse, temporal | | Reusing a data item that has been used previously.
|
|---|
| RISC | | Reduced instruction set computer. An architectural
concept that applies to the definition of the instruction set of
a processor. A RISC instruction set is an orthogonal instruction
set that is easy to decode in hardware and for which a compiler
can generate highly optimized code. The PA-RISC processor
used in V2250 servers employs a RISC architecture.
|
|---|
| rounding | | A method of obtaining a representation of a number
that has less precision than the original in which the closest number
representable under the lower precision system is used.
|
|---|
| row-major order | | Memory representation of an array such that the
rows of an array are stored contiguously. For example, given a two-dimensional
array A[3][4], array element A[0][3]
immediately precedes A[1][0] in
memory. This is the default storage method for arrays in C.
|
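
For the A[3][4] example, the position of an element in memory can be computed as row times the number of columns, plus the column. The helper `row_major_index` is a hypothetical name used only for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 3
#define COLS 4

/* Linear position of A[row][col] when the rows of A[ROWS][COLS]
   are stored one after another. */
size_t row_major_index(size_t row, size_t col)
{
    assert(row < ROWS && col < COLS);
    return row * COLS + col;
}
```

Here A[0][3] maps to position 3 and A[1][0] to position 4, so the two elements are adjacent.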
|---|
S |
|---|
| scope | | The domain in which a variable is visible in source
code. The rules that determine scope are different for Fortran and
C/C++.
|
|---|
| semaphore | | An integer variable assigned one of two values:
one value to indicate that it is "locked," and
another to indicate that it is "free." Semaphores
can be used to synchronize parallel threads. Pthreads provides a
set of manipulation functions to facilitate this.
|
|---|
| shape | | The number of elements in each dimension of an array.
|
|---|
| shared virtual memory | | A memory architecture in which memory can be accessed
by all processors in the system. This architecture can also support
virtual memory.
|
|---|
| shell | | An interactive command interpreter that is the interface
between the user and the Unix operating system.
|
|---|
| SIMD (single instruction stream multiple data stream) | | A computer architecture that performs one operation
on multiple sets of data. A processor (separate from the SIMD array)
is used for the control logic, and the processors in the SIMD array
perform the instruction on the data. Compare with MIMD
(multiple instruction stream multiple data stream).
|
|---|
| single | | A single-precision floating-point number stored
in 32 bits. See also double.
|
|---|
| SMP | | Symmetric multiprocessor. A multiprocessor computer
in which all the processors have equal access to all machine resources.
Symmetric multiprocessors have no manager or worker processors;
the operating system runs on any or all of the processors.
|
|---|
| socket | | An endpoint used for interprocess communication.
|
|---|
| socket pair | | Bidirectional pipes that enable application programs
to set up two-way communication between processes that share a common
ancestor.
|
|---|
| source code | | The uncompiled version of a program, written in
a high-level language such as Fortran or C.
|
|---|
| source file | | A file that contains program source code.
|
|---|
| space | | A contiguous range of virtual addresses within the
system-wide virtual address space. Spaces are 16 Tbytes in the V2250
servers.
|
|---|
| spatial reference | | An attribute of a memory reference pattern that
pertains to the likelihood of a subsequent memory reference address
being numerically close to a previously referenced address.
|
|---|
| spawn | | To activate existing threads.
|
|---|
| spawn context | | A parallel loop, task list, or region that initiates
the spawning of threads and defines the structure within which the
threads' spawn thread IDs are valid.
|
|---|
| spawn thread identifier (stid) | | A sequential integer identifier associated with
a particular thread that has been spawned. stids are only assigned
to spawned threads, and they are assigned within a spawn context;
therefore, duplicate stids may be present amongst the threads of
a program, but stids are always unique within the scope of their
spawn context. stids are assigned sequentially and run from 0 to
one less than the number of threads spawned in a particular spawn
context.
|
|---|
| SPMD | | Single program multiple data. A single program executing
simultaneously on several processors. This is usually taken to mean
that there is redundant execution of sequential scalar code on all
processors.
|
|---|
| stack | | A data structure in which the last item entered
is the first to be removed. Also referred to as last-in, first-out
(LIFO). HP-UX provides every thread with a stack which is used to
pass arguments to functions and subroutines and for local variable
storage.
|
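
A minimal sketch of the LIFO data structure itself, using a fixed-size array. It illustrates the ordering only and is not a model of the per-thread stack that HP-UX manages; the names and the capacity of 8 are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STACK_CAPACITY 8

typedef struct {
    int items[STACK_CAPACITY];
    int top;  /* number of items stored; items[top - 1] is the newest */
} Stack;

void stack_init(Stack *s)
{
    assert(s != NULL);
    s->top = 0;
}

/* Place value on top of the stack; returns false if the stack is full. */
bool stack_push(Stack *s, int value)
{
    assert(s != NULL);
    if (s->top == STACK_CAPACITY)
        return false;
    s->items[s->top++] = value;
    return true;
}

/* Remove the most recently pushed item; returns false if empty. */
bool stack_pop(Stack *s, int *out)
{
    assert(s != NULL && out != NULL);
    if (s->top == 0)
        return false;
    *out = s->items[--s->top];
    return true;
}
```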
|---|
| store | | An instruction used to move the contents of a register
to memory.
|
|---|
| strip length, parallel | | In strip-based parallelism, the amount by which
the induction variable of a parallel inner loop is advanced on each
iteration of the (conceptual) controlling outer loop.
|
|---|
| strip mining | | The transformation of a single loop into two nested
loops. Conceptually, this is how parallel loops are created by default.
A conceptual outer loop advances the initial value of the inner
loop's induction variable by the parallel strip length.
The parallel strip length is based on the trip count of the loop
and the amount of code in the loop body. Strip mining is also used
by the data localization optimization.
|
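
Written out by hand, the transformation turns a loop such as `for (i = 0; i < n; i++) a[i] *= 2.0;` into the two nested loops below. The strip length is passed in as a parameter here purely for illustration; how the compiler picks it is described only in general terms in this entry.

```c
#include <assert.h>
#include <stddef.h>

/* Strip-mined form of: for (i = 0; i < n; i++) a[i] *= 2.0; */
void scale_strip_mined(double *a, size_t n, size_t strip)
{
    assert(a != NULL || n == 0);
    assert(strip > 0);
    /* Outer loop: advance the starting index by one strip length. */
    for (size_t start = 0; start < n; start += strip) {
        size_t end = (start + strip < n) ? start + strip : n;
        /* Inner loop: process the iterations of one strip. */
        for (size_t i = start; i < end; i++)
            a[i] *= 2.0;
    }
}
```

The result is the same as the single loop; only the iteration structure changes, which gives each strip a natural boundary for assignment to a thread.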
|---|
| subroutine | | A software module that can be invoked from anywhere
in a program.
|
|---|
| superscalar | | A class of RISC processors
that allow multiple instructions to be issued in each clock period.
|
|---|
| Symmetric Multiprocessor (SMP) | | A multiprocessor computer in which all the processors
have equal access to all machine resources. Symmetric multiprocessors
have no manager or worker processors; the operating system runs
on any or all of the processors.
|
|---|
| synchronization | | A method of coordinating the actions of multiple
threads so that operations occur in the right sequence. When manually
optimizing code, you can synchronize programs using compiler directives,
calls to library routines, or assembly-language instructions. You
do so, however, at the cost of additional overhead; synchronization
may cause at least one CPU to wait for another.
|
|---|
| system administrator (sysadmin) | | The person responsible for managing the administration
of a system.
|
|---|
| system manager | | The person responsible for the management and operation
of a computer system. Also called the system administrator and the
sysadmin.
|
|---|
T |
|---|
| Tbyte | | See terabyte (Tbyte).
|
|---|
| terabyte (Tbyte) | | 1,099,511,627,776 (2^40) bytes.
|
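
The value is a power of two, so it can be written as a shift. The macro and helper names below are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* One Tbyte: 2 raised to the 40th power, in bytes. */
#define TBYTE (UINT64_C(1) << 40)

/* Convert a count of Tbytes to bytes, guarding against 64-bit overflow. */
uint64_t tbytes_to_bytes(uint64_t tbytes)
{
    assert(tbytes <= UINT64_MAX / TBYTE);
    return tbytes * TBYTE;
}
```

A 16-Tbyte range therefore spans 2^44, or 17,592,186,044,416, bytes.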
|---|
| term | | A constant or symbolic name that is part of an expression.
|
|---|
| thread | | An independent execution stream that is executed
by a CPU. One or more threads, each of which can execute on a different
CPU, make up each process. Memory, files, signals, and other process
attributes are generally shared among threads in a given process,
enabling the threads to cooperate in solving the common problem.
Threads are created and terminated by instructions that can be automatically
generated by HP compilers, inserted by adding compiler directives
to source code, or coded explicitly using library calls or assembly language.
|
|---|
| thread create | | To activate existing threads.
|
|---|
| thread identifier | | An integer identifier associated with a particular
thread. See thread identifier, kernel (ktid) and
thread identifier, spawn (stid).
|
|---|
| thread identifier, kernel (ktid) | | A unique integer identifier (not necessarily sequential)
assigned when a thread is created.
|
|---|
| thread identifier, spawn (stid) | | A sequential integer identifier associated with
a particular thread that has been spawned. stids are only assigned
to spawned threads, and they are assigned within a spawn context;
therefore, duplicate stids may be present amongst the threads of
a program, but stids are always unique within the scope of their
spawn context. stids are assigned sequentially and run from 0 to
one less than the number of threads spawned in a particular spawn
context.
|
|---|
| thread-private memory | | Data that is accessible by a single thread only
(not shared among the threads constituting a process).
|
|---|
| TLB | | See translation lookaside buffer.
|
|---|
| translation lookaside buffer | | A hardware entity that contains information necessary
to translate a virtual memory reference to the corresponding physical
page and to validate memory accesses.
|
|---|
| trip count | | The number of iterations a loop executes.
|
|---|
U |
|---|
| unsigned | | A value that is never negative (zero or positive).
|
|---|
| user interface | | The portion of a computer program that processes
input entered by a human and provides output for human users.
|
|---|
| utility | | A software tool designed to perform a frequently
used support function.
|
|---|
V |
|---|
| vector | | An ordered list of items in a computer's
memory, contained within an array. A simple vector is defined as
having a starting address, a length, and a stride. An indirect address
vector is defined as having a relative base address and a vector
of values to be applied as offsets to the base.
|
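
A simple vector can be sketched as a small struct holding the three properties named above. The struct layout, the choice to measure stride in elements rather than bytes, and the function name `vector_sum` are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* A simple vector: starting address, length, and stride.
   Assumption: stride counts elements, not bytes. */
typedef struct {
    const double *start;
    size_t length;
    size_t stride;
} SimpleVector;

/* Add up the elements the vector selects from the underlying array. */
double vector_sum(SimpleVector v)
{
    assert(v.start != NULL || v.length == 0);
    double total = 0.0;
    for (size_t i = 0; i < v.length; i++)
        total += v.start[i * v.stride];
    return total;
}
```

A vector with length 3 and stride 2 over a six-element array selects elements 0, 2, and 4, which is how one column of a 3-by-2 row-major array would be described.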
|---|
| vector processor | | A processor whose instruction set includes instructions
that perform operations on a vector of data
(such as a row or column of an array) in an optimized fashion.
|
|---|
| virtual address | | The address by which programs access their data.
HP-UX maps this address to the appropriate physical memory address.
See also space.
|
|---|
| virtual aliases | | Two different virtual addresses that map to the
same physical memory address.
|
|---|
| virtual machine | | A collection of computing resources configured so
that a user or process can access any of the resources, regardless
of their physical location or operating system, from a single interface.
|
|---|
| virtual memory | | The memory space as seen by the program, which is
typically larger than the available physical memory. The virtual
memory of a V2250 server can be up to 16 Tbytes. The operating system
maps this virtual memory to a smaller set of physical memory, using
disk space to make up the difference if necessary. Also called logical
memory.
|
|---|
W |
|---|
| wall-clock time | | The chronological time an application requires to
complete its processing. If an application starts running at 1:00
p.m. and finishes at 5:00 a.m. the following morning, its wall-clock
time is sixteen hours. Compare with CPU time.
|
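
The arithmetic in the example can be checked with a small helper. The function name and its whole-hour, 24-hour-clock parameters are illustrative simplifications.

```c
#include <assert.h>

/* Whole hours elapsed from start_hour on one day to end_hour on a day
   that is days_later days after it (both on a 24-hour clock). */
int hours_elapsed(int start_hour, int end_hour, int days_later)
{
    assert(start_hour >= 0 && start_hour < 24);
    assert(end_hour >= 0 && end_hour < 24);
    assert(days_later >= 0);
    return days_later * 24 + end_hour - start_hour;
}
```

Starting at 13:00 and finishing at 05:00 the next day gives 24 + 5 - 13 = 16 hours.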
|---|
| word | | A contiguous group of bytes that make up a primitive
data operand and start on an addressable boundary. In V2250 servers
a word is four bytes (32 bits) in length. See
also doubleword.
|
|---|
| workstation | | A stand-alone computer that has its own processor,
memory, and possibly a disk drive and can typically sit on a user's
desk.
|
|---|
| write | | A memory operation in which a memory location is
updated with new data.
|
|---|
Z |
|---|
| zero | | In floating-point number representations, positive zero is
represented by a sign bit of zero, an exponent of zero, and a
fraction of zero.
|
|---|