HP aC++ Version A.03.33
                                Release Notes


                                  Chapter 1:
                                  Features
_____________________________________________________________________________

This chapter summarizes the features included in this version of the HP aC++
compiler. Features introduced in prior release versions are also listed and
grouped by the compiler version number.

The compiler supports much of the ISO/IEC 14882 Standard for the C++
Programming Language (the international standard for C++).

HP aC++ provides a variety of performance-related options in addition to
the options described in these release notes.  See the "Performance" section
of the "HP aC++ Online Programmer's Guide" for full documentation.  Chapter 3
of these release notes explains how to access the guide.

Required Patches
================
The following patches must be installed after installing A.03.33 in order
to enable all the new features.

For HP-UX 11.00:

        PHCO_24723 (libc)
        PHCO_23792 (libpthread)
        PHSS_24303 (linker)
        PHSS_24627 (aC++ runtime)
        PHSS_25028 (libomp)

For HP-UX 11.11:

        PHCO_24400 (libc)
        PHCO_23846 (libpthread)
        PHSS_24304 (linker)
        PHSS_24638 (aC++ runtime)
        PHSS_25029 (libomp)


New and Changed Features
========================
New features in HP aC++ version A.03.33 are listed below. They apply to
HP-UX 11.x operating systems.

   - OpenMP Standard supported

   - Changes to Small Block Allocator (SBA) for malloc

   - Gather/Scatter Prefetching pragma

   - Support for SDK/XDK (cross-compilation)

   - Support for _declspec

   - aCC_MAXERR to control maximum number of compiler errors

   - +Oprofile, option for Profile Based Optimization

   - Improved optimization for HP_LONG_RETURN and +DA1.1

   - Initialized Thread Local Storage

   - +O[no]inline=list

   - -I- option enhanced to perform prefixinclude search

Note: The new features OpenMP, _declspec, prefetch pragma, -I-, and
initialized TLS are not currently available on the IPF (Itanium Processor
Family) compilers.

o OpenMP Standard supported

This release introduces full support for version 1.0 of the
"OpenMP C and C++ Application Program Interface".  This specification
is available at < http://www.openmp.org/specs > .

To enable recognition of OpenMP pragmas, use the "+Oopenmp" command line
option when invoking aCC.  This option is effective at any optimization
level.  Note: Currently +Onoparallel does not affect the OpenMP pragmas in
the source but still disables +Oautopar.

Because multithreading is involved, -mt must also be used with +Oopenmp.
(Otherwise runtime aborts may occur, especially with -AA.)

OpenMP programs require the libomp and libcps runtime support libraries
to be present on both the compilation and runtime system(s).  The
compiler driver will automatically include them when linking.
These libraries are installed by applying the appropriate patches:

        PHSS_25028 - for 11.x prior to 11.11
        PHSS_25029 - for 11.11 and greater

It is recommended that you use the -N option when linking OpenMP programs
to avoid exhausting memory when running with large numbers of threads.
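
As a minimal sketch of a program that exercises the options above, the
following function uses one OpenMP 1.0 work-sharing construct.  The function
name sum_to and the file name in the comment are hypothetical and do not come
from HP documentation.  When the file is compiled without +Oopenmp the
pragma is not recognized and the loop simply runs serially with the same
result.

```cpp
// Hypothetical file sum.C.  Based on the options described above, it would
// be built with something like:
//     aCC +Oopenmp -mt sum.C
// (with -N added at link time, as recommended above).

// Sum the integers 0 .. n-1.  Each thread accumulates a private partial
// sum, and the reduction clause combines them when the loop ends.
long sum_to(int n)
{
    long total = 0;

    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++) {
        total += i;
    }

    return total;
}
```

The reduction clause is used here so that no explicit locking is needed
around the update of total.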

For this first release of aCC containing OpenMP, some debugging position
information for OpenMP constructs may not be accurate.  In addition,
symbols marked with the "threadprivate" pragma may not be visible to the
debugger.  To work around this limitation, use the "__thread" storage
class specifier in the symbol declaration instead.

   #if defined(__HP_aCC) && !defined(__THREAD)
   #define __THREAD __thread
   #else
   #define __THREAD
   #endif

   __THREAD int tprvt;
   #pragma omp threadprivate(tprvt)

OpenMP is also supported in aC++'s ANSI C mode (-Ae).

OpenMP Known Problems:

   Initialization of firstprivate variables is erroneously done after
   calculation of the loop iteration count.  As a result, loops with
   iteration counts that depend on the value of firstprivate variables will
   execute incorrectly.

   Example:

   int n = 100;
   #pragma omp for firstprivate(n)
   for (int i = 0; i < n; i++) {
      // Loop executes an indeterminate number of times because
      // private copy of n is not initialized prior to calculation
      // of loop iteration count.
   }
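
These notes do not give a workaround for this problem.  One possible
approach, offered only as an assumption and not verified against aCC, is to
keep the loop bound out of the firstprivate clause so that the iteration
count is computed from an ordinary shared variable.  The names
count_iterations and bound below are hypothetical.

```cpp
// Count how many iterations the loop performs for a given n.
// The bound is a shared, read-only variable rather than a firstprivate
// one, so the iteration count does not depend on a private copy that
// might not be initialized yet.
int count_iterations(int n)
{
    const int bound = n;   // shared by default inside the parallel region
    int count = 0;

    #pragma omp parallel for reduction(+:count)
    for (int i = 0; i < bound; i++) {
        count += 1;
    }

    return count;
}
```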

o aCC_MAXERR to control maximum number of compiler errors

The aCC_MAXERR environment variable allows you to set the maximum number
of errors you want the compiler to report before it terminates compilation.
The current default is 12, but you can set it to any number greater than zero.

The compiler may not be able to recover from all errors, in which case it
may still display:

	445 Cannot recover from earlier errors

instead of

	699 Error limit reached: halting compilation

Example:
The following increases the maximum to 100 errors.

    $ export aCC_MAXERR=100
    $ aCC -c buggy.c

o Small Block Allocator for malloc

The aC++ runtime now automatically enables malloc's Small
Block Allocator (SBA) after the aCC runtime patch and libc patch
appropriate for your system are installed. (See the Required
Patches section above.)  This improves heap performance.
See malloc(3) and mallopt(3).

The default values are:
M_MXFAST = 512 bytes
M_NLBLKS = 100
M_GRAIN = 8 bytes

If you want to change the defaults, the environment variable _M_SBA_OPTS
can be used.  The format is:
         export _M_SBA_OPTS=<maxfast>:<numlblks>:<grain>

If your existing application is already calling mallopt, then mallopt
will likely return an error because libCsup will have already called
mallopt and allocated a small block by the time the application calls
mallopt.

If the above defaults are acceptable or you are already using
_M_SBA_OPTS then the error should just be ignored.  If the defaults
would degrade performance, then either set _M_SBA_OPTS with the values
used by the application or you can disable this new feature by using the
following:
         export _M_SBA_OPTS=0:0:0

Applications with latent buffer overruns may fail.  If the application
allocates a block that is too small while SBA is disabled, the block
may be padded such that an overrun of the end of the allocated block
might not cause a failure.  But with SBA enabled, the next contiguous
bytes may hold control information, and an overrun would corrupt the
heap and cause various aborts.
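
A common source of the under-allocation described above is forgetting the
terminating null byte when copying a string.  The sketch below shows the
correct sizing.  The function name copy_string is hypothetical.

```cpp
#include <cstdlib>
#include <cstring>

// Return a heap-allocated copy of src, or a null pointer if allocation
// fails.  The caller releases the copy with std::free.
char *copy_string(const char *src)
{
    std::size_t len = std::strlen(src);

    // Allocate len + 1 bytes.  Allocating only len bytes would write the
    // terminating '\0' one byte past the block, which is the kind of
    // latent overrun that padding can hide and SBA can expose.
    char *dst = static_cast<char *>(std::malloc(len + 1));
    if (dst != 0) {
        std::memcpy(dst, src, len + 1);
    }
    return dst;
}
```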

o Gather/Scatter Prefetch pragma

A pragma is now supported to prefetch specified cache lines.
The behavior of this pragma is similar to +Odataprefetch but the
prefetch pragma can access specific elements in indexed arrays
that are stored in cache.  In addition, any valid lvalue can be used
as an argument, but the intent of the pragma is to support array
processing.

Syntax:

#pragma prefetch <argument>

There can be only one argument per pragma.  The compiler generates
instructions to prefetch the cache lines starting from the address
given in the argument. The array element values prefetched must
be valid.  Reading outside the boundaries of an array results
in undefined behavior at runtime.
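
The bounds rule above can be sketched as follows.  The function gather_sum
and the prefetch distance dist are hypothetical, and the distance value is
not a tuning recommendation.  The main loop stops early so that the
prefetched element ia[i + dist] always lies inside the array, and a second
loop finishes the remaining elements without prefetching.  The pragma is
guarded so that other compilers skip it.

```cpp
// Sum a[ia[0]] .. a[ia[n-1]], prefetching a few iterations ahead.
// Every value in ia must be a valid index into a.
double gather_sum(int n, const double *a, const int *ia)
{
    const int dist = 8;   // hypothetical prefetch distance, in iterations
    double sum = 0.0;
    int i = 0;

    // Main loop: i + dist < n keeps ia[i + dist] within bounds.
    for (; i + dist < n; i++) {
#if defined(__HP_aCC)
        #pragma prefetch a[ia[i + dist]]
#endif
        sum += a[ia[i]];
    }

    // Tail loop: no prefetch, so nothing is read past the end of ia.
    for (; i < n; i++) {
        sum += a[ia[i]];
    }

    return sum;
}
```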

Example:
The function below will prefetch ia and b, but not a[ia[i]] when
compiled with +O2 +Odataprefetch +DA2.0 (or +DA2.0W).

	void testprefc2(int n, double *a, int *ia, double *b)
	{
	for (int i=0; i