HP-UX Systems: HP aC++ Release Notes > Chapter 1 HP aC++ Release Notes

New Features in Version A.03.33


New features in HP aC++ version A.03.33 are listed below:

OpenMP Standard Supported

This release introduces full support for version 1.0 of the OpenMP C and C++ Application Program Interface. This specification is available at http://www.openmp.org/specs.

To enable recognition of OpenMP pragmas, use the +Oopenmp command line option when invoking aCC. This option is effective at any optimization level.

NOTE: Currently +Onoparallel does not affect the OpenMP pragmas in the source, but it still disables +Oautopar.

Because multithreading is involved, -mt must also be used with +Oopenmp. (Otherwise runtime aborts may occur, especially with -AA.)

OpenMP programs require the libomp and libcps runtime support libraries to be present on both the compilation and runtime systems. The compiler driver will automatically include them when linking.

These libraries are installed by applying the appropriate patches:

  • PHSS_25028 - for 11.x prior to 11.11

  • PHSS_25029 - for 11.11 and greater

It is recommended that you use the -N option when linking OpenMP programs to avoid exhausting memory when running with large numbers of threads.
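As a minimal sketch of a loop that these options would parallelize, the function below sums squares with an OpenMP reduction. The function name and the file name in the comment are illustrative assumptions, not part of the release. A compiler that does not recognize the pragma ignores it and runs the loop serially, with the same result.

```cpp
// Possible build line, using the options described above:
//   aCC +Oopenmp -mt sum.C        (sum.C is a hypothetical file name)

// Returns 0^2 + 1^2 + ... + (n-1)^2, with the iterations shared among threads.
long sum_of_squares(int n)
{
    long total = 0;
    // reduction(+:total) gives each thread a private partial sum and
    // combines the partial sums when the loop finishes.
#pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++) {
        total += static_cast<long>(i) * i;
    }
    return total;
}
```

For example, sum_of_squares(4) adds 0 + 1 + 4 + 9.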

For this first release of aCC containing OpenMP, some debugging position information for OpenMP constructs may not be accurate. In addition, symbols marked with the threadprivate pragma may not be visible to the debugger. To work around this limitation, use the __thread storage class specifier in the symbol declaration instead.

#if defined(__HP_aCC) && !defined(__THREAD)
#define __THREAD __thread
#else
#define __THREAD
#endif

__THREAD int tprvt;
#pragma omp threadprivate(tprvt)

OpenMP is also supported in aC++’s ANSI C mode (-Ae).

OpenMP Known Problems:

Initialization of firstprivate variables is erroneously done after calculation of the loop iteration count. As a result, loops with iteration counts that depend on the value of firstprivate variables will execute incorrectly. For example:

int n = 100;
#pragma omp for firstprivate(n)
for (int i = 0; i < n; i++) {
    // Loop executes an indeterminate number of times because
    // private copy of n is not initialized prior to calculation
    // of loop iteration count.
}
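One way to sidestep this problem, sketched here on the assumption that the loop bound does not need to be private, is to leave the bound shared so that the iteration count is computed from an initialized value. This workaround is not given in the release notes, and the function name is hypothetical.

```cpp
#include <vector>

// Fills out[i] = 2 * i for i in [0, n).
// n is not listed in a firstprivate clause, so it stays shared and is
// already initialized when the loop iteration count is calculated.
std::vector<int> doubled_indices(int n)
{
    std::vector<int> out(n > 0 ? n : 0);
    int* data = out.empty() ? 0 : &out[0];
#pragma omp parallel
    {
#pragma omp for
        for (int i = 0; i < n; i++) {
            data[i] = 2 * i;   // each iteration writes a distinct element
        }
    }
    return out;
}
```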

aCC_MAXERR to Control Maximum Number of Compiler Errors

The aCC_MAXERR environment variable allows you to set the maximum number of errors you want the compiler to report before it terminates compilation. The current default is 12, but you can set it to any number greater than zero.

The compiler may not be able to recover from all errors. In that case it may stop before the limit is reached and display:

445 Cannot recover from earlier errors

instead of

699 Error limit reached: halting compilation

For example, the following increases the maximum to 100 errors:

$ export aCC_MAXERR=100
$ aCC -c buggy.c

Small Block Allocator for malloc

The aC++ runtime now automatically enables malloc’s Small Block Allocator (SBA) after the aCC runtime patch and libc patch appropriate for your system are installed. (See the Required Patches section above.) This improves heap performance. For more information see malloc(3) and mallopt(3) manpages.

The default values are:

M_MXFAST = 512 bytes

M_NLBLKS = 100

M_GRAIN = 8 bytes

If you want to change the defaults, the environment variable _M_SBA_OPTS can be used. The format is:

export _M_SBA_OPTS=<maxfast>:<numlblks>:<grain>

If your existing application is already calling mallopt, then mallopt will likely return an error because libCsup will have already called mallopt and allocated a small block by the time the application calls mallopt.

If the above defaults are acceptable or you are already using _M_SBA_OPTS then the error should just be ignored. If the defaults degrade performance, then either set _M_SBA_OPTS with the values used by the application or disable this new feature by using the following:

export _M_SBA_OPTS=0:0:0
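The value of _M_SBA_OPTS is three colon-separated numbers. As a small sketch, a launcher or build tool could compose the string as follows before exporting it. The helper name is hypothetical, and the variable must already be set in the environment when the application starts.

```cpp
#include <sstream>
#include <string>

// Builds the "<maxfast>:<numlblks>:<grain>" value for _M_SBA_OPTS.
std::string sba_opts_value(unsigned maxfast, unsigned numlblks, unsigned grain)
{
    std::ostringstream value;
    value << maxfast << ':' << numlblks << ':' << grain;
    return value.str();
}
```

With the defaults listed above, sba_opts_value(512, 100, 8) gives "512:100:8", and sba_opts_value(0, 0, 0) gives the "0:0:0" value that disables the feature.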

Applications with latent memory errors, such as overruns of allocated blocks, may fail. If the application allocates a block that is too small while SBA is disabled, the block may be padded such that an overrun of the end of the allocated block might not cause a failure. But with SBA enabled, the next contiguous bytes may have been used for control information, and an overrun would corrupt the heap and cause various aborts.
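A typical instance of such a latent error (an illustrative assumption, not an example from the release) is an allocation that omits room for a string terminator. The sketch below shows the correct size and notes in a comment where the undersized variant would overrun the block.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Returns a heap copy of s (the caller releases it with std::free),
// or a null pointer if the allocation fails.
char* copy_string(const char* s)
{
    std::size_t len = std::strlen(s);
    // A latent bug would be std::malloc(len): the copy below would then
    // write the terminating '\0' one byte past the end of the block.
    char* copy = static_cast<char*>(std::malloc(len + 1));
    if (copy != 0) {
        std::memcpy(copy, s, len + 1);   // includes the terminator
    }
    return copy;
}
```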

Gather/Scatter Prefetch Pragma

A pragma is now supported to prefetch specified cache lines. The behavior of this pragma is similar to +Odataprefetch but the prefetch pragma can access specific elements in indexed arrays that are stored in cache. In addition, any valid lvalue can be used as an argument, but the intent of the pragma is to support array processing.

Syntax

#pragma prefetch <argument>

There can be only one argument per pragma. The compiler generates instructions to prefetch the cache lines starting from the address given in the argument. The array element values prefetched must be valid. Reading outside the boundaries of an array results in undefined behavior at runtime.

Example

The function below will prefetch ia and b, but not a[ia[i]], when compiled with the options +O2 +Odataprefetch +DA2.0 (or +DA2.0W).

void testprefc2(int n, double *a, int *ia, double *b)
{
    for (int i=0; i<n; i++) {
        b[i]=a[ia[i]];
    }
}

Recoding this routine as:

#define USER_SPECIFIED 30
void testprefc2(int n, double *a, int *ia, double *b)
{
    int dist=(int)USER_SPECIFIED;
    int nend=max(0,n-dist); /* so as not to read past the end of ia */
    for (int i=0;i<nend;i++) /* original loop is for (i=0;i<n;i++) */
    {
#pragma prefetch ia[i+4*dist]
#pragma prefetch a[ia[i+dist]]
        b[i]=a[ia[i]];
    }
    /* finish up last part with no prefetching */

    for (int i=nend;i<n;i++)
        b[i]=a[ia[i]];
}

The two pragma statements allow a[ia[i]] to be prefetched. Note that the compiler continues to unroll the loops as in the original code.

There can be problems using the prefetch pragma when the kernel cannot allocate large pages. Without large pages, performance can be lost to Translation Lookaside Buffer (TLB) faults. The optimal page size varies with different applications, but a 4MB page size is a good average.

TLB faults occur when a particular page address does not reside in the TLB buffer. This buffer contains the mapping of the virtual addresses to the absolute addresses of the pages recently fetched in the cache. A TLB fault happens when a reference to a particular virtual page address cannot be translated to an absolute address in the buffer.

Even when all the TLB and prefetch features are working, you are still limited by the memory bandwidth of the system. The top bandwidth may be reduced by failing to load all the memory slots in some PA-RISC systems. The memory controller depends on having all slots loaded to get the best bank interleaving.

Support for SDK/XDK

The SDK/XDK feature helps in selecting components, header files, and libraries installed in alternate locations. You must set either one or both of the following environment variables:

  • SDKROOT

  • TARGETROOT

SDKROOT Environment Variable

The SDKROOT environment variable is used as a prefix for all references to tool set components and must be set when you use a non-native development kit or a toolset installed at an alternative location. Some of the toolset components are compiler drivers, Compiler Applications, Preprocessor, Linker, and object file tools.

For example, if a compiler tool set is installed in directory /opt/xdk-ia/ then, export SDKROOT=/opt/xdk-ia prefixes all references to the compiler tool set components with /opt/xdk-ia.

The following table lists the native location of each tool set component and the alternate location used after the above command is executed:

Native Location              Alternate Toolset Location
/opt/aCC/bin/aCC             /opt/xdk-ia/opt/aCC/bin/aCC
/opt/aCC/lbin/ctcom          /opt/xdk-ia/opt/aCC/lbin/ctcom
/opt/langtools/lbin/ucomp    /opt/xdk-ia/opt/langtools/lbin/ucomp

Invoking the compiler driver aCC results in the invocation of tool set components from the alternate location above.

If the compiler is non-native and installed in a different place, the directory path can be prefixed for all references to the compiler. You have to set the environment variables XDKROOT or SDKROOT to point to that directory.

For example, if the compiler is installed in directory /user/foo, then the command export SDKROOT=/user/foo prefixes all references to the compiler with /user/foo.

Default Path                 New Path for Compiler
/opt/aCC/bin/aCC             /user/foo/opt/aCC/bin/aCC
/opt/aCC/lbin/ctcom          /user/foo/opt/aCC/lbin/ctcom
/opt/langtools/lbin/ucomp    /user/foo/opt/langtools/lbin/ucomp

For information on SDK usage scenarios, see HP-UX Software Development Kit User’s Guide on http://devresource.hp.com.

TARGETROOT Environment Variable

The TARGETROOT environment variable is used as a prefix for all references to target set components and must also be set when using a non-native development kit. Some of the target set components are header files, archive libraries, and shared libraries.

For example, if a target tool set is installed in directory /opt/xdk-ia/ then, export TARGETROOT=/opt/xdk-ia prefixes all references to the target tool set components with /opt/xdk-ia.

The following table lists the native location of each target set component and the alternate location used after the above command is executed:

Native Location      Alternate Tool Set Location
/usr/include         /opt/xdk-ia/usr/include
/opt/aCC/include*    /opt/xdk-ia/opt/aCC/include*
/usr/lib             /opt/xdk-ia/usr/lib
/opt/aCC/lib         /opt/xdk-ia/opt/aCC/lib

Environment variables like LPATH and options like -l or -L override $TARGETROOT prefixing.
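The prefixing described for SDKROOT and TARGETROOT amounts to prepending the root directory to each native path. The sketch below models that mapping. The helper is hypothetical, it is not how the driver is implemented, and treating an unset variable as an empty string is an assumption.

```cpp
#include <string>

// Hypothetical model: prepends root (the value of SDKROOT or TARGETROOT)
// to an absolute native path. An empty root leaves the path unchanged.
std::string with_root(const std::string& root, const std::string& native_path)
{
    if (root.empty()) {
        return native_path;
    }
    std::string prefix = root;
    // Drop trailing slashes so "/opt/xdk-ia/" and "/opt/xdk-ia" behave alike.
    while (prefix.size() > 1 && prefix[prefix.size() - 1] == '/') {
        prefix.erase(prefix.size() - 1);
    }
    return prefix + native_path;
}
```

For example, with_root("/opt/xdk-ia", "/opt/aCC/bin/aCC") gives "/opt/xdk-ia/opt/aCC/bin/aCC", matching the first row of the SDKROOT table.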

Support for __declspec

This release supports __declspec(dllimport) and __declspec(dllexport) as keywords. These keywords have the same semantics as in Microsoft Windows compilers and ease porting of applications developed in Microsoft Windows compilers to HP-UX systems.

Support for these keywords improves the performance of shared libraries and removes the need to use HP_DEFINED_EXTERNAL pragmas to hide the non-exported symbols.

Syntax and Semantics

__declspec ( extended-attribute ) declarator

extended-attribute:
    dllimport
  | dllexport

  1. Declaring a symbol with external linkage as __declspec(dllexport) tells the compiler that the symbol should be exported from the current load module (shared library) and made visible to other load modules.

  2. Declaring a symbol with external linkage as __declspec(dllimport) tells the compiler that the symbol is defined in a shared library and is outside the current load module.

  3. Declaring a class with either the __declspec(dllexport) or __declspec(dllimport) keyword results in all its member functions and static data members being marked for export or import.

  4. Only symbols having external linkage can be declared using these keywords.

  5. It is legal to selectively specify members of a class as dllexport or dllimport but selective specification is not allowed if the class itself is exported or imported.

-Bhidden and -Bhidden_def Command Line Options

The current behavior of the aC++ compiler on HP-UX systems is to export all symbols with external linkage by default. In order to facilitate exporting only those symbols marked with __declspec(dllexport) and hide the rest, use the following two options to hide the symbols by default.

-Bhidden Command Line Option

This option hides all the symbols used in the translation unit other than the ones prefixed with __declspec(dllexport), __declspec(dllimport), or specified with pragma HP_DEFINED_EXTERNAL.

-Bhidden_def Command Line Option

This option hides all the symbols defined in the translation unit other than the ones prefixed with __declspec(dllexport) or specified with pragma HP_DEFINED_EXTERNAL.

Since all the functions marked as hidden (both defined and referenced in the translation unit) are expected to be defined in the same load module, the compiler can optimize the calls to those functions by generating direct calls. But this requires you to notify the compiler about the functions not defined in the same load module and ask it to generate indirect calls to them through the PLT. This can be done using pragma HP_DEFINED_EXTERNAL. You have the option of choosing either of the following:

  1. Hide the symbols defined and optimize calls to functions not defined in the current translation unit (other than the ones specified using pragma HP_DEFINED_EXTERNAL).

  2. Hide the symbols defined, but do not optimize calls to functions not defined in the current translation unit. In this case, you do not have to worry about HP_DEFINED_EXTERNAL.

NOTE:
  1. To be able to use these features, the following linker patches need to be installed:

    • PHSS_24303 (for HP-UX 11.00)

    • PHSS_24304 (for HP-UX 11.11)

  2. main function is always exported.

  3. Compiler generated vtables and typeids are always exported.

  4. The compiler defines macro __HP_WINDLL whenever -Bhidden or -Bhidden_def options are used. This macro can be used for conditional compilation. For example,

    #ifdef __HP_WINDLL
      #define DLLEXPORT __declspec(dllexport)
      #define DLLIMPORT __declspec(dllimport)
    #else
      #define DLLEXPORT
      #define DLLIMPORT
    #endif
  5. If an inline member function of a class is called from outside the shared library where the class is defined and that function happens to reference another member of the same class, you should make sure that the referenced member is also exported. Otherwise the linker will fail to resolve the reference.

  6. Ensure that the virtual member functions of a class defined in a shared library are exported as needed. If virtual member functions are not exported and another class derives from this class but does not override these virtual functions, then there will be link-time errors. See example 8 below for an example scenario.

Examples

  1. In the following program, global variable glob is imported:

     class Hello
    {
    public:
    int x;
    };

    __declspec(dllimport) extern Hello glob;

    int main() { glob.x = 10; return 0;}
  2. In the following program, symbols export_me and export_me_func() will be exported; the rest of the symbols are hidden:

    __declspec(dllexport) int export_me;
    int iam_hidden;
    __declspec(dllexport) int export_me_func() { return 0; }
    void iam_hidden_func() { }
  3. In the following program, class ImportME is imported from outside the current load module:

     class __declspec(dllimport) ImportME
    {
    public:
    void print();
    };
  4. In the following program, only member function mem() is exported:

    class Test
    {
    public:
    __declspec(dllexport) void mem();
    void goo();
    };
  5. In the following program, exporting symbols with internal linkage is illegal:

     __declspec(dllexport) static int static_int; //illegal
    int main()
    {
    __declspec(dllexport) int local_export; //illegal
    return 0;
    }
  6. In the following program, importing defined symbols is illegal:

     __declspec(dllimport) int func() { } //illegal
  7. In the following program, selectively exporting a member function when the class itself is imported is illegal:

     class __declspec(dllimport) Employee
    {
    __declspec(dllexport) void mem(); // illegal
    };
  8. In the following example, there are 2 source files: dll.C and caller.C. dll.C is used to build the shared library containing the definition for class BaseClass. In caller.C, class DerivedClass derives from BaseClass. Virtual member function foo() is overridden by DerivedClass. But the other virtual member function goo() is not. Since goo() is not exported from the shared library (dll.sl), the linker gives an unresolved symbol error. You should be careful that all such functions are exported from the shared library.

    //dll.h
    class BaseClass
    {
    public:
    BaseClass() { }
    virtual void foo();
    // goo() should be exported, as it is needed in the
    // derived class, which does not override it
    virtual void goo();
    };
    //end of dll.h


    //dll.C
    #include "dll.h"

    void BaseClass::foo() { }
    void BaseClass::goo() { }

    // end of dll.C


    //caller.C
    #include "dll.h"

    class DerivedClass : public BaseClass
    {
    public:
    void foo() { }
    };

    BaseClass *p;

    int main()
    {
    p = new DerivedClass;
    p->foo();
    }
    // end of caller.C

    $ aCC -Bhidden_def -o dll.sl dll.C +z -b
    $ aCC caller.C dll.sl
    /usr/ccs/bin/ld: Unsatisfied symbols:
    BaseClass::goo() (first referenced in caller.o) (code)

    The solution is to export member function goo() from the shared library using __declspec(dllexport).
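A sketch of that fix follows, with the three files of example 8 collapsed into one listing. It reuses the __HP_WINDLL guard from the note above, so the macro expands to nothing when neither -Bhidden option is in effect. The const char* return types are an illustrative change that makes the dispatched function observable; the original functions return void.

```cpp
// Expands to __declspec(dllexport) only when -Bhidden or -Bhidden_def
// is used (the compiler then defines __HP_WINDLL).
#ifdef __HP_WINDLL
#define DLLEXPORT __declspec(dllexport)
#else
#define DLLEXPORT
#endif

// dll.h: goo() is now exported because DerivedClass inherits it unchanged.
class BaseClass
{
public:
    BaseClass() { }
    virtual const char* foo();
    DLLEXPORT virtual const char* goo();
};

// dll.C: definitions that live in the shared library.
const char* BaseClass::foo() { return "BaseClass::foo"; }
const char* BaseClass::goo() { return "BaseClass::goo"; }

// caller.C: overrides foo() only, so its vtable refers to BaseClass::goo().
class DerivedClass : public BaseClass
{
public:
    const char* foo() { return "DerivedClass::foo"; }
};
```

A call through a BaseClass pointer to a DerivedClass object then reaches DerivedClass::foo() for foo() and the exported BaseClass::goo() for goo().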

+Oprofile Option for Profile-Based Optimization

This release enhances the usability of profile-based optimization (PBO) by letting you generate PA-RISC machine code (SOMs) directly during the compilation phase, instead of the compiler’s intermediate code (ISOMs). Earlier versions of the compiler generated intermediate code during the compilation phase when the PBO options (+I, +P) were used, and generated the final PA-RISC machine code during the link phase. As a result, when a large number of files were compiled with PBO options, code generation for all the files happened during the link phase. An obvious disadvantage is that, even when a single file is changed, code generation for all other files is repeated during the link phase unless +Oreusedir= is used. This makes overall compile-link time significantly higher. The new set of +Oprofile options overcomes this disadvantage.

The options below do not produce ISOMs (Intermediate-code .o files) as do the +I, +P, and +O4 options. Therefore they will rebuild faster than the ISOM-building options, but cannot just be relinked in the +P phase from the ISOMs built by the +I phase. The new options also do not support cross-module optimization with the +O4 option. PBO build processes that do not rebuild from source will not work with these new options, but processes that currently use scripts to run ld -r commands on every ISOM to convert it to a SOM can use the aCC driver with these new options instead of the scripts.

Following are the new options of +Oprofile:

+Oprofile=use

Use the profile database to optimize. This is similar in behavior to the +P option.

+Oprofile=use:filename

Specify filename as the name of the profile database file. This is similar in behavior to the +P and +df options (that is, +P +df filename).

+Oprofile=collect

Instrument the application for profile based optimization. This is similar in behavior to the +I option.

+Oprofile=prediction:static

Select static branch prediction for this executable. This is a synonym for +Ostaticprediction.

NOTE: Keep the following in mind when optimizing with the new options:
  • The new options can be used only with -c (compile only). Without -c, the optimization is performed as in the earlier versions of the compiler.

  • The new options are available only at optimization levels below +O4. At +O4, the compiler silently replaces +Oprofile by +I or +P.

  • Mixing of old and new options while optimizing on the same command line is disabled. For example, +Oprofile and +I/+P/+df in the same command line are incompatible.

  • With +Oprofile=use, the flow.data file must exist at compile time, rather than at the link stage as with +P.

Initialized Thread Local Storage

Static link-time initialization of thread private variables (PODs only) is now supported. Earlier versions of the compiler supported only uninitialized thread private variables.

For example:

__thread int j = 2; // allowed with this release
int main()
{
    j = 20;
}

Since thread private memory is allocated at runtime, virtual addresses of thread private variables should not be used in situations where compile-time evaluation of the addresses is necessary. The following are examples of incorrect usage:

Example 1:

__thread int tpv_1;
__thread int *ptr = &tpv_1; //incorrect

Example 2:

__thread int tpv_1;
int *ptr = &tpv_1; //incorrect
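A sketch of one alternative is to take the address at run time, inside the thread that uses it, rather than in a static initializer. The accessor name is hypothetical. The __thread specifier is also accepted by GCC and Clang, so the fragment is not specific to aC++.

```cpp
// Static initialization of a POD thread private variable, as allowed
// with this release.
__thread int tpv_1 = 2;

// Returns the calling thread's address of tpv_1. The address is computed
// when the function runs, so no compile-time evaluation is needed.
int* tpv_1_address()
{
    return &tpv_1;
}
```

Each thread that calls tpv_1_address() gets a pointer to its own copy, so the pointer should not be cached across threads.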

+O[no]inline=list Option

The list form of this option is now available. The list can contain the names of extern "C" functions; any other function must be given by its mangled name.
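The distinction matters because only functions with C linkage keep their source name at link level. In the sketch below, both function names are hypothetical. Assuming the list is written after the equals sign, the first function could be named as in +Oinline=scale, while the second would have to be given by its mangled name, which is not reproduced here.

```cpp
// C linkage: the link-level name is simply "scale".
extern "C" int scale(int x)
{
    return 3 * x;
}

// C++ linkage: the link-level name is a compiler-specific mangled string
// that encodes the parameter types.
int scale_twice(int x)
{
    return scale(scale(x));
}
```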

-I- Option Enhanced to Perform prefixinclude Search

The -I- option has been enhanced to do a prefixinclude search. The -I- option, by itself, is not sufficient to handle the case involving a quoted include from a parent file which is not directly on the quoted or bracketed search paths. Prefixinclude search provides additional support for the case where, due to use of directory prefixes in #include directives in parent including files, the directory of the including file is no longer directly on the include search list.

In the non -I- case, use of directory prefixes in parent #include directives causes the compiler to look in some directory offset from the directory of the top-level source file. Analogously, in the -I- case, use of directory prefixes in parent include files in effect defines an offset relative to the directories on the search list. This is equivalent to explicitly specifying the directory prefix in the child #include “...” directive. In fact, modifying the source #include directive in this way would allow the intended included file to be found without requiring prefixinclude support in the preprocessor.

Here’s an example of the problem:

$ ls
a.c incl/ mk
$ ls incl
f.h x.h y.h

$ cat a.c
#include "incl/f.h"

$ cat incl/f.h
#include "incl/y.h"
#include "x.h"

$ cat incl/x.h
int x;

$ cat incl/y.h
int y;

$ aCC -c -I. a.c
$ # previous versions of aC++
$ aCC -c -I. -I- -I. a.c
"./incl/f.h", line 2: Error: Could not open include file "x.h".

NOTE: a.c compiles without error with -I. alone, but with -I. -I- -I. previous versions of the compiler fail to find x.h.

With the prefixinclude feature in effect, the subdirectory prefix (in this case incl) is inherited from the including file for #include “...” style includes. So, if an including file was included as “prefix/includer” or <prefix/includer>, then a file “includee” included by “prefix/includer” is first searched for as “prefix/includee” and, if that fails, as “includee”, in each case using each of the appropriate -I paths.

Searches for #include <...> files are not affected by prefixinclude, only #include “...” file searches have been enhanced.
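The lookup order for quoted includes can be modeled as a list of candidate paths. The sketch below is a hypothetical model, not the preprocessor's implementation. It assumes that the prefixed form is tried against every search directory before the unprefixed form is tried.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the paths that would be tried, in order, for a quoted include.
//   search_dirs: directories given with -I
//   includer:    the parent file as named in its own #include, e.g. "incl/f.h"
//   includee:    the file named in the child #include, e.g. "x.h"
std::vector<std::string> quoted_include_candidates(
    const std::vector<std::string>& search_dirs,
    const std::string& includer,
    const std::string& includee)
{
    std::vector<std::string> candidates;
    std::string::size_type slash = includer.rfind('/');
    if (slash != std::string::npos) {
        // Inherit the directory prefix of the including file ("incl/").
        std::string prefix = includer.substr(0, slash + 1);
        for (std::size_t i = 0; i < search_dirs.size(); ++i) {
            candidates.push_back(search_dirs[i] + "/" + prefix + includee);
        }
    }
    // Fall back to the name exactly as written.
    for (std::size_t i = 0; i < search_dirs.size(); ++i) {
        candidates.push_back(search_dirs[i] + "/" + includee);
    }
    return candidates;
}
```

For the example above, with the single search directory ".", the candidates for "x.h" included from "incl/f.h" are "./incl/x.h" followed by "./x.h", so the file is found on the first attempt.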

Improved Optimization for HP_LONG_RETURN and +DA1.1

The code for HP_LONG_RETURN and +DA1.1 has been optimized when +Oentrysched is used. (Code for non-static member functions always turns on HP_LONG_RETURN.) Note that +Oentrysched may cause problems when used with +eh, so it is recommended only with +noeh.

© 2003 Hewlett-Packard Development Company, L.P.