performance and tuning
This chapter looks at the main factors that affect the performance of your Java™ application on HP-UX systems. Your tuning needs will vary depending on the type and configuration of your system, your application, and the number of users. Additional performance information is provided at the Developer and Solution Partner portal.
Users will get the best performance with mid-range N class and low-end L class systems and PA-8500-based workstations. The recommended minimum system for running Java™ applications is a PA-RISC 2.0 system or a PA-RISC 1.1 system with a floating-point coprocessor.
Performance differs on 8-bit, 16-bit, and 24-bit display terminals. You will see the best performance on the 24-bit display because fewer conversions are required.
application design
In your application design, remember that performance is designed into your application. Don't let it be an area ignored until just prior to release.
We've found three recommendations useful for program design.
- Maximize thread lifetimes and minimize thread creation/destruction cycles. Your application can spend more time doing application specific work.
- Minimize contention for shared resources to eliminate long queues waiting for mutexes. Your application will be scalable, because it won't be serialized on one lock.
- Minimize creation of short-lived objects. Your application can spend more time working rather than garbage collecting.
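The first recommendation (long-lived threads, few creation/destruction cycles) can be sketched as follows. This is a minimal illustration, not the chapter's own code: it assumes a JDK that provides java.util.concurrent and lambdas (Java 8 or later), which postdates the JDK 1.1 releases discussed here, and the class name WorkerPoolSketch and its methods are hypothetical. A fixed set of worker threads is created once and reused for every task.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: run many short tasks on a small, fixed set of threads.
public class WorkerPoolSketch {

    /** How many tasks completed and how many distinct threads ran them. */
    public static final class Result {
        public final int tasksCompleted;
        public final int threadsUsed;

        Result(int tasksCompleted, int threadsUsed) {
            this.tasksCompleted = tasksCompleted;
            this.threadsUsed = threadsUsed;
        }
    }

    public static Result runTasks(int taskCount, int poolSize) {
        // Threads are created lazily, at most poolSize of them, then reused.
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger completed = new AtomicInteger();
        Set<String> threadNames = ConcurrentHashMap.newKeySet();

        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                // Record which worker ran the task; no new thread is started here.
                threadNames.add(Thread.currentThread().getName());
                completed.incrementAndGet();
            });
        }

        pool.shutdown(); // accept no new tasks, let queued ones finish
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return new Result(completed.get(), threadNames.size());
    }

    public static void main(String[] args) {
        Result r = runTasks(100, 4);
        System.out.println(r.tasksCompleted + " tasks ran on "
                + r.threadsUsed + " thread(s)");
    }
}
```

However many tasks are submitted, the number of threads used never exceeds the pool size, so thread start-up and teardown costs are paid only once per worker.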
Tools available to help you meet these goals:
- Always run with -verbosegc to monitor garbage collection and its impact on your application's performance.
- Always profile your application to identify performance bottlenecks. The troubleshooting section below discusses extended Java™ profiling, HPjmeter, and Glance profiling tools.
- Using JNI you can obtain the performance benefit of using native code (HP-UX ANSI C) from within your Java™ application.
- Minimize disk I/O. Don't use random-access I/O (the RandomAccessFile class) to read a file serially. Use buffered I/O instead: wrapping your data streams in BufferedInputStream and BufferedOutputStream will give you the best performance.
- Complex AWT graphics will slow down your performance.
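The buffered I/O advice in the list above might look like the following in practice. This is a sketch under stated assumptions rather than a definitive implementation: the class name BufferedIoSketch and its methods are hypothetical, and the try-with-resources syntax and UncheckedIOException require Java 7 and Java 8 respectively (on JDK 1.1 you would close the streams in a finally block). The stream classes themselves are all part of java.io.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: serial file access through buffered streams.
public class BufferedIoSketch {

    /** Writes the integers 1..count through a buffer instead of one write call each. */
    public static void writeInts(File file, int count) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            for (int i = 1; i <= count; i++) {
                out.writeInt(i);
            }
        } // closing the outer stream flushes the buffer
    }

    /** Reads count integers serially through a buffer and returns their sum. */
    public static long sumInts(File file, int count) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            long sum = 0;
            for (int i = 0; i < count; i++) {
                sum += in.readInt();
            }
            return sum;
        }
    }

    /** Writes to a temporary file, reads it back, and cleans up. */
    public static long roundTripSum(int count) {
        File tmp = null;
        try {
            tmp = File.createTempFile("buffered-io", ".bin");
            writeInts(tmp, count);
            return sumInts(tmp, count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (tmp != null) {
                tmp.delete();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripSum(1000)); // prints 500500
    }
}
```

Without the buffering layer, each readInt or writeInt call can turn into its own read or write on the underlying file, which is the pattern that shows up as an inordinate number of system calls.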
troubleshooting performance problems
- Use the most current version of Java. Also make sure you have all the latest operating system patches installed.
- HP-UX 11.0 1.1.x Users Only: Users may be able to improve performance by using one of the following HP-specific java options:
- Never use -nojit unless you know a problem with the JIT (Just-In-Time) compiler exists. Currently, all software vendors run with the JIT enabled.
- Check for NFS problems in the setup. Look at the CLASSPATH to see if there are any non-existent directories or redundant ones that would cause a slow search.
- Tune the garbage collector. Run with the -verbosegc flag to instruct the JVM to print diagnostic messages to stderr when it garbage collects. An acceptable rate of garbage collection is application specific and should be adjusted after evaluating the time to collect, the frequency of collections, and the number of compactions. For example, for a GUI application, a collection every 10 to 20 seconds may be acceptable as long as the total collection time is short (less than half a second). For an application with a large heap (like most middle-tier servers), you may want to lengthen the interval between collections by increasing the size of the heap.
The JVM gives you two controls (-ms and -mx) to indirectly influence the frequency and duration of collections and the amount of swap space used. The -ms option controls the initial size of the heap. The default value of -ms prior to release 1.1.7 was 1 MB. For release 1.1.7 and subsequent releases, the default for -ms is 4 MB.
The -mx option controls the maximum size of the heap. The default value of -mx is 16 MB.
Making the heap larger decreases the frequency of collections but makes each collection take longer. The rule of thumb is that there should be more than 50% free space after each collection. Most applications seem to perform best with 65-75% free space after a collection. With the -verbosegc flag on, the diagnostic message includes the amount of free space after each collection. For server applications where swap space and physical memory are large, it may be better to run with an even larger heap to increase the time between collections.
For applications that have the same amount of live data after each collection, you can generally set the initial and maximum heap sizes to be the same value. This way the collector never needs to grow the heap. The maximum value needs to be large enough to handle any increase in the amount of space taken by the maximum number of live objects in the heap. Setting the maximum heap size to be 3-4 times the space required for the estimated maximum number of live objects will put you in the 65-75% free range.
For applications that have a widely varying number of live objects (or have a different number of live objects on different runs), try to set the initial heap size a little below the space required for the average and the maximum value large enough to handle the largest data set. An example of this kind of application is the Java™ compiler. When compiling small files, it doesn't use much space. When compiling lots of files (or very large ones) it uses a lot of space. By keeping the initial heap size low, we don't reserve much more physical memory than we need when compiling small files. The trade-off is that compiling large files will require the collector to incur the overhead to grow the heap.
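The heap sizing rule of thumb above can be checked with a little arithmetic: if the heap is 3 to 4 times the live data, the free space after a collection is roughly 67% to 75%. The sketch below works that out. The class HeapSizingSketch and its method names are hypothetical, the live-data figure is an estimate you would have to take from your own -verbosegc output, and the "-ms40m" style of size suffix is an assumption to verify against your JDK's documentation.

```java
// Hypothetical sketch: relate live data, heap size, and free space after GC.
public class HeapSizingSketch {

    /** Percentage of the heap that is free once only liveMb of data remains. */
    public static double freePercentAfterGc(double liveMb, double heapMb) {
        if (heapMb <= 0 || liveMb < 0 || liveMb > heapMb) {
            throw new IllegalArgumentException("need 0 <= liveMb <= heapMb, heapMb > 0");
        }
        return 100.0 * (heapMb - liveMb) / heapMb;
    }

    /**
     * For an application with stable live data, suggests equal initial and
     * maximum heap sizes of (multiplier x live data). The flag spelling with
     * an "m" suffix is an assumption.
     */
    public static String suggestFlags(int liveMb, int multiplier) {
        int heapMb = liveMb * multiplier;
        return "-ms" + heapMb + "m -mx" + heapMb + "m";
    }

    public static void main(String[] args) {
        int liveMb = 10; // assumed estimate of maximum live data
        System.out.println(freePercentAfterGc(liveMb, 4 * liveMb)); // prints 75.0
        System.out.println(suggestFlags(liveMb, 4)); // prints -ms40m -mx40m
    }
}
```

For an application whose live data varies widely, the same arithmetic applies to the maximum only; the initial size would be set lower, near the average case.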
- Ascertain where time is being spent in the Java™ code.
To understand where time is being spent in the Java™ code, you can use the -prof option or, in JDK 1.1.8, the -eprof option. When you run with -prof or -eprof, the JVM collects call counts, timing information, and call graph information. When the application exits (and it must exit normally), a file, java.eprof, is created. The output file is hard to read because Java method names are very long. HP has a tool, HPjmeter, which you can use to analyze the file. If you've collected the information using JDK 1.1.8, you'll have both wall clock times and CPU times for each method. You can also analyze the behavior of each thread individually. Using -eprof and HPjmeter makes identification of performance bottlenecks easy. You can also use HPjmeter to analyze your application's object creation profile and examine thread lifetimes.
Glance displays system performance characteristics for your running application. Glance's system call information for the process is invaluable. If you see high call counts for sched_yield and ksleep, you have identified a monitor/lock contention problem in your application. Problems such as not using buffered I/O are exposed as an inordinate number of reads and writes. You can use tusc on 11.0 to get system call traces.
Note: You can get an unlocked 90-day evaluation copy of GlancePlus off any HP-UX Applications CD (sometimes called the ISU CD or DART CD). To get the official "locked" product, just order through your normal channels.
- You may want to adjust certain kernel parameters to fit your needs.
Parameters applicable to Java™ applications and internet servers include max_thread_proc, maxusers, npty, and other Mass-Storage Subsystem, Memory Swap Subsystem, Process Management Subsystem, and System V IPC Shared-Memory Subsystem parameters.
- Increase the size of the TCP connection request buffer if many clients are trying to connect to the machine simultaneously:
tcp_conn_request_max
or
ndd -set /dev/tcp tcp_conn_request_max 1024
additional system configuration information
sizing the inode cache
Correct sizing of the inode cache is important for good performance. Each inode contains information about a single file. Entries in the inode cache are shared by all users. Typical needs for inode cache range from 1000 to 12000 for commercial installations. The default value is typically a formula like:
ninode "((NPROC + 16 + MAXUSERS) + 32)";
Because demand for inodes varies depending on the nature of the system load, you need to check inode cache utilization once the system is fully utilized. You can do this by running sar.
Note that entries in the inode cache stay in the cache until the system removes them to make room for newer entries. To determine if the inode cache is large enough, check the sar columns iget/s, namei/s, and dirbk/s. The latter two columns indicate demand on the file system (for filename-to-inode translation and directory block scanning), while the first column (iget/s) tells how frequently the system had to get an inode from the disk. If iget/s is frequently high (greater than about five), you may need a larger inode cache. If namei/s is more than ten times greater than iget/s, you may benefit from a larger inode cache. Reconfigure and check again.
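The two sar thresholds above can be written down as a small check. This is only a sketch of the heuristic as stated in this section: the class InodeCacheCheck and its method are hypothetical, and the iget/s and namei/s values are assumed to be numbers you have read from sar output yourself.

```java
// Hypothetical sketch: apply the inode cache rules of thumb to sar figures.
public class InodeCacheCheck {

    /**
     * Returns true when either rule of thumb from the text is met:
     * iget/s above about five, or namei/s more than ten times iget/s.
     */
    public static boolean suggestsLargerCache(double igetPerSec, double nameiPerSec) {
        boolean frequentDiskFetches = igetPerSec > 5.0;
        boolean lookupsFarExceedFetches = nameiPerSec > 10.0 * igetPerSec;
        return frequentDiskFetches || lookupsFarExceedFetches;
    }

    public static void main(String[] args) {
        System.out.println(suggestsLargerCache(6.0, 20.0)); // prints true
        System.out.println(suggestsLargerCache(1.0, 5.0));  // prints false
    }
}
```

A single sample says little; the text's wording ("frequently") suggests applying the check across many intervals on a fully loaded system.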
size of the buffer cache
A typical range for a buffer cache is 16 to 32 megabytes. An application which does little disk I/O can get by with less, while an application which does frequent disk accesses might need significantly more memory for the buffer cache. The default buffer cache size is ten percent of physical memory.
You can control the size of the buffer cache by adjusting the bufpages kernel configuration parameter. Each page takes four kilobytes. Thus, typical values for bufpages are:
bufpages 2000; /* 8 MB for light disk access */
bufpages 4000; /* 16 MB for medium disk access */
bufpages 8000; /* 32 MB for heavy disk access */
bufpages 16000; /* 64 MB for very heavy disk access */
This is another requirement which is difficult to predict. Therefore, you need to check the system's performance once the system is fully loaded. See the section Measuring System Performance for instructions on checking buffer cache size.
Increasing the size of the buffer cache can have a tremendous effect on the performance of disk bound applications, unless the application is using raw I/O (bypassing the file system). Relational databases typically use raw I/O. Note, however, that code files and flat files are accessed via the file system. Therefore, even when using a relational database, it is important to adjust the buffer cache.
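The bufpages values listed earlier in this section follow from the four-kilobyte page size. The sketch below shows the conversion in both directions; the class BufPagesSketch and its method names are hypothetical. Note that the listed values (2000, 4000, and so on) are rounded, so the exact results differ slightly.

```java
// Hypothetical sketch: convert between buffer cache size and bufpages.
public class BufPagesSketch {

    static final int PAGE_KB = 4; // each buffer page takes four kilobytes

    /** Number of pages needed for a buffer cache of the given size in MB. */
    public static int bufpagesForMegabytes(int cacheMb) {
        return cacheMb * 1024 / PAGE_KB;
    }

    /** Buffer cache size in MB that a given bufpages value provides. */
    public static double megabytesForBufpages(int bufpages) {
        return bufpages * PAGE_KB / 1024.0;
    }

    public static void main(String[] args) {
        System.out.println(bufpagesForMegabytes(16));   // prints 4096
        System.out.println(megabytesForBufpages(4000)); // prints 15.625
    }
}
```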
memory and swap space utilization
Physical memory utilization and swap space needs are different from one application to another.
An application written in Java™ may require one to two megabytes of memory per user, and two to four megabytes of swap space per user. To these memory requirements, add another five to ten megabytes for the kernel itself, and eight to sixty megabytes for the buffer cache.
Thus, the minimum memory requirement for a forty-user system might be 40 * (1 MB per user) + (6 MB for HP-UX) + (8 MB buffer cache), or 54 megabytes. The maximum memory requirement for a forty-user system might be 40 * (2 MB per user) + (12 MB for HP-UX) + (32 MB buffer cache), or 124 megabytes. Clearly, the size of the buffer cache and the amount of physical memory required per user have a significant influence on system cost.
On a forty-user system where each user requires eight megabytes of swap space, you would need at least 40 * 8, or 320, megabytes of swap space. Allow additional swap space beyond this calculated minimum for future users, background processes, "peak swap" needs (such as at process initiation), and unforeseen software needs. You need to calculate swap space even if you have enough memory to prevent swapping from ever happening. This is because the kernel preallocates swap space for each process before the process begins. If you run out of swap space, processes will fail, even if the system is not doing any swapping.
Note: Swap space should be no more than two to three times the size of physical memory.
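The memory and swap estimates in this section are simple sums and products, sketched below so you can substitute your own figures. The class CapacityEstimate and its method names are hypothetical; the numbers in main are the forty-user examples from the text.

```java
// Hypothetical sketch: reproduce the memory and swap estimates from the text.
public class CapacityEstimate {

    /** Physical memory estimate in MB: per-user memory plus kernel plus buffer cache. */
    public static int memoryMb(int users, int perUserMb, int kernelMb, int bufferCacheMb) {
        return users * perUserMb + kernelMb + bufferCacheMb;
    }

    /** Minimum swap estimate in MB, before any allowance for growth or peaks. */
    public static int swapMb(int users, int swapPerUserMb) {
        return users * swapPerUserMb;
    }

    public static void main(String[] args) {
        System.out.println(memoryMb(40, 1, 6, 8));   // prints 54
        System.out.println(memoryMb(40, 2, 12, 32)); // prints 124
        System.out.println(swapMb(40, 8));           // prints 320
    }
}
```

The swap figure is a floor only; the additional allowance for future users, background processes, and peak needs has to be added on top.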
file system configuration
It is recommended that different file systems be used for different data needs. For example, the /var/tmp and spool directories may fill up if an error occurs; by keeping each of these in its own file system, the rest of the application will be undisturbed.
File systems contain disk blocks, which are made up of disk fragments. The default sizes are 8K blocks and 1K fragments. Using eight kilobyte disk blocks and eight kilobyte fragments increases performance for access of large files. Therefore, it is recommended that each file system containing large files (such as a database) be created with the following command:
$ /etc/newfs -L -b8192 -f8192 -i65536 _raw_disk_device_ C2474S
Fill in the blanks with the appropriate raw disk address (such as /dev/rdsk/c0d0s8) and disk type (such as C2474S). The -i parameter specifies the number of bytes per inode. By setting this value higher (65536 in this example), you reduce the number of inodes in that file system. This is appropriate for a file system with large files.
For file systems that contain many small files (such as program source code) it is more efficient to use the newfs defaults: 8K blocks and 1K fragments. If using Oracle, make certain the block size is also 8 KB. The default is 4 KB.
Java™ and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. Hewlett-Packard is independent of Sun Microsystems, Inc.