HP-UX Workload Manager A.03.02.xx Release Notes for HP-UX 11i v1, HP-UX 11i v2, and HP-UX 11i v3, Chapter 1: HP-UX Workload Manager Release Notes

Known problems and workarounds

This section discusses problems and workarounds.

System panic when PRM is enabled; install failure in absence of PRM when certain kernel patches are present

Issue

On HP-UX 11i v1, a system panic problem occurs when Process Resource Manager (PRM) is enabled. In addition, WLM installation fails when certain core kernel patches are present and PRM is absent.

Workaround

Install version A.01.00.00.07 or later of the PROCSETS product bundle, which includes critical core kernel and processor set patches. The bundle also installs patches PHKL_30032 through PHKL_30036 or their superseding patches. For patch descriptions, refer to Table 1-4 “HP-UX 11i v1 (B.11.11) patches for WLM”.

Capping issue

Issue

WLM maintains CPU allocations for workloads by capping their CPU access. Unfortunately, an algorithm in the CPU scheduler that WLM uses does not always preserve capping.

Symptoms of this issue may appear in wlminfo output. In the following sample output, the g_nice group has a “CPU Util” value significantly higher than its “CPU Shares” value:

# wlminfo group

Workload Group   PRMID  CPU Shares  CPU Util  Mem Shares  State
OTHERS               1      450.00      4.49        0.00  ON
g_nice               2      108.00    125.57        0.00  ON
g_nightly            3        0.00      0.00        0.00  OFF
g_team               4        6.00      0.00        0.00  ON

For a given group, if its “CPU Util” value is ever significantly greater than its “CPU Shares” value, your system is affected by this issue. (“CPU Util” values slightly above “CPU Shares” are normal.)
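The check described above can be automated. The following Python sketch parses text captured from wlminfo group and reports groups whose “CPU Util” exceeds “CPU Shares” by more than a tolerance. It assumes the whitespace-separated column layout of the sample output; the function name and the default tolerance of 5.0 are hypothetical choices, since WLM defines no exact threshold for “significantly greater”.

```python
# Sketch: flag workload groups whose "CPU Util" noticeably exceeds "CPU Shares".
# Assumes the column layout of the sample `wlminfo group` output.
# overcapped_groups() and its tolerance default are hypothetical, not part of WLM.

def overcapped_groups(wlminfo_text, tolerance=5.0):
    """Return names of groups whose CPU Util exceeds CPU Shares by > tolerance."""
    flagged = []
    for line in wlminfo_text.splitlines():
        fields = line.split()
        # Data rows have six fields: group, PRMID, CPU Shares, CPU Util,
        # Mem Shares, State. The header row and blank lines are skipped.
        if len(fields) != 6 or not fields[1].isdigit():
            continue
        group, cpu_shares, cpu_util = fields[0], float(fields[2]), float(fields[3])
        # Util slightly above shares is normal, so only report large overshoots.
        if cpu_util - cpu_shares > tolerance:
            flagged.append(group)
    return flagged


sample = """\
Workload Group   PRMID  CPU Shares  CPU Util  Mem Shares  State
OTHERS               1      450.00      4.49        0.00  ON
g_nice               2      108.00    125.57        0.00  ON
g_nightly            3        0.00      0.00        0.00  OFF
g_team               4        6.00      0.00        0.00  ON
"""

print(overcapped_groups(sample))  # ['g_nice']
```

Raising the tolerance makes the check less sensitive; with a tolerance of 20.0, the 17.57-share overshoot of g_nice would no longer be reported.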

Workaround

On HP-UX 11i v1 (B.11.11), install patches PHKL_30034, PHKL_30035, PHKL_31993, PHKL_31995, and PHKL_32061. Install all these patches at the same time.

On HP-UX 11i v2 (B.11.23), install the BUNDLE11i patch bundle. Any version of this bundle is acceptable.

WLM uses only the assigned CPU resources even with utilitypri set

Issue

In an Instant Capacity (iCAP) environment, with utilitypri set in your WLM global arbiter configuration, WLM ensures all your owned cores are active. However, if Instant Capacity is not configured in the environment (no designated Instant Capacity cores), WLM uses only the cores that were assigned to virtual partitions when the WLM global arbiter (wlmpard) was started.

Workaround

Be sure to assign all the owned cores using vparmodify before you start wlmpard. If wlmpard is already running, stop it (with the -k option), assign all the owned cores using vparmodify, and then restart wlmpard.

Temporary Instant Capacity (TiCAP) expires while WLM is managing nPartitions

Issue

WLM manages nPartitions using its wlmpard daemon. Assume wlmpard is started on a system that has Temporary Instant Capacity in use. If that temporary capacity expires, wlmpard will still be able to deactivate cores without any problems. However, wlmpard may attempt to activate cores based on the expired capacity. These attempts will fail because the temporary capacity no longer exists. wlmpard will not abort, but it may continue to attempt to activate unavailable cores, generating a message of the following form in /var/opt/wlm/msglog:

Error increasing core count on partition par_name (has x needs y).

You will also see the message:

Unable to set the local partition to z cores. Check the partition status.

where x, y, and z represent integer values.

Workaround

Add a utilitypri statement to your wlmpard configuration, say configuration_file, and then load the new file:

# /opt/wlm/bin/wlmpard -a configuration_file

When Temporary Instant Capacity is available, the utilitypri keyword allows WLM to adjust the total number of active cores to meet demand.

Specifying this priority ensures WLM maintains compliance with your Temporary Instant Capacity usage rights. When your prepaid amount of temporary capacity expires, WLM no longer attempts to use the temporary resources.

NOTE: Beginning with WLM A.03.02, you can set a threshold that determines when WLM will stop allocating temporary capacity resources. Prior to WLM A.03.02, the threshold was fixed at 15 processing days (where WLM stops allocating temporary capacity if 15 or fewer processing days of temporary capacity remain available). For more information, see “WLM Temporary Instant Capacity 15-day threshold too limiting” and wlmparconf(4).

Automatic activation of Instant Capacity core without authorization

Issue

An Instant Capacity (iCAP) core was automatically activated without customer authorization.

Workaround

Please contact your HP representative.

If you have Instant Capacity or Pay per use (PPU) software installed, either:

  • Do not use WLM virtual partition management, or

  • Use vPars version A.03.01 or later

Application hangs in FSS group

Issue

On HP-UX 11i v2 (B.11.23), an application inside a workload group based on an FSS group may hang when running in a single-processor virtual partition, nPartition, or system.

Workaround

Install patch PHKL_33052.

Shutdown slow; “Waiting for shutdown confirmation” and “Shutdown initiated; however, ... unable to acquire confirmation” messages displayed

Issue

In some situations, WLM might take longer than expected to shut down, especially when WLM is in the process of modifying the distribution of CPU resources among partitions. In such cases, the shutdown request will not be honored until the modifications are completed. Beginning with WLM A.03.02, you might see the following message after 30 seconds:

Waiting for shutdown confirmation...

Then, if no shutdown confirmation is received within the next 90 seconds, WLM will display the following message:

Shutdown initiated; however, we were unable to acquire confirmation. Check the messages in /var/opt/wlm/msglog for more details.

In versions of WLM prior to A.03.02, under similar circumstances you would get one of the following messages after 30 seconds:

wlmd -k failed: Resource temporarily unavailable

wlmpard -k failed: Resource temporarily unavailable

wlmcomd -k failed: Resource temporarily unavailable

These messages are misleading because they imply that the shutdown request failed when it might not have. Beginning with WLM A.03.02, they are replaced by the more accurate messages described above.

Workaround

If a shutdown request has been delayed for 30 seconds, WLM issues the “Waiting for shutdown confirmation” message. WLM is likely delaying the shutdown request while waiting for partition modifications to complete. If after a total of 120 seconds the shutdown has still not completed, WLM issues the “Shutdown initiated” message. This most likely means that the partition modifications have not yet completed. They can take longer than 120 seconds. When the modifications have been made, WLM will honor the shutdown request. You can verify the shutdown has succeeded by using the ps command (if necessary, issue your shutdown command again). In addition, check the messages in /var/opt/wlm/msglog.
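The timeline and the ps check described above can be expressed as a short Python sketch. expected_shutdown_message() maps the time elapsed since a shutdown request to the message that WLM A.03.02 or later is documented to display, and daemon_running() looks for a daemon name in text captured from ps -e. Both function names are hypothetical, and the parser assumes that the command name is the last column of each ps -e line.

```python
import os
from typing import Optional

# Thresholds taken from the release notes (WLM A.03.02 and later).
WAITING_AFTER_S = 30      # "Waiting for shutdown confirmation..."
INITIATED_AFTER_S = 120   # 30 s plus a further 90 s: "Shutdown initiated; ..."


def expected_shutdown_message(elapsed_s: float) -> Optional[str]:
    """Return the message expected after elapsed_s seconds, or None."""
    if elapsed_s < WAITING_AFTER_S:
        return None
    if elapsed_s < INITIATED_AFTER_S:
        return "Waiting for shutdown confirmation..."
    return ("Shutdown initiated; however, we were unable to acquire confirmation. "
            "Check the messages in /var/opt/wlm/msglog for more details.")


def daemon_running(ps_output: str, daemon: str) -> bool:
    """Return True if `daemon` (for example "wlmd") is a command in ps -e output."""
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Assumes the COMMAND column is last; basename() strips any path.
        if fields and os.path.basename(fields[-1]) == daemon:
            return True
    return False
```

If daemon_running() still reports the daemon after the “Shutdown initiated” message, the partition modifications are probably still in progress; check /var/opt/wlm/msglog and, if necessary, issue the shutdown command again.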

Unable to get CPU allocation due to number of processes

Issue

WLM provides a workload group its allocation on a system by granting the group its allocation on each core. If the group does not have at least one process for each core, WLM increases the allocations for the processes to compensate. For example, for a workload group with a single-threaded process, 10% of four cores is allocated as 40% of one core.

Now assume the same group is allocated 50% of the four cores. WLM allocates 100% of two cores to the workload group. However, because the group has only one thread, it can use only one core, resulting in an effective allocation of 25% (one of the four cores).
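The arithmetic in this example can be checked with a simplified Python model. It assumes, as the example does, that each thread runs on at most one core at a time, so a group can use no more core-equivalents than it has threads. effective_allocation() is a hypothetical planning helper, not part of WLM, and it ignores every scheduling detail beyond that assumption.

```python
def effective_allocation(requested_fraction: float, cores: int, threads: int) -> float:
    """Fraction of the system's total CPU that a workload group can actually use.

    requested_fraction: allocation WLM grants, as a fraction of all cores (0.5 = 50%).
    cores: number of cores the allocation is spread across.
    threads: number of runnable threads in the group.
    """
    granted_cores = requested_fraction * cores  # e.g. 50% of 4 cores = 2 cores' worth
    usable_cores = min(granted_cores, threads)  # one thread uses at most one core
    return usable_cores / cores


print(effective_allocation(0.10, 4, 1))  # 0.1  (granted as 40% of one core)
print(effective_allocation(0.50, 4, 1))  # 0.25 (only one of two granted cores is usable)
```

Comparing the requested fraction with the returned value is one way to spot allocations that an application cannot use.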

Workaround

There is no workaround. However, be aware of how your applications run so that you do not give them resource allocations they cannot use.

Collectors abort when updated while running

Issue

If you update (overwrite) a data collector executable (be it a binary or a script) while it is providing data to WLM, the collector may abort.

Workaround

There are two workarounds to this issue:

  • Update the data collector in place

    1. Stop WLM (wlmd -k)

    2. Update the data collector

    3. Restart WLM (wlmd -a configuration_file)

  • Replace the data collector

    1. Move the current data collector aside

    2. Install the new data collector in place of the collector you just moved aside

    3. Restart WLM (wlmd -a configuration_file)
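Steps 1 and 2 of the second workaround can be scripted. The following Python sketch renames the current collector aside and copies the new build into the original path, so that a collector process that is still running keeps executing its original, unmodified file. The “.old” suffix, the function name, and the demonstration file names are hypothetical; step 3 (restarting WLM with wlmd -a configuration_file) is left to the administrator.

```python
import os
import shutil
import tempfile


def replace_collector(current_path: str, new_build_path: str) -> str:
    """Move the collector aside and install the new one in its place.

    Returns the path the old collector was moved to.
    """
    aside_path = current_path + ".old"  # hypothetical naming convention
    # rename() keeps the same file (inode), so a collector that is still
    # executing continues to read its original contents.
    os.rename(current_path, aside_path)
    # copy2() creates a new file at the original path and preserves the
    # permission bits, so the new collector stays executable.
    shutil.copy2(new_build_path, current_path)
    return aside_path


# Demonstration in a scratch directory (file names are hypothetical).
workdir = tempfile.mkdtemp()
current = os.path.join(workdir, "my_collector")
build = os.path.join(workdir, "my_collector.new")
for path, text in ((current, "old version\n"), (build, "new version\n")):
    with open(path, "w") as f:
        f.write(text)
aside = replace_collector(current, build)
```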

GlancePlus/OpenView Performance Agent and processor sets

Issue

On systems with multiple processor sets configured, GlancePlus may report incorrect data for the PRM_SYS group (ID 0): it incorrectly counts processes outside the default processor set as belonging to the PRM_SYS group.

As a result, the WLM glance_prm data collection script cannot be used to track application (APP or APP_PRM) metrics for the PRM_SYS group or any PRM group defined based on a PSET.

Only GlancePlus is affected by this issue; WLM properly identifies workload groups (PRM groups and their PRM IDs).

Workaround

For HP-UX 11i v1 (B.11.11), use GlancePlus C.03.55 or later and install patch PHKL_28052 to address this issue.
For HP-UX 11i v2 (B.11.23), this issue is fixed in GlancePlus C.03.58.05.

GlancePlus may not correctly identify processes’ PRM groups

Issue

On some systems, GlancePlus would not correctly identify processes’ PRM groups. WLM uses these PRM groups as its workload groups. On these systems, GlancePlus would report all processes as belonging to the PRM_SYS group (ID 0). As a result, the WLM glance_prm data collection script could not be used to track application (APP) metrics for a PRM group of processes.

Workaround

This issue is fixed in GlancePlus C.03.35.00. On HP-UX 11i v1, the best way to get this upgrade is to install GlancePlus when installing the 11i Enterprise or 11i Mission-critical Operating Environments.

glance Adviser memory consumption increases continually

Issue

GlancePlus’s glance Adviser leaks memory when running continuously. Adviser is used by the WLM data collectors glance_app, glance_gbl, glance_prm, glance_prm_byvg, and glance_tt.

Workaround

This memory leak is fixed in GlancePlus C.03.35.00. On HP-UX 11i v1, the best way to get this upgrade is to install GlancePlus when installing the 11i Enterprise or 11i Mission-critical Operating Environments.

WLM enables/disables SLOs at end of interval

Issue

WLM enables and disables time-based SLOs only at the end of an interval. This interval is 60 seconds by default and can be changed with the wlm_interval keyword in your configuration file.

An SLO is time-based when its condition or exception keyword in the configuration file specifies a date or time.

If your interval is too long, an SLO may not be enabled when the configuration indicates. For example, assume the interval is 1800 seconds (30 minutes). Also assume that one SLO should have the entire system to itself for a short period: WLM is to enable that SLO for 20 minutes while disabling all other SLOs for the same 20 minutes. If these 20 minutes begin at 3pm every day, but an interval ends at 3:15pm, your configuration does not actually go into effect until 3:15pm. Moreover, it is not changed back until the next interval ends at 3:45pm, 25 minutes after the scheduled 3:20pm end.
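The timing in this example can be reproduced with a small Python model. It assumes that WLM interval boundaries fall at a fixed offset (phase_min) plus whole multiples of the interval, and that a time-based change takes effect at the first boundary at or after its scheduled time. The function names and the phase_min parameter are hypothetical and used only for illustration.

```python
def next_boundary(t_min: int, interval_min: int, phase_min: int = 0) -> int:
    """First interval boundary at or after t_min (minutes since midnight)."""
    intervals = -(-(t_min - phase_min) // interval_min)  # ceiling division
    return phase_min + intervals * interval_min


def effective_window(start_min: int, end_min: int, interval_min: int, phase_min: int = 0):
    """Return (enabled_at, disabled_at) for a time-based SLO, in minutes."""
    return (next_boundary(start_min, interval_min, phase_min),
            next_boundary(end_min, interval_min, phase_min))


def hhmm(t_min: int) -> str:
    return f"{t_min // 60:02d}:{t_min % 60:02d}"


# 30-minute interval with boundaries at :15 and :45; SLO scheduled 15:00-15:20.
on, off = effective_window(15 * 60, 15 * 60 + 20, interval_min=30, phase_min=15)
print(hhmm(on), hhmm(off))  # 15:15 15:45
```

With a short interval the effective window approaches the scheduled one, which is the trade-off to weigh when setting wlm_interval.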

Workaround

Be aware of how your interval and time-based SLOs interact and adjust them accordingly.

No metrics on startup or reconfiguration

Issue

Metrics have no value on WLM startup or reconfiguration. WLM cannot work toward any SLOs without metrics.

Workaround

Data collectors should report metrics as soon after startup or reconfiguration as possible.

WLM configurations cannot be activated with fewer than 100 Mbytes of memory available

Issue

When controlling memory, WLM allocates at least a minimum amount to each group. If extended_shares is enabled, this minimum is 0.2% of the available memory; otherwise, it is 1%. Available memory is the amount reported by prmavail: the memory that is not reserved for the kernel (/stand/vmunix), its data structures, and nonkernel system processes. Available memory is therefore not the total memory on the system, and it varies over time. For more information, see the Process Resource Manager User’s Guide, available in /opt/prm/doc. WLM requires that the per-group minimum represent at least 1 Mbyte of memory. Thus, when memory control is used, the system should have at least 100 Mbytes of available memory (1% of 100 Mbytes is 1 Mbyte), or at least 500 Mbytes if extended_shares is enabled (0.2% of 500 Mbytes is 1 Mbyte).
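The 100-Mbyte and 500-Mbyte thresholds follow from requiring the per-group minimum to be at least 1 Mbyte. A minimal Python sketch of that check, assuming available memory is given in Mbytes as reported by prmavail (the function names are hypothetical):

```python
def min_group_memory_mb(available_mb: float, extended_shares: bool) -> float:
    """Per-group memory minimum: 0.2% with extended_shares, otherwise 1%."""
    per_mille = 2 if extended_shares else 10  # 0.2% or 1%, in tenths of a percent
    return available_mb * per_mille / 1000


def memory_control_possible(available_mb: float, extended_shares: bool) -> bool:
    """WLM requires the per-group minimum to be at least 1 Mbyte."""
    return min_group_memory_mb(available_mb, extended_shares) >= 1.0


print(memory_control_possible(100, False))  # True  (1% of 100 Mbytes = 1 Mbyte)
print(memory_control_possible(300, True))   # False (0.2% of 300 Mbytes = 0.6 Mbyte)
```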

Workaround

Increase the system’s memory or decrease the minimum dynamic buffer cache using the kernel parameter dbc_min_pct.

Secure Resource Partitions: Blocked port on a virtual network interface

Issue

Using the HP-UX feature Security Containment, you can set up a virtual network interface for each secure compartment. A process in one secure compartment can bind to a socket on a virtual network interface associated with a different secure compartment. Although this process will not be able to accept connections or use the socket to send or receive data, it does prevent other processes from binding to that socket.

Workaround

Be sure that applications that access the network bind only to sockets on the virtual network interface created for their respective secure compartments. For more information, refer to the Security Containment release notes and the HP-UX 11i Security Containment Administrator’s Guide.

Reaching the System V semaphore limit

Issue

If your system has many System V semaphores in use, WLM’s use of semaphores may push the total number of semaphores over system limits, especially if the WLM configuration contains a large number of data collectors. The WLM daemon wlmd creates two semaphore sets: one containing a single semaphore, and the other containing a semaphore for each data collector in the WLM configuration.

If this system limit is reached, wlmd prints the following error and exits before the first WLM interval begins:

Cannot allocate a system V semaphore set of size x: Increase the system-imposed limits.

You may also see the following error:

WLM--"wm_knob_init, prm_rep_load": PRM--"PRM internal daemon binary is missing or has incorrect permissions (PRM-2352)"

For additional details relating to this message, look in syslog. In some conditions, this message indicates no semaphores are available.

Workaround

Use the SAM (/usr/sbin/sam), SMH (/usr/sbin/smh), or kcweb (/usr/sbin/kcweb, on HP-UX 11i v2 or later) kernel configuration utility to increase the system limits.

On HP-UX 11i v1 and later, increase the following kernel parameters:

  • semmns: Maximum total number of semaphores on the system

  • semmsl: Maximum number of semaphores allowed in a semaphore set
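Before changing the limits, you can estimate whether the two semaphore sets that wlmd creates will fit. The Python sketch below applies the set sizes described above (one set with a single semaphore, one set with a semaphore per data collector) to semmns and semmsl. The number of semaphores already in use must be supplied by you; the function name and the example numbers are hypothetical.

```python
def wlmd_semaphore_problems(n_collectors: int, semaphores_in_use: int,
                            semmns: int, semmsl: int) -> list:
    """Return reasons why wlmd's semaphore sets might not fit, or [] if they fit."""
    set_sizes = [1, n_collectors]  # single-semaphore set + one per data collector
    problems = []
    # semmsl limits the size of each individual set.
    if max(set_sizes) > semmsl:
        problems.append(f"semmsl={semmsl} is below the set size {max(set_sizes)}")
    # semmns limits the total number of semaphores on the system.
    needed = semaphores_in_use + sum(set_sizes)
    if needed > semmns:
        problems.append(f"semmns={semmns} is below the total needed {needed}")
    return problems


print(wlmd_semaphore_problems(n_collectors=40, semaphores_in_use=90, semmns=128, semmsl=32))
```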

Configuration wizard requires PRM

Issue

Starting with the WLM A.03.01 release, WLM no longer includes Process Resource Manager (B3835DA). However, the WLM configuration wizard requires PRM. Without PRM installed, the wizard:

  • Always sets the initial CPU (core) count to 1 in the pop-up dialog that appears before the wizard itself appears

  • Returns a message about the PRM API not being installed when it attempts to validate a configuration

Workaround

Install PRM A.03.00 or later.

Processes in transient FSS groups appear unexpectedly in other workload groups

Issue

When a deployed WLM configuration has transient_groups set to 1 and contains FSS groups that are candidates for becoming transient, the deployment from time to time contains an FSS group called _IDLE_. As needed, WLM moves the jobs of the transient groups to _IDLE_, where they get the minimum CPU and memory resources. WLM picks the internal identifier for _IDLE_ on the fly, taking it from the pool of unused identifiers.

On a redeployment, if the new configuration contains an FSS group that happens to have the identifier WLM selected for the _IDLE_ group in the deployment that is being replaced, jobs in the _IDLE_ group will migrate to the FSS group having the same identifier.

The same issue can arise going from:

  • A transient deployment to another transient deployment

  • A transient deployment to a non-transient deployment

Workaround

You can prevent this situation by using the -i option to wlmd when deploying or redeploying a configuration. First, shut down wlmd (using the -k option), then restart it using the -i option.

Before modifying any partition managed by WLM, WLM and the global arbiter must be stopped

Issue

Do not adjust any WLM-managed partition while wlmpard is running. This includes using vparmodify, icapmodify, or icod_modify to change the name, configuration, or resources (CPU and memory) associated with the virtual partition or nPartition (and this also includes using parolrad to modify a cell in a WLM-managed partition, as noted in “Before performing online cell operations on systems where WLM manages partitions, memory, or PSETs, WLM must be stopped”).

Workaround

To adjust a partition, you must first shut down WLM, including wlmpard, on all partitions that will be affected by the modification, then modify the partition, and then restart WLM. Changes to Instant Capacity (iCAP) affect the entire complex; changes to a virtual partition affect the nPartition only, unless Instant Capacity is configured on the nPartition. For example, if WLM is managing two virtual partitions vParA and vParB, and you need to migrate memory resources from vParA to vParB, you must shut down WLM in both virtual partitions. As another example, to change the name of an nPartition, you must first shut down WLM in every operating system instance across the entire complex, because the name change affects Instant Capacity, and Instant Capacity changes affect every nPartition across the complex.

To stop WLM, stop the wlmpard and wlmd daemons (use the -k option with the corresponding commands).
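The rule for which operating system instances need WLM stopped can be summarized in a short Python sketch. It models a complex as a list of nPartitions, each with its OS instances and a flag indicating whether Instant Capacity is configured on it. The class and function names are hypothetical, and stopping every instance in an affected nPartition is a conservative simplification of “all partitions that will be affected”.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NPartition:
    name: str
    instances: List[str]          # OS instances: its vPars, or the nPartition itself
    icap_configured: bool = False


def instances_to_stop(complex_: List[NPartition], target: str,
                      affects_icap: bool) -> List[str]:
    """OS instances on which to stop wlmd (and wlmpard) before the change."""
    everywhere = [inst for npar in complex_ for inst in npar.instances]
    # Instant Capacity changes (such as renaming an nPartition) affect the whole complex.
    if affects_icap:
        return everywhere
    npar = next(n for n in complex_ if n.name == target)
    # A vPar change stays within its nPartition unless iCAP is configured there.
    return everywhere if npar.icap_configured else list(npar.instances)


complex_ = [NPartition("npar1", ["vParA", "vParB"]), NPartition("npar2", ["vParC"])]
print(instances_to_stop(complex_, "npar1", affects_icap=False))  # ['vParA', 'vParB']
print(instances_to_stop(complex_, "npar1", affects_icap=True))   # ['vParA', 'vParB', 'vParC']
```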

Before performing online cell operations on systems where WLM manages partitions, memory, or PSETs, WLM must be stopped

Issue

If WLM is being used to manage memory records, partitions, or PSET-based workload groups, and you attempt to perform an online cell operation (parolrad) while WLM is running, changes made to CPU resources by the operation might not be detected by WLM and can cause problems for WLM management of CPU resources. Error messages will be generated.

Workaround

Before performing an online cell operation (parolrad) on a system where WLM is managing memory, partitions, or PSETs, you must first stop WLM, perform the operation, and then restart WLM. To stop WLM, stop the wlmpard and wlmd daemons (use the -k option with the corresponding commands). Note that wlmd should be stopped on all partitions managed by WLM.

You can check the status of online cell operations by using the parolrad -m command.

WLM GUI is not compatible with different versions of WLM

Issue

If you attempt to use the WLM GUI (wlmgui) with a version of WLM that differs from the version the GUI is associated with, the following message is displayed:

The WLM product running on <hostname> and this tool are incompatible.

The version of the WLM GUI must match the version of the WLM product that it manages.

Workaround

Either upgrade WLM to the version of the WLM GUI you are using, or use an earlier version of the WLM GUI that matches the version of WLM that your GUI will manage. Note that multiple versions of the WLM GUI can be installed on a Microsoft Windows PC.

Upgrading or installing PRM before upgrading WLM from A.03.00 or earlier can cause WLM to fail swverify checks

Issue

If you install or upgrade to the latest version of PRM (C.03.02 or later) on a system with WLM A.03.00 or earlier, WLM will fail swverify checks.

Workaround

To ensure that WLM works properly on a system with PRM C.03.02 or later, upgrade WLM to A.03.02 or later. When upgrading WLM A.03.00 or earlier, upgrade WLM prior to upgrading PRM.

© 2000-2007 Hewlett-Packard Development Company, L.P.