This section discusses problems and workarounds.

System panic when PRM is enabled; install failure in absence of PRM when certain kernel patches are present
- Issue
On HP-UX 11i v1, a system panic problem occurs when
Process Resource Manager (PRM) is enabled. In addition, WLM installation
fails when certain core kernel patches are present and PRM is absent.
- Workaround
Install version A.01.00.00.07 or later of the PROCSETS
product bundle, which includes critical core kernel and processor
set patches. It also installs patches PHKL_30032 through PHKL_30036
or their superseding patches. For patch descriptions, refer to Table 1-4 “HP-UX 11i v1 (B.11.11) patches for WLM”.

Capping issue
- Issue
WLM maintains CPU allocations for workloads by capping
their CPU access. Unfortunately, an algorithm in the CPU scheduler
that WLM uses does not always preserve capping. You may see symptoms of this issue in wlminfo output, as in the sample output
below, where the g_nice group has a “CPU Util” value significantly
higher than its “CPU Shares” value:

# wlminfo group
Workload Group  PRMID  CPU Shares  CPU Util  Mem Shares  State
OTHERS              1      450.00      4.49        0.00     ON
g_nice              2      108.00    125.57        0.00     ON
g_nightly           3        0.00      0.00        0.00    OFF
g_team              4        6.00      0.00        0.00     ON

For a given group, if its “CPU Util” value
is ever significantly greater than its “CPU Shares” value,
your system is affected by this issue. (“CPU Util” values
slightly above “CPU Shares” are normal.)
- Workaround
On HP-UX 11i v1 (B.11.11), install patches PHKL_30034,
PHKL_30035, PHKL_31993, PHKL_31995, and PHKL_32061. Install all
these patches at the same time.
On HP-UX 11i v2 (B.11.23), install the BUNDLE11i patch bundle.
Any version of this bundle is acceptable.
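
The check described in the Issue text for the capping problem can be sketched in a few lines of Python. This is only an illustration: it parses the sample wlminfo output shown in this section, and the tolerance value is an assumption, since these notes say only "significantly greater" and give no number. The function name is hypothetical and is not part of WLM.

```python
# Sample rows copied from the wlminfo output shown in this section.
SAMPLE = """\
Workload Group  PRMID  CPU Shares  CPU Util  Mem Shares  State
OTHERS              1      450.00      4.49        0.00     ON
g_nice              2      108.00    125.57        0.00     ON
g_nightly           3        0.00      0.00        0.00    OFF
g_team              4        6.00      0.00        0.00     ON
"""

def over_cap_groups(text, tolerance=1.10):
    """Return the groups whose CPU Util exceeds CPU Shares * tolerance.

    tolerance is an assumed threshold (10% above shares); adjust it to
    whatever "significantly greater" means on your system.
    """
    flagged = []
    for line in text.splitlines()[1:]:      # skip the header row
        fields = line.split()
        if len(fields) != 6:
            continue                        # ignore blank or malformed rows
        name, _prmid, shares, util, _mem, _state = fields
        if float(util) > float(shares) * tolerance:
            flagged.append(name)
    return flagged

print(over_cap_groups(SAMPLE))  # ['g_nice']
```

Values only slightly above "CPU Shares" are normal, which is why the comparison uses a tolerance rather than a strict greater-than test.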

WLM uses only the assigned CPU resources even with utilitypri set
- Issue
In an Instant Capacity (iCAP)
environment, with utilitypri set in your WLM global arbiter configuration,
WLM ensures all your owned cores are active. However, if Instant
Capacity is not configured in the environment (no designated Instant
Capacity cores), WLM uses only the cores that were assigned to virtual
partitions when the WLM global arbiter (wlmpard) was started.
- Workaround
Be sure to assign all the
owned cores using vparmodify before you start wlmpard. If wlmpard is already running, stop it (with the
-k option) and assign all the owned cores using
vparmodify.

Temporary Instant Capacity (TiCAP) expires while WLM is managing nPartitions
- Issue
WLM manages nPartitions using its wlmpard daemon. Assume wlmpard is started on a system that has Temporary Instant
Capacity in use. If that temporary capacity expires, wlmpard will still be able to deactivate cores without
any problems. However, wlmpard may attempt to activate cores based on the expired
capacity. These attempts will fail because the temporary capacity
no longer exists. wlmpard will not abort, but it may continue to attempt
to activate unavailable cores, generating a message of the following
form in /var/opt/wlm/msglog:

Error increasing core count on partition par_name (has x needs y).

You will also see the message:

Unable to set the local partition to z cores. Check the partition status.

where x, y, and z represent integer values.
- Workaround
Add a utilitypri statement to your wlmpard configuration file (referred to here as configuration_file), and then load the new file:

# /opt/wlm/bin/wlmpard -a configuration_file

The utilitypri keyword allows WLM, when Temporary Instant
Capacity is available, to adjust the total cores to meet
demand. Specifying this priority ensures WLM maintains compliance
with your Temporary Instant Capacity usage rights. When your prepaid
amount of temporary capacity expires, WLM no longer attempts to
use the temporary resources.
NOTE: Beginning with WLM A.03.02, you can set a threshold
that determines when WLM will stop allocating temporary capacity
resources. Prior to WLM A.03.02, the threshold was fixed at 15 processing
days (where WLM stops allocating temporary capacity if 15 or fewer
processing days of temporary capacity remain available). For more
information, see “WLM Temporary Instant Capacity 15-day threshold too limiting” and wlmparconf(4).

Automatic activation of Instant Capacity core without authorization
- Issue
An Instant Capacity (iCAP) core was automatically
activated without customer authorization.
- Workaround
Please contact your HP representative. If you have Instant Capacity or Pay per use (PPU) software
installed, either:
  - Do not use WLM virtual partition management, or
  - Use vPars version A.03.01 or later

Application hangs in FSS group
- Issue
On HP-UX 11i v2 (B.11.23), an application inside
a workload group based on an FSS group may hang when running in
a single-processor virtual partition, nPartition, or system.
- Workaround
Install patch PHKL_33052.

Shutdown slow; “Waiting for shutdown confirmation” and “Shutdown initiated; however, ... unable to acquire confirmation” messages displayed
- Issue
In some situations, WLM might take longer than expected
to shut down, especially when WLM is in the process of modifying
the distribution of CPU resources among partitions. In such cases,
the shutdown request will not be honored until the modifications
are completed. Beginning with WLM A.03.02, you might see the following
message after 30 seconds:

Waiting for shutdown confirmation...

Then, if no shutdown confirmation is received within the next
90 seconds, WLM will display the following message:

Shutdown initiated; however, we were unable to acquire confirmation.
Check the messages in /var/opt/wlm/msglog for more details.

In versions of WLM prior to A.03.02, under similar circumstances
you would get one of the following messages after 30 seconds:

wlmd -k failed: Resource temporarily unavailable
wlmpard -k failed: Resource temporarily unavailable
wlmcomd -k failed: Resource temporarily unavailable

These messages are misleading in that they imply that the
shutdown request had failed when it might not have. These messages
have been replaced by the more accurate messages reported above.
- Workaround
If a shutdown request has
been delayed for 30 seconds, WLM issues the “Waiting for shutdown confirmation” message. WLM is likely delaying the shutdown
request while waiting for partition modifications to complete. If
after a total of 120 seconds the shutdown has still not completed,
WLM issues the “Shutdown initiated” message. This most likely means that the partition
modifications have not yet completed. They can take longer than 120
seconds. When the modifications have been made, WLM will honor the shutdown
request. You can verify the shutdown has succeeded by using the
ps command (if necessary, issue your shutdown command
again). In addition, check the messages in /var/opt/wlm/msglog.
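
The message timing described above can be summarized in a short Python sketch. It models only the two thresholds stated in this section for WLM A.03.02 and later (30 seconds, then a further 90 seconds). The function name is hypothetical, and treating the thresholds as inclusive is an assumption.

```python
WAITING_AFTER = 30           # seconds before the "Waiting..." message appears
INITIATED_AFTER = 30 + 90    # a further 90 seconds, 120 seconds in total

def shutdown_status(elapsed_seconds):
    """Return the most recent message expected after elapsed_seconds, or None.

    Neither message means the shutdown failed; WLM honors the request once
    the partition modifications complete.
    """
    if elapsed_seconds >= INITIATED_AFTER:
        return "Shutdown initiated; however, we were unable to acquire confirmation."
    if elapsed_seconds >= WAITING_AFTER:
        return "Waiting for shutdown confirmation..."
    return None
```

In either case, confirm the result with ps and the messages in /var/opt/wlm/msglog, as described above.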

Unable to get CPU allocation due to number of processes
- Issue
WLM provides a workload group
its allocation on a system by granting the group its allocation
on each core. If the group does not have at least one process for
each core, WLM increases the allocations for the processes to compensate.
For example, for a workload group with a single-threaded process,
10% of four cores is allocated as 40% of one core. Now assume this same group is allocated 50% of the four cores.
WLM would allocate 100% of two cores to the workload group. However,
because the group has only the one thread, it can use only one core, resulting
in an effective allocation of 25% of the system.
- Workaround
There is no workaround. However,
be aware of how your applications run so that you do not give them
resource allocations they cannot use.
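
The arithmetic in the Issue text can be checked with a small Python sketch. It assumes, as the example above does, that one thread can use at most one core at a time. The function names are hypothetical and are not part of WLM.

```python
def cores_allocated(share_pct, total_cores):
    """Express a system-wide share as a number of whole-core equivalents."""
    return share_pct * total_cores / 100

def usable_share_pct(share_pct, total_cores, threads):
    """Percent of the whole system the group can actually consume.

    A group with fewer threads than cores is limited to one core per thread.
    """
    ceiling = 100 * min(threads, total_cores) / total_cores
    return min(share_pct, ceiling)

# Single-threaded group on a four-core system:
print(cores_allocated(10, 4), usable_share_pct(10, 4, 1))  # 0.4 of a core, all of it usable
print(cores_allocated(50, 4), usable_share_pct(50, 4, 1))  # 2 cores allocated, only 25% usable
```

Comparing a group's share with this ceiling is one way to spot allocations that an application cannot use.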

Collectors abort when updated while running
- Issue
If you update (overwrite) a data collector executable
(be it a binary or a script) while it is providing data to WLM,
the collector may abort.
- Workaround
There are two workarounds to this issue:
  - Update the data collector in place
    1. Stop WLM (wlmd -k)
    2. Update the data collector
    3. Restart WLM (wlmd -a configuration_file)
  - Replace the data collector
    1. Move the current data collector aside
    2. Install the new data collector in place of the collector you just moved aside
    3. Restart WLM (wlmd -a configuration_file)

GlancePlus/OpenView Performance Agent and processor sets
- Issue
On systems with multiple processor sets configured,
GlancePlus may have incorrect data for the PRM_SYS group (ID 0). On these systems, GlancePlus will
incorrectly include processes that are outside of the default processor set
as belonging to the PRM_SYS group (ID 0). As a result, the WLM glance_prm data collection script cannot be used to track
application (APP or APP_PRM) metrics for the PRM_SYS group or any PRM group defined based on a PSET. Only GlancePlus is affected by this issue—WLM properly
identifies workload groups (PRM groups and their PRM IDs).
- Workaround
For HP-UX 11i v1 (B.11.11),
using GlancePlus C.03.55 or later and installing patch PHKL_28052
addresses this issue. For HP-UX 11i v2 (B.11.23), this
issue is fixed in GlancePlus C.03.58.05.

GlancePlus may not correctly identify processes’ PRM groups
- Issue
On some systems, GlancePlus would not correctly
identify processes’ PRM groups. WLM uses these PRM groups
as its workload groups. On these systems, GlancePlus would report
all processes as belonging to the PRM_SYS group (ID 0). As a result, the WLM glance_prm data collection script could not be used to track
application (APP) metrics for a PRM group of processes.
- Workaround
This issue is fixed in GlancePlus C.03.35.00. On
HP-UX 11i v1, the best way to get this upgrade is to install GlancePlus
when installing the 11i Enterprise or 11i Mission-critical Operating
Environments.

glance Adviser memory consumption increases continually
- Issue
GlancePlus’s glance Adviser leaks memory
when running continuously. Adviser is used by the WLM data collectors glance_app, glance_gbl, glance_prm, glance_prm_byvg, and glance_tt.
- Workaround
This memory leak is fixed in GlancePlus C.03.35.00.
On HP-UX 11i v1, the best way to get this upgrade is to install
GlancePlus when installing the 11i Enterprise or 11i Mission-critical
Operating Environments.

WLM enables/disables SLOs at end of interval
- Issue
WLM enables and disables time-based SLOs only at
the end of an interval. This interval is 60 seconds by default and
can be changed with the wlm_interval keyword in your configuration file. SLOs are time-based when you set their condition or exception keyword values in the configuration file.
If your interval is too long, an SLO may not be enabled as
indicated in the configuration. For example, assume the interval
is 1800 seconds (30 minutes). Also assume that one SLO is supposed
to have the entire system to itself for a short period of time,
with WLM enabling that one SLO for 20 minutes while disabling all
other SLOs for the same 20 minutes. If these 20 minutes begin at
3pm every day, but the interval ends at 3:15pm, your configuration
does not actually go into effect until 3:15pm. Moreover, it is not
changed again until 3:45pm.
- Workaround
Be aware of how your interval and time-based SLOs
interact and adjust them accordingly.
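
To see how an interval and a time-based SLO interact, the example in the Issue text can be reproduced with a short Python sketch. It assumes that interval boundaries fall at fixed multiples of the interval from one known boundary, and that WLM acts at the first boundary at or after the configured time. The function names and the date are hypothetical.

```python
from datetime import datetime, timedelta

def next_interval_end(moment, known_interval_end, interval):
    """First interval boundary at or after `moment`."""
    delta = moment - known_interval_end
    k = -((-delta) // interval)          # ceiling division on timedeltas
    return known_interval_end + k * interval

def effective_window(start, length, known_interval_end, interval):
    """Return (enabled_at, disabled_at) for a time-based SLO."""
    return (next_interval_end(start, known_interval_end, interval),
            next_interval_end(start + length, known_interval_end, interval))

day = datetime(2024, 1, 1)               # arbitrary date for illustration
enabled, disabled = effective_window(
    start=day.replace(hour=15),          # SLO configured to begin at 3pm
    length=timedelta(minutes=20),        # and to last 20 minutes
    known_interval_end=day.replace(hour=15, minute=15),
    interval=timedelta(seconds=1800),    # wlm_interval of 30 minutes
)
print(enabled.time(), disabled.time())   # 15:15:00 15:45:00
```

With the default 60-second interval, the same SLO would be enabled and disabled within a minute of its configured times.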

No metrics on startup or reconfiguration
- Issue
Metrics have no value on WLM startup or reconfiguration.
WLM cannot work toward any SLOs without metrics.
- Workaround
Data collectors should report
metrics as soon after startup or reconfiguration as possible.

WLM configurations cannot be activated with fewer than 100 Mbytes of memory available
- Issue
When controlling memory, WLM allocates at least
a minimum amount to each group. If extended_shares is enabled, this minimum is 0.2% of the available
memory; otherwise, it is 1%. (Available memory is the amount reported
by prmavail; it is the amount that is not reserved for the kernel (/stand/vmunix)
and its data structures and for nonkernel system processes. Thus,
available memory is not the total memory on the system. Available memory
varies over time. For more information, see the Process
Resource Manager User’s Guide, available in
/opt/prm/doc.) WLM requires that this minimum represent at least
1 Mbyte of memory. Thus, when memory control is used, the system
should have at least 100 Mbytes of available memory, or at least
500 Mbytes if extended_shares is enabled.
- Workaround
Increase the system’s
memory or decrease the minimum dynamic buffer cache using the kernel
parameter dbc_min_pct.
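
The 100-Mbyte and 500-Mbyte figures follow directly from the per-group minimums given in the Issue text, as the Python sketch below shows. The function names are hypothetical, and the available-memory value would in practice come from prmavail.

```python
from fractions import Fraction

MIN_GROUP_MBYTES = 1   # each group's minimum must represent at least 1 Mbyte

def min_share(extended_shares):
    """Minimum fraction of available memory that WLM gives each group."""
    return Fraction(2, 1000) if extended_shares else Fraction(1, 100)

def required_available_mbytes(extended_shares=False):
    """Smallest available memory for which the group minimum reaches 1 Mbyte."""
    return int(MIN_GROUP_MBYTES / min_share(extended_shares))

def can_activate(available_mbytes, extended_shares=False):
    """True if the per-group minimum is at least 1 Mbyte."""
    return available_mbytes * min_share(extended_shares) >= MIN_GROUP_MBYTES

print(required_available_mbytes(False))  # 100
print(required_available_mbytes(True))   # 500
```

Because available memory varies over time, a system close to the limit may pass this check at one moment and fail it at another.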

Secure Resource Partitions: Blocked port on a virtual network interface
- Issue
Using the HP-UX feature Security Containment, you can set up
a virtual network interface for each secure compartment. A process
in one secure compartment can bind to a socket on a virtual network
interface associated with a different secure compartment. Although
this process will not be able to accept connections or use the socket
to send or receive data, it does prevent other processes from binding
to that socket.
- Workaround
Be sure your applications that access the network bind only
to sockets on the virtual network interface created for their respective
secure compartments. For more information, refer to the Security Containment release notes
and the HP-UX 11i Security
Containment Administrator’s Guide.

Reaching the System V semaphore limit
- Issue
If your system has many System V semaphores in use,
WLM usage of semaphores may push the total number of semaphores
over system limits, especially if the WLM configuration
contains a large number of data collectors. The WLM daemon wlmd creates two semaphore sets: one containing a single
semaphore; the other containing a semaphore for each data collector
in the WLM configuration. If this system limit is reached, wlmd prints the following error and exits before the
first WLM interval begins:

Cannot allocate a system V semaphore set of size x: Increase the system-imposed limits.

You may also see the following error:

WLM--”wm_knob_init, prm_rep_load”: PRM--”PRM internal daemon binary is missing or has incorrect permissions (PRM-2352)”
For additional details relating to this message, look in syslog.

In some conditions, this message indicates no semaphores are available.
- Workaround
Use the SAM (/usr/sbin/sam),
SMH (/usr/sbin/smh), or kcweb (/usr/sbin/kcweb, on HP-UX 11i v2 or later) kernel configuration
utility to increase the system limits. On HP-UX 11i v1 and later, increase the following kernel
parameters:
- semmns
Max number of overall semaphores
- semmsl
Max number of semaphores allowed in a semaphore set
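
As a rough sizing aid, the two semaphore sets that wlmd creates can be compared against these limits. The Python sketch below is a simplified model based only on the description in this section. The function names are hypothetical, and in_use (semaphores already consumed by other software) is a value you would need to supply yourself.

```python
def wlmd_semaphore_needs(num_collectors):
    """Semaphores wlmd needs: one single-semaphore set plus one set
    holding a semaphore for each data collector."""
    return {
        "total": 1 + num_collectors,
        "largest_set": max(1, num_collectors),
    }

def limits_ok(num_collectors, semmns, semmsl, in_use=0):
    """Check wlmd's needs against the semmns and semmsl kernel parameters."""
    needs = wlmd_semaphore_needs(num_collectors)
    fits_overall = in_use + needs["total"] <= semmns   # system-wide limit
    fits_in_set = needs["largest_set"] <= semmsl       # per-set limit
    return fits_overall and fits_in_set
```

For example, a configuration with 200 data collectors needs 201 semaphores in total, and its larger set holds 200 of them, so both semmns and semmsl must allow for that.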

Configuration wizard requires PRM
- Issue
Starting with the WLM A.03.01
release, WLM no longer includes Process Resource Manager (B3835DA). The WLM configuration
wizard, however, requires PRM. Without PRM installed, the wizard:
  - Always sets the initial CPU (core) count to 1 in the pop-up dialog that appears before the wizard itself appears
  - Returns a message about the PRM API not being installed when it attempts to validate a configuration
- Workaround
Install PRM A.03.00 or later.

Processes in transient FSS groups appear unexpectedly in other workload groups
- Issue
A deployed WLM configuration that has transient_groups set to 1 and contains FSS transient group candidates
from time to time contains an FSS group called _IDLE_. As needed, WLM moves the jobs of the transient groups
to _IDLE_, where they get the minimum of CPU and memory resources. The
internal identifier for group _IDLE_ is picked by WLM on the fly (taken from the pool
of unused identifiers). On a redeployment, if the new configuration contains an FSS
group that happens to have the identifier WLM selected for the _IDLE_ group in the deployment that is being replaced,
jobs in the _IDLE_ group will migrate to the FSS group having the
same identifier. The same issue can arise going from:
  - A transient deployment to another transient deployment
  - A transient deployment to a non-transient deployment
- Workaround
You can prevent this situation
by using the -i option to wlmd when deploying or redeploying a configuration. First,
shut down wlmd (using the -k option), then restart it using the -i option.

Before modifying any partition managed by WLM, WLM and the global arbiter must be stopped
- Issue
Do not adjust any WLM-managed partition while wlmpard is running. This includes using vparmodify, icapmodify, or icod_modify to change the name, configuration, or resources
(CPU and memory) associated with the virtual partition or nPartition.
It also includes using parolrad to modify a cell in a WLM-managed partition, as noted
in “Before performing online cell operations on systems where WLM manages partitions,
memory, or PSETs, WLM must be stopped”.
- Workaround
To adjust a partition, you must first shut down
WLM—including wlmpard—on all partitions that will be affected by the
modification, modify the partition, and then restart WLM. Changes
to Instant Capacity (iCAP) affect the entire complex; changes to
a virtual partition affect the nPartition only, unless Instant Capacity
is configured on the nPartition. For example, if WLM is managing
two virtual partitions vParA and vParB, and you need to migrate
memory resources from vParA to vParB, you must shut down WLM in
both virtual partitions. As another example, to change the name
of an nPartition, you must first shut down WLM in every operating
system instance across the entire complex, because the name change
affects Instant Capacity, and Instant Capacity changes affect every
nPartition across the complex. To stop WLM, stop the wlmpard and wlmd daemons (use the -k option with the corresponding commands).

Before performing online cell operations on systems where WLM manages partitions, memory, or PSETs, WLM must be stopped
- Issue
If WLM is being used to manage memory records, partitions,
or PSET-based workload groups, and you attempt to perform an online
cell operation (parolrad) while WLM is running, changes made to CPU resources
by the operation might not be detected by WLM and can cause problems
for WLM management of CPU resources. Error messages will be generated.
- Workaround
Before performing an online cell operation (parolrad) on a system where WLM is managing memory, partitions,
or PSETs, you must first stop WLM, perform the operation, and then
restart WLM. To stop WLM, stop the wlmpard and wlmd daemons (use the -k option with the corresponding commands). Note that wlmd should be stopped on all partitions managed by WLM. You can check the status of online cell operations by using the parolrad -m command.

WLM GUI is not compatible with different versions of WLM
- Issue
If you attempt to use the WLM GUI (wlmgui) with a version of WLM that differs from the version
the GUI is associated with, the following message is displayed:

The WLM product running on <hostname> and this tool are incompatible.

The version of the WLM GUI must match the version of the WLM
product that it manages.
- Workaround
Either upgrade WLM to the version of the WLM GUI
you are using, or use an earlier version of the WLM GUI that matches
the version of WLM that your GUI will manage. Note that multiple
versions of the WLM GUI can be installed on a Microsoft Windows
PC.

Upgrading or installing PRM before upgrading WLM from A.03.00 or earlier can cause WLM to fail swverify checks
- Issue
If you install or upgrade to the latest version
of PRM (C.03.02 or later) on a system with WLM A.03.00 or earlier,
WLM will fail swverify checks.
- Workaround
To ensure that WLM works properly on a system with
PRM C.03.02 or later, upgrade WLM to A.03.02 or later. When upgrading
WLM A.03.00 or earlier, upgrade WLM prior to upgrading PRM.