This section describes how to handle some typical situations
that can occur while you use the RAID 4Si product.

SCSI Misconfiguration

If you are using the RAID 4Si controller with one SC10 JBOD
that contains two BCCs, a common configuration error is connecting
one channel of the controller to a port on one of the BCCs and another controller
channel to a port on the other BCC, with the SC10 in Full Bus mode.
This configuration puts both of the controller's channels
on the same SCSI bus, which is not a valid configuration. To fix this problem, either (1) disconnect
one of the controller's channels, or (2) put the SC10 into
Split Bus mode (set DIP switch 1 on each BCC to 0). Also note that you must put
a terminator on one of the ports of each BCC.

To determine if the SCSI bus connected to the HP RAID 4Si controller
is improperly configured, look for this message in the /var/adm/syslog/syslog.log file:

    iop_poll_for_outbound_mfa: Reading outbound FIFO timed out.

This message means that the HP RAID 4Si driver was unable
to establish communication with the HP RAID 4Si controller. Usually
this is caused by the SCSI bus being badly configured, so that the
controller is very busy handling constant aborts, resets, etc.

To fix this problem, follow these steps:

1. Turn the HP-UX system's power off.
2. Disconnect all of the SCSI cables from the HP RAID 4Si controller.
3. Turn the system's power on.
4. Run these two commands to confirm that the controller is claimed:

       ioscan
       irdiag -i

   If the iop is found by both commands, the controller is claimed.
   If the iop is not found, the HP RAID 4Si software is not
   properly installed, so you must re-install the software
   (see “Installing the HP RAID 4Si Software”).
5. Turn the system's power off.
6. Connect one SCSI cable from one of the controller's channel
   connectors to a properly terminated SCSI bus.
7. Turn the system's power on.
8. Run the ioscan and irdiag commands, as shown in step 4, to confirm
   that the controller is claimed. If the controller is claimed, go to
   step 9. If the controller is not claimed, check for one of these two
   causes:
   - The HP RAID 4Si software is not properly installed; you must
     re-install the software (see “Installing the HP RAID 4Si Software”).
   - A system resource problem is preventing the software from
     claiming the controller.
9. Start IRM.
10. From the "Management Menu," select Objects.
11. Select Physical Drive.
12. Confirm that all of the physical disk drives connected to the
    HP RAID 4Si controller are shown in the display. This means that
    the controller can see the disks on the SCSI bus, and so the
    controller is functioning correctly. If not all of the disk drives
    are in the display, be sure you have installed all of the required
    patches (see the HP RAID 4Si Release Note for information about
    the required patches). If you have, check the SCSI terminators,
    the cables, and the JBOD configurations.
13. Exit IRM.
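
The syslog check described at the start of this section can be automated. The following is a minimal Python sketch, not part of the HP RAID 4Si software; it assumes a Python 3 interpreter is available on a system that can read the log file. The function names find_fifo_timeouts and scan_syslog are hypothetical; only the message text and the file path come from this section.

```python
# Message text and log path as quoted in this section.
FIFO_TIMEOUT_MESSAGE = "iop_poll_for_outbound_mfa: Reading outbound FIFO timed out"
DEFAULT_SYSLOG = "/var/adm/syslog/syslog.log"


def find_fifo_timeouts(lines):
    """Return the log lines that contain the FIFO timeout message."""
    return [line.rstrip("\n") for line in lines if FIFO_TIMEOUT_MESSAGE in line]


def scan_syslog(path=DEFAULT_SYSLOG):
    """Read the syslog file and return the matching lines (hypothetical helper)."""
    # errors="replace" keeps the scan going if the log holds undecodable bytes.
    with open(path, errors="replace") as log:
        return find_fifo_timeouts(log)
```

Calling scan_syslog() with no arguments reads /var/adm/syslog/syslog.log. An empty result means the message was not logged; one or more matching lines suggest working through the steps above.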

Cannot Install HP-UX OS on an HP RAID 4Si Logical Drive

If you are not able to install the OS on an HP RAID 4Si logical
drive, check these possible causes:

- The correct PDC version is not installed on the system. You can
  check the system's PDC version by issuing this command at the
  BCH prompt:

      in fv

  See the HP RAID 4Si Release Note for information about PDC versions.
- The HP RAID 4Si controller was not configured before you began
  the OS installation on the logical drive. “Installing HP-UX on a
  Logical Drive” contains instructions for configuring the controller
  with Ignite-UX.

Cannot Boot over RAID

If you are not able to boot from the HP RAID 4Si logical drive,
check these possible causes:

- One or more of the logical drive's physical drives have failed,
  putting the logical drive into the OFFLINE state. Run IRM to
  determine whether any of the physical drives have failed (see
  “Determining Which Physical Drive Has Failed”).

  If one or more drives have failed, follow the steps in “A Disk
  Drive Might Be Failing” for replacing a failed drive. Once the
  logical drive is in a state other than OFFLINE, you should be able
  to boot from it.

  The logical drive can be in the OFFLINE state for reasons other
  than one or more failed physical drives. See “Multiple Disk Drives
  Have Failed” for a list of some other possible causes.
- The controller is not responding. To determine if the controller
  actually is responding, start IRM. If IRM reports that no
  controllers are found, the controller is no longer responding. See
  “A Controller Is Not Responding” and “Replacing a Controller” for
  more information.
- The HP RAID 4Si software is not installed on the logical drive.
  Ensure that the software was included when the OS was installed on
  the logical drive (see “Installing HP-UX on a Logical Drive”).

To check the first two possible causes above, you must be
able to run IRM. As mentioned earlier, we recommend that you
have alternate boot media with the HP RAID 4Si software installed
on it. Then, you can boot the system from that alternate media and
run IRM (and the other HP RAID 4Si diagnostic tools).

You also can run IRM by running Ignite-UX from the HP-UX 11.0
Core OS CD, the HP-UX 11i OE CD, or the Ignite-UX server (IRM is
one of the Ignite-UX options). To do this, follow these steps:

1. Start Ignite-UX from either a CD or the Ignite-UX server. The
   Ignite-UX "Welcome" screen displays.
2. Select Advanced Options. The "Advanced Options" screen displays.
3. Select Configure A5856A RAID 4Si cards (this allows you to run IRM
   and do the troubleshooting). If you are running Ignite-UX from the
   CD, IRM starts. If you are running from the server, you are asked
   for network information (the host name, host IP address, subnet
   mask, etc.); then IRM starts.
4. Use IRM to troubleshoot the first two possible causes listed
   earlier. When you have finished troubleshooting, exit IRM.
5. In the "Advanced Options" menu, select OK. You are placed back in
   the Ignite-UX "Welcome" screen.
6. Exit Ignite-UX.

A Disk Drive Might Be Failing

If irmd generates messages saying a physical drive has
failed, you might have to replace that drive. irmd reports drives by their channel and SCSI ID, so
you need to determine which physical drive irmd is referring to.

To determine which physical drive within an SC10 irmd is generating
messages about, you can do one of these things:

- Check the SC10's documentation.
- Run the command irdiag -v, which will generate a listing that maps
  the SC10's slots to SCSI IDs.

Once you have determined the drive irmd is reporting failures for, look at the drive to
see if a fault LED is on (if one is visible). You also can look
in the SCSI Device n:x sections of the output from irdiag -v for a very long list of errors.

Replacing the Drive—RAID 0

irmd reports SCSI sense data, which you can analyze
to try to determine how severe the drive's errors are.

For a RAID 0 configuration, if it looks like the drive is
having only intermittent read/write problems, try following these
steps to replace a faulty disk drive:

1. Stop using the faulty disk drive.
2. Back up the data on the drive.
3. Replace the drive according to the directions in your SC10
   documentation.
4. Restore the data to the new drive.

Replacing the Drive—Other Than RAID 0

For any RAID configuration other than RAID 0, follow these
steps to replace a faulty physical drive in a JBOD:

1. Remove the faulty drive (you do not need to turn the JBOD's
   power off).
2. Insert a new drive into the empty slot.

If you do not have a Hot Spare configured for the HP
RAID 4Si controller (or you do not have a Hot Spare that
is large enough to take over for the failed physical drive), the
logical drive stays in the DEGRADED state. This means that the logical
drive will go to the OFFLINE state if another physical drive fails.
The new physical drive is detected when you insert it into the slot.
The rebuild should start automatically (if it does not, you must
start it manually [see “Manual
Rebuild—Single Drive”]). When the disk rebuild is complete, the new physical
drive is fully functional and the logical drive returns to the OPTIMAL
state. (Note that the physical drive's rebuild can take several
hours.)

If you have a Hot Spare configured, the Hot Spare takes over
for the faulty drive when the faulty drive goes into the FAILED
state. The disk rebuild starts when the failure is detected (this
assumes that the Hot Spare has enough capacity to take over for
the failed drive). Note that the physical drive's rebuild
can take several hours. When the disk rebuild is complete, the rebuilt
drive (formerly the Hot Spare) now shows as ONL, and has the Ann-0x number combination that the failed drive had. Also,
the logical drive returns to the OPTIMAL state.

If you do not want
to designate another Hot Spare (to take the place of the one used for
the rebuild), you are finished. If you want to designate another
Hot Spare, follow these steps:

a. Start IRM.
b. From the "Management Menu," select Objects.
c. Select Physical Drive. The "Objects - PHYSICAL DRIVE SELECTION
   MENU" displays.
d. Select the drive you want to be the Hot Spare. This will probably
   be the drive that originally failed, which you replaced with a good
   drive (remember, the replacement [good] drive is not configured
   yet). The "Channel-n, Target-x" menu displays.
e. Select Make HotSpare. You are asked to confirm that you want to
   designate that physical drive as a Hot Spare.
f. If you do not want to make that drive a Hot Spare, select NO. You
   are placed back in the "Channel-n, Target-x" menu. Press Esc; you
   are placed in the "Objects - PHYSICAL DRIVE SELECTION MENU". Go to
   step g.

   If you want to make that drive a Hot Spare, select YES. You are
   placed back in the "Objects - PHYSICAL DRIVE SELECTION MENU". Note
   that the physical drive now shows as HSP. Go to step g.
g. Press Esc; you are placed in the "Objects" menu.
h. Press Esc. You are placed in the "Management Menu."
i. Press Esc. An exit confirmation dialog box displays.
j. Highlight YES and press Enter; IRM ends.

Multiple Disk Drives Have Failed

If several disk drives in a JBOD have failed, eliminate these
possible causes:

- Cables are disconnected or loosely connected.
- The SCSI bus is not properly terminated.
- The BCC is experiencing a problem.
- The JBOD is experiencing power problems.

Also, see the documentation for the JBOD for other possible
problems.

Disk Rebuild Does Not Automatically Start

If a physical disk drive fails and you have a Hot Spare configured,
the Hot Spare takes over for the failed drive and IRM automatically
tries to rebuild the failed drive. But, if the rebuild does not
automatically start for some reason, follow the steps in “Manual
Rebuild—Single Drive” to manually do the rebuild.

Disks Are Not Configured Into Logical Drives

If you finished installing the HP RAID 4Si controller and
the physical drives are not configured into logical drives, follow
these steps:

1. Disconnect all SCSI cables from the controller. This prevents the
   RAID configuration information stored on the physical drives from
   being erased.
2. Follow the steps in “Upgrading the Controller Firmware” to clear
   the configuration information in the controller's NVRAM.
3. Reconnect the SCSI cables.
4. Determine if the logical drives are now configured correctly.

If the logical drives are not configured correctly,
follow these steps:

1. Repeat the steps to clear the configuration information in the
   controller's NVRAM.
2. Replicate the HP RAID 4Si configuration, using the hardcopy record
   of the information you created earlier (see “Creating a Hardcopy
   Record of the Configuration”).

If the logical drives are still not configured correctly,
you must do a "fresh" configuration (that is,
from the beginning); see “Using
New Configuration”. Then, restore your data from backup files.

A Controller's Battery Will Not Hold a Charge

If a controller's battery will not hold a charge,
the controller still responds, but you must replace the controller
(because you cannot replace the battery separately). See “Replacing a Controller” for more details.

Battery Condition Out of Range

If irmd repeatedly reports a battery condition that is
out of range, you must replace the controller's battery.
(See “irmd Messages” for all
of the irmd battery-related messages.) The only way to replace
the controller's battery is to replace the entire controller
(the battery itself is not replaceable); see “Replacing a Controller”.

Battery Charge Counter Is Too High

The Fast Charge Counter of the battery in your HP RAID 4Si
controller has an invalid value if irmd generates this message:

    Battery failed to charge because the battery charge count has exceeded the limit.

The invalid value means that if the battery is not in a fully
charged state, your data might not be protected if the HP-UX system
experiences a power outage. Follow these steps to reset the charge counter:

1. Verify that the RAID driver and (for HP-UX 11.0 only) the
   appropriate patches are installed, by issuing the swlist command.
   The generated output could look like this (in this example, the
   RAID driver information is the A5856A line):

       A5856A      B.11.00.04  I2O RAID
       PHKL_25023  1.0         PCI ExpROM,bridge,BA hints,Lowfat,PIP,EPIC
       PHKL_24729  1.0         I/O pdir, SBA

   To determine whether the appropriate patches are installed,
   check the HP RAID 4Si Release Note for a list of the required
   patches.
2. Start IRM.
3. From the "Management Menu," select Objects.
4. Select Battery Backup. The "Battery Backup" menu displays.
5. Check the value for Charge Counter. If it is within the range 0
   through 1100, it is a valid value; press Esc. You are placed back
   in the "Objects" menu; go to step 9. For any other value, go to
   step 6.
6. Select Charge Counter.
7. Select YES on the "Reset Charge Cycles?" dialog box. The Charge
   Counter value is set to 0.
8. Press Esc; you are placed in the "Objects" menu.
9. Press Esc. You are placed in the "Management Menu."
10. Press Esc. An exit confirmation dialog box displays.
11. Highlight YES and press Enter; IRM ends.
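
The range test in step 5 can be written as a short Python check. This is a sketch for illustration only: the function name charge_counter_needs_reset is hypothetical, and the value must still be read from the "Battery Backup" menu in IRM and passed in by hand.

```python
# Valid Charge Counter values per this section: 0 through 1100, inclusive.
VALID_CHARGE_COUNTER = range(0, 1101)


def charge_counter_needs_reset(value):
    """Return True when the Charge Counter value is outside the valid range."""
    return value not in VALID_CHARGE_COUNTER
```

A True result corresponds to continuing with step 6 (resetting the counter); a False result corresponds to pressing Esc and going to step 9.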

A Controller Is Not Responding

A controller can be non-responsive to the HP-UX driver because
it is experiencing a hardware failure. In this case, you must replace
the controller (see “Replacing a Controller”). However, before you replace
the controller, note that it also can be functioning correctly but still not be responding
to the HP-UX driver. In this case, one of these three things is
true:

- The controller's firmware was updated (through STM or i2outil).
- A SCSI configuration problem exists. In this case, the controller's
  firmware tries to communicate over the SCSI bus when the firmware
  initializes. If SCSI cable or termination problems exist, the
  firmware might not complete the initialization process, so it will
  not be able to communicate with the HP-UX driver.
- A defect might exist in the controller's firmware, causing the
  HP RAID 4Si controller to malfunction.

To rule out these three possible causes, follow these steps:

1. Reboot the HP-UX system.
2. Issue this command:

       irdisplay

3. If the controller information is returned by irdisplay, the
   controller is communicating with the driver correctly; the problem
   is fixed. If the controller information is not returned by
   irdisplay, go to step 4.
4. Issue this command:

       shutdown

5. Turn off the system's power.
6. Remove all of the SCSI cables from the HP RAID 4Si controller.
7. Turn on the system's power.
8. Issue this command:

       irdisplay

9. If the controller information is returned by irdisplay, the
   controller is communicating with the driver correctly; the problem
   is fixed. Reattach all of the SCSI cables to the controller.

   If the controller information is not returned by irdisplay, you
   have ruled out the three possible causes described earlier, so the
   controller is faulty; you must replace the controller (see
   “Replacing a Controller” below).

Replacing a Controller

To replace a RAID controller, follow these steps:

1. Prepare the HP-UX system for shutdown. This might include moving
   applications to another system, informing users of the upcoming
   downtime, and backing up all of the user data.
2. For a system running HP-UX 11.0, turn the system's power off;
   then, go to step 3.

   For a system running HP-UX 11i and HP RAID 4Si driver version
   B.11.11.01 or later, which has slots that support Online Addition
   and Replacement (OLAR) of PCI cards, and in which the HP RAID 4Si
   controller is in one of those OLAR slots, follow the instructions
   in the "Managing PCI Cards with OLAR" chapter of the Configuring
   HP-UX Peripherals manual. (Also, “Online Addition and Replacement”
   of this manual contains information about OLAR.)

   For a system running HP-UX 11i, but which does not support OLAR,
   turn the system's power off; then, go to step 3.
3. If you did not do so when you installed the original controller,
   for each cable connected to the controller, attach a label that
   shows (1) the controller's slot number, (2) the controller's port
   letter (and corresponding number: A=0, B=1, C=2, and D=3), and
   (3) the JBOD's BCC location (top or bottom) and its port letter
   (A or B).
4. Turn off the power of each JBOD.
5. Remove the cables from the controller (you can leave them attached
   to the JBOD).
6. Remove the faulty controller from the HP-UX system.
7. Insert the new controller into the system.
8. Connect the cables to the new controller (be sure to connect each
   cable to the same controller channel it was connected to before).
9. Turn on the power of each JBOD.
10. If you turned off the system's power (you did not use OLAR), turn
    the power back on; then, go to step 11. If you did not turn off
    the system's power (you used OLAR), skip this step and go to
    step 11.
11. Run IRM.
12. Select Configure.
13. Select View/Add Configuration.
14. If the HP RAID 4Si controller's firmware version is earlier than
    U.01.04, or the replacement controller was previously configured
    with logical drives, you get a prompt asking if you want to view
    NVRAM or Disk Configuration; select Disk Configuration and then go
    to step 15. If you do not get that prompt, skip this step and go
    to step 15.
15. Verify that the configuration is correct.
16. Press Esc; you are placed in the "Configure" menu.
17. Press Esc. You are placed in the "Management" menu.
18. Press Esc. An exit confirmation dialog box displays.
19. Highlight YES and press Enter; IRM ends. The configuration is
    saved to the new controller's NVRAM. This is important because
    the RAID configuration usually resides in the NVRAM on the
    controller; a backup copy is stored on each physical drive that
    is configured. So, when you replace a controller, the
    configuration must be loaded into the new controller's NVRAM.
20. Rescan the hardware by issuing this command:

        irdisplay -f

21. Check to make sure all of the LUNs are claimed, by issuing this
    command:

        ioscan -fn

22. Issue any necessary LVM recovery commands.

You can now use the logical drives.
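
The cable labels described in step 3 can be generated consistently with a small Python helper. This is a sketch under stated assumptions: the function name cable_label and the label wording are hypothetical; only the three pieces of information on the label and the port mapping (A=0, B=1, C=2, D=3) come from this section.

```python
# Controller port letters and their corresponding numbers, as listed in step 3.
PORT_NUMBERS = {"A": 0, "B": 1, "C": 2, "D": 3}


def cable_label(slot, controller_port, bcc_location, bcc_port):
    """Build the text for one cable label (the wording is illustrative only)."""
    controller_port = controller_port.upper()
    bcc_location = bcc_location.lower()
    bcc_port = bcc_port.upper()
    if controller_port not in PORT_NUMBERS:
        raise ValueError("controller port must be A, B, C, or D")
    if bcc_location not in ("top", "bottom"):
        raise ValueError("BCC location must be 'top' or 'bottom'")
    if bcc_port not in ("A", "B"):
        raise ValueError("BCC port must be A or B")
    return (
        f"slot {slot}, port {controller_port} ({PORT_NUMBERS[controller_port]}), "
        f"BCC {bcc_location} port {bcc_port}"
    )
```

For example, cable_label(4, "b", "top", "A") returns the text "slot 4, port B (1), BCC top port A". Keeping the labels uniform makes it easier to reconnect each cable to the same controller channel in step 8.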