HP A5856A RAID 4Si PCI 4-Channel Ultra2 SCSI Controller: Installation and Administration Guide > Chapter 5 Troubleshooting the HP RAID 4Si Product

Some Typical Situations

This section describes how to handle some typical situations that can occur while you use the RAID 4Si product:

SCSI Misconfiguration

If you are using the RAID 4Si controller with one SC10 JBOD that contains two BCCs, a common configuration error is connecting one channel of the controller to a port on one of the BCCs and another controller channel to a port on the other BCC, with the SC10 in Full Bus mode. This configuration puts both of the controller's channels on the same SCSI bus, which is an invalid configuration. To fix this problem, either (1) disconnect one of the controller's channels, or (2) put the SC10 into Split Bus mode (set DIP switch 1 on each BCC to 0). Also note that you must put a terminator on one of the ports of each BCC.

To determine if the SCSI bus connected to the HP RAID 4Si controller is improperly configured, look for this message in the /var/adm/syslog/syslog.log file:

iop_poll_for_outbound_mfa: Reading outbound FIFO timed out.

This message means that the HP RAID 4Si driver was unable to establish communication with the HP RAID 4Si controller. Usually this is caused by the SCSI bus being badly configured, so that the controller is very busy handling constant aborts, resets, etc.
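
The log check described above can be scripted. The following is a minimal sketch, assuming a POSIX shell; the function name count_fifo_timeouts is illustrative and not part of the HP RAID 4Si software. It counts the lines in the syslog file that contain the timeout message.

```shell
#!/bin/sh
# Message text as quoted in this guide (without the trailing period).
FIFO_MSG='iop_poll_for_outbound_mfa: Reading outbound FIFO timed out'

# Print the number of lines in a log file that contain the timeout message.
# $1 = log file (defaults to the syslog path named in this guide).
# count_fifo_timeouts is a hypothetical helper name.
count_fifo_timeouts() {
    log_file=${1:-/var/adm/syslog/syslog.log}
    # -c prints the count of matching lines; -F matches a fixed string.
    grep -c -F "$FIFO_MSG" "$log_file"
}
```

Running count_fifo_timeouts with no argument checks the default log file. A result other than 0 suggests that the SCSI bus configuration should be examined with the steps that follow.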

To fix this problem, follow these steps:

  1. Turn the HP-UX system's power off.

  2. Disconnect all of the SCSI cables from the HP RAID 4Si controller.

  3. Turn the system's power on.

  4. Run these two commands to confirm that the controller is claimed:

    ioscan

    irdiag -i

    If the iop is found by both commands, the controller is claimed.

    If the iop is not found, the HP RAID 4Si software is not properly installed. So, you must re-install the software (see “Installing the HP RAID 4Si Software”).

  5. Turn the system's power off.

  6. Connect one SCSI cable from one of the controller's channel connectors to a properly terminated SCSI bus.

  7. Turn the system's power on.

  8. Run the ioscan and irdiag commands, as shown in step 4, to confirm that the controller is claimed.

    If the controller is claimed, go to step 9.

    If the controller is not claimed, check for one of these two causes:

    • The HP RAID 4Si software is not properly installed—you must re-install the software (see “Installing the HP RAID 4Si Software”).

    • A system resource problem is preventing the software from claiming the controller.

  9. Start IRM.

  10. From the "Management Menu," select Objects.

  11. Select Physical Drive.

  12. Confirm that all of the physical disk drives connected to the HP RAID 4Si controller are shown in the display. This means that the controller can see the disks on the SCSI bus, and so the controller is functioning correctly.

    If not all of the disk drives are in the display, be sure you have installed all of the required patches (see the HP RAID 4Si Release Note for information about the required patches). If you have, check the SCSI terminators, the cables, and the JBOD configurations.

  13. Exit IRM.
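
The claim check in steps 4 and 8 can be sketched as a small shell helper. This is only an illustration: the function names are hypothetical, and matching on the literal string "iop" is an assumption about how the two commands report the controller, because this guide does not show their output.

```shell
#!/bin/sh
# Succeed when the given text mentions the iop.
# Matching the literal string "iop" is an assumption about the output format.
iop_in_output() {
    printf '%s\n' "$1" | grep -q 'iop'
}

# Succeed only when BOTH outputs mention the iop, as steps 4 and 8 require.
# $1 = output of ioscan, $2 = output of irdiag -i
controller_claimed() {
    iop_in_output "$1" && iop_in_output "$2"
}

# On the HP-UX system itself, the call might look like this:
#   controller_claimed "$(ioscan)" "$(irdiag -i)" && echo "controller is claimed"
```

Passing the command output in as arguments keeps the check separate from the commands, so the logic can be tried on saved output from another system.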

Cannot Install HP-UX OS on an HP RAID 4Si Logical Drive

If you are not able to install the OS on an HP RAID 4Si logical drive, check these possible causes:

  • The correct PDC version is not installed on the system. You can check the system's PDC version by issuing this command at the BCH prompt:

    in fv

    See the HP RAID 4Si Release Note for information about PDC versions.

  • The HP RAID 4Si controller was not configured before you began the OS installation on the logical drive. “Installing HP-UX on a Logical Drive” contains instructions for configuring the controller with Ignite-UX.

Cannot Boot over RAID

If you are not able to boot from the HP RAID 4Si logical drive, check these possible causes:

To check the first two possible causes above, you must be able to run IRM. As mentioned earlier, we recommend you have an alternate boot media that has the HP RAID 4Si software installed on it. Then, you can boot the system from that alternate media and run IRM (and the other HP RAID 4Si diagnostic tools).

You also can run IRM by running Ignite-UX from the HP-UX 11.0 Core OS CD, the HP-UX 11i OE CD, or the Ignite-UX server (IRM is one of the Ignite-UX options). To do this, follow these steps:

  1. Start Ignite-UX from either a CD or the Ignite-UX server. The Ignite-UX "Welcome" screen displays.

  2. Select Advanced Options. The "Advanced Options" screen displays.

  3. Select Configure A5856A RAID 4Si cards (this allows you to run IRM and do the troubleshooting). If you are running Ignite-UX from the CD, IRM starts. If you are running from the server, you are asked for network information (the host name, host IP address, subnet mask, etc.); then IRM starts.

  4. Use IRM to troubleshoot the first two possible causes listed earlier.

  5. When you have finished troubleshooting, exit IRM.

  6. In the "Advanced Options" menu, select OK. You are placed back in the Ignite-UX "Welcome" screen.

  7. Exit Ignite-UX.

A Disk Drive Might Be Failing

If irmd generates messages saying a physical drive has failed, you might have to replace that drive.

TIP: For a logical drive that is redundantly configured (that is, any RAID level other than 0), when the physical drive's failure is detected, the state of the logical drive it belongs to changes from OPTIMAL to DEGRADED. If a second physical drive fails while the logical drive is in the DEGRADED state, the logical drive goes to the OFFLINE state.

For RAID 0, because no redundant configuration exists, the logical drive goes directly from the OPTIMAL state to the OFFLINE state when a physical drive fails.
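
The state changes described above can be summarized in a short sketch. The function name state_after_failure is illustrative only; the states and transitions are the ones given in the TIP.

```shell
#!/bin/sh
# Print the logical drive state after one more physical drive fails.
# $1 = RAID level (for example 0, 1, or 5), $2 = current logical drive state
# state_after_failure is a hypothetical helper name.
state_after_failure() {
    raid_level=$1
    current=$2
    case "$current" in
        OPTIMAL)
            if [ "$raid_level" = "0" ]; then
                echo OFFLINE     # RAID 0 has no redundancy
            else
                echo DEGRADED    # redundant levels survive one failure
            fi
            ;;
        *)
            echo OFFLINE         # a failure while DEGRADED takes the drive OFFLINE
            ;;
    esac
}
```

For example, state_after_failure 5 OPTIMAL prints DEGRADED, and state_after_failure 0 OPTIMAL prints OFFLINE.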

Determining the Drive

irmd reports drives by their channel and SCSI ID, so you need to determine which physical drive irmd is referring to.

To determine which physical drive within an SC10 irmd is generating messages about, you can do one of these things:

  • Check the SC10's documentation.

  • Run the command irdiag -v, which will generate a listing that maps the SC10's slots to SCSI IDs.

Once you have determined the drive irmd is reporting failures for, look at the drive to see if a fault LED is on (if one is visible). You also can look in the SCSI Device n:x sections of the output from irdiag -v for a very long list of errors.
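
Because the irdiag -v output can be long, it may help to pull out only the section for the drive in question. The sketch below is an assumption-laden illustration: this guide names the "SCSI Device n:x" sections but does not show their layout, so the code assumes that each section starts with a line containing "SCSI Device" followed by the channel and SCSI ID, and that it runs until the next such line. The function name device_section is hypothetical.

```shell
#!/bin/sh
# Print one "SCSI Device n:x" section from irdiag -v output read on stdin.
# $1 = channel (n), $2 = SCSI ID (x)
# Assumes a section starts at a line containing "SCSI Device n:x" and ends
# just before the next "SCSI Device" header line.
device_section() {
    awk -v want="SCSI Device $1:$2" '
        match($0, /SCSI Device [0-9]+:[0-9]+/) {
            # Compare the whole header token so that 0:1 does not match 0:10.
            printing = (substr($0, RSTART, RLENGTH) == want)
        }
        printing { print }
    '
}

# On the HP-UX system itself, the call might look like this:
#   irdiag -v | device_section 0 3
```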

Replacing the Drive—RAID 0

irmd reports SCSI sense data, which you can analyze to try to determine how severe the drive's errors are.

For a RAID 0 configuration, if it looks like the drive is having only intermittent read/write problems, try following these steps to replace a faulty disk drive:

  1. Stop using the faulty disk drive.

  2. Back up the data on the drive.

  3. Replace the drive according to the directions in your SC10 documentation.

  4. Restore the data to the new drive.

Replacing the Drive—Other Than RAID 0

For any RAID configuration other than RAID 0, follow these steps to replace a faulty physical drive in a JBOD:

  1. Remove the faulty drive (you do not need to turn the JBOD's power off).

  2. Insert a new drive into the empty slot.

    If you do not have a Hot Spare configured for the HP RAID 4Si controller—or you do not have a Hot Spare that is large enough to take over for the failed physical drive—the logical drive stays in the DEGRADED state. This means that the logical drive will go to the OFFLINE state if another physical drive fails. The new physical drive is detected when you insert it into the slot. The rebuild should start automatically (if it does not, you must start it manually [see “Manual Rebuild—Single Drive”]). When the disk rebuild is complete, the new physical drive is fully functional and the logical drive returns to the OPTIMAL state. (Note that the physical drive's rebuild can take several hours.)

    If you have a Hot Spare configured, the Hot Spare takes over for the faulty drive when the faulty drive goes into the FAILED state. The disk rebuild starts when the failure is detected (this assumes that the Hot Spare has enough capacity to take over for the failed drive). Note that the physical drive's rebuild can take several hours. When the disk rebuild is complete, the rebuilt drive (formerly the Hot Spare) now shows as ONL, and has the Ann-0x number combination that the failed drive had. Also, the logical drive returns to the OPTIMAL state. If you do not want to designate another Hot Spare (to take the place of the one used for the rebuild), you are finished. If you want to designate another Hot Spare, follow these steps:

    1. Start IRM.

    2. From the "Management Menu," select Objects.

    3. Select Physical Drive.

      The "Objects - PHYSICAL DRIVE SELECTION MENU" displays.

    4. Select the drive you want to be the Hot Spare. This will probably be the new drive that you inserted in place of the failed drive (remember, the replacement drive is not configured yet).

      The "Channel-n, Target-x" menu displays.

    5. Select Make HotSpare.

    6. You are asked to confirm that you want to designate that physical drive as a Hot Spare.

      If you do not want to make that drive a Hot Spare, select NO. You are placed back in the "Channel-n, Target-x" menu. Press Esc; you are placed in the "Objects - PHYSICAL DRIVE SELECTION MENU". Go to step 7.

      If you want to make that drive a Hot Spare, select YES. You are placed back in the "Objects - PHYSICAL DRIVE SELECTION MENU". Note that the physical drive now shows as HSP. Go to step 7.

    7. Press Esc; you are placed in the "Objects" menu.

    8. Press Esc. You are placed in the "Management Menu."

    9. Press Esc. An exit confirmation dialog box displays.

    10. Highlight YES and press Enter; IRM ends.
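
The two cases in step 2 of the replacement procedure (with and without a usable Hot Spare) can be sketched as a decision function. The function name and its result strings are illustrative, and treating "large enough" as "capacity greater than or equal to the failed drive's capacity" is an assumption.

```shell
#!/bin/sh
# Print what happens when a drive in a redundant logical drive fails.
# $1 = "yes" if a Hot Spare is configured
# $2 = Hot Spare capacity, $3 = failed drive capacity (same unit for both)
# Assumption: "large enough" means spare capacity >= failed drive capacity.
rebuild_plan() {
    has_spare=$1
    spare_cap=$2
    failed_cap=$3
    if [ "$has_spare" = "yes" ] && [ "$spare_cap" -ge "$failed_cap" ]; then
        # The rebuild onto the Hot Spare starts when the failure is detected.
        echo HOT_SPARE_REBUILD
    else
        # The logical drive stays DEGRADED until a new drive is inserted and rebuilt.
        echo WAIT_FOR_REPLACEMENT
    fi
}
```

For example, rebuild_plan yes 36 18 prints HOT_SPARE_REBUILD, while rebuild_plan yes 9 18 prints WAIT_FOR_REPLACEMENT because the spare is too small.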

Multiple Disk Drives Have Failed

If several disk drives in a JBOD have failed, eliminate these possible causes:

  • Cables are disconnected or loosely connected.

  • The SCSI bus is not properly terminated.

  • The BCC is experiencing a problem.

  • The JBOD is experiencing power problems.

Also, see the documentation for the JBOD for other possible problems.

Disk Rebuild Does Not Automatically Start

If a physical disk drive fails and you have a Hot Spare configured, the Hot Spare takes over for the failed drive and IRM automatically tries to rebuild the failed drive. But, if the rebuild does not automatically start for some reason, follow the steps in “Manual Rebuild—Single Drive” to manually do the rebuild.

Disks Are Not Configured Into Logical Drives

If you finished installing the HP RAID 4Si controller and the physical drives are not configured into logical drives, follow these steps:

  1. Disconnect all SCSI cables from the controller.

    This prevents the RAID configuration information stored on the physical drives from being erased.

  2. Follow the steps in “Upgrading the Controller Firmware” to clear the configuration information in the controller's NVRAM.

  3. Reconnect the SCSI cables.

  4. Determine if the logical drives are now configured correctly.

    If the logical drives are not configured correctly, follow these steps:

    1. Repeat the steps to clear the configuration information in the controller's NVRAM.

    2. Replicate the HP RAID 4Si configuration, using the hardcopy record of the information you created earlier (see “Creating a Hardcopy Record of the Configuration”).

    If the logical drives are still not configured correctly, you must do a "fresh" configuration (that is, from the beginning); see “Using New Configuration”. Then, restore your data from backup files.

A Controller's Battery Will Not Hold a Charge

If a controller's battery will not hold a charge, the controller still responds, but you must replace the controller (because you cannot replace the battery separately). See “Replacing a Controller” for more details.

Battery Condition Out of Range

If irmd repeatedly reports a battery condition that is out of range, you must replace the controller's battery. (See “irmd Messages” for all of the irmd battery-related messages.) The only way to replace the controller's battery is to replace the entire controller (the battery itself is not replaceable); see “Replacing a Controller”.

Battery Charge Counter Is Too High

The Fast Charge Counter of the battery in your HP RAID 4Si controller has an invalid value if irmd generates this message:

Battery failed to charge because the battery charge count has exceeded the limit.

The invalid value means that if the battery is not in a fully charged state, your data might not be protected if the HP-UX system experiences a power outage.

Follow these steps to reset the charge counter:

  1. Verify that the RAID driver and (for HP-UX 11.0 only) the appropriate patches are installed, by issuing the swlist command. The generated output could look like this (in this example, the A5856A line is the RAID driver entry):

    A5856A       B.11.00.04 I2O RAID
    PHKL_25023 1.0 PCI ExpROM,bridge,BA hints,Lowfat,PIP,EPIC
    PHKL_24729 1.0 I/O pdir, SBA

    To determine whether the appropriate patches are installed, check the HP RAID 4Si Release Note for a list of the required patches.

  2. Start IRM.

  3. From the "Management Menu," select Objects.

  4. Select Battery Backup. The "Battery Backup" menu displays.

  5. Check the value for Charge Counter.

    If it is within the range 0 through 1100, it is a valid value; press Esc. You are placed back in the "Objects" menu; go to step 9.

    For any other value, go to step 6.

  6. Select Charge Counter.

  7. Select YES on the "Reset Charge Cycles?" dialog box. The Charge Counter value is set to 0.

  8. Press Esc; you are placed in the "Objects" menu.

  9. Press Esc. You are placed in the "Management Menu."

  10. Press Esc. An exit confirmation dialog box displays.

  11. Highlight YES and press Enter; IRM ends.
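
The two checks in this procedure, the swlist check in step 1 and the range check in step 5, can be sketched in shell. The function names are hypothetical. The swlist check assumes that the product or patch name is the first field on its line, as in the example output above.

```shell
#!/bin/sh
# Succeed when the Charge Counter value is valid (0 through 1100 inclusive).
charge_counter_valid() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;    # reject empty or non-numeric input
    esac
    [ "$1" -ge 0 ] && [ "$1" -le 1100 ]
}

# Succeed when swlist output (read on stdin) has a line whose first field
# is the given product or patch name.
swlist_has() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# On the HP-UX system itself, the call might look like this:
#   swlist | swlist_has A5856A || echo "RAID driver not installed"
```

If charge_counter_valid fails for the value shown in IRM, continue with step 6 to reset the counter.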

A Controller Is Not Responding

A controller can be non-responsive to the HP-UX driver because it is experiencing a hardware failure. In this case, you must replace the controller (see “Replacing a Controller”).

However, before you replace the controller, note that it also can be functioning correctly but still not be responding to the HP-UX driver. In this case, one of these three things is true:

  • The controller's firmware was updated (through STM or i2outil).

  • A SCSI configuration problem exists. In this case, the controller's firmware tries to communicate over the SCSI bus when the firmware initializes. If SCSI cable or termination problems exist, the firmware might not complete the initialization process. So, it will not be able to communicate with the HP-UX driver.

  • A defect might exist in the controller's firmware, causing the HP RAID 4Si controller to malfunction.

To rule out these three possible causes, follow these steps:

  1. Reboot the HP-UX system.

  2. Issue this command:

    irdisplay

  3. If the controller information is returned by irdisplay, the controller is communicating with the driver correctly; the problem is fixed.

    If the controller information is not returned by irdisplay, go to step 4.

  4. Issue this command:

    shutdown

  5. Turn off the system's power.

  6. Remove all of the SCSI cables from the HP RAID 4Si controller.

  7. Turn on the system's power.

  8. Issue this command:

    irdisplay

  9. If the controller information is returned by irdisplay, the controller is communicating with the driver correctly; the problem is fixed. Reattach all of the SCSI cables to the controller.

    If the controller information is not returned by irdisplay, you have ruled out the three possible causes described earlier, so the controller is faulty—you must replace the controller (see “Replacing a Controller” below).
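
The decision logic of the steps above can be summarized in a short sketch. The function name and result strings are illustrative; the inputs are the operator's observations of whether irdisplay returned the controller information at steps 3 and 9.

```shell
#!/bin/sh
# Print the outcome of the troubleshooting steps above.
# $1 = "yes" if irdisplay returned controller information after the reboot (step 3)
# $2 = "yes" if irdisplay returned it after the SCSI cables were removed (step 9)
diagnose_controller() {
    if [ "$1" = "yes" ]; then
        echo FIXED_BY_REBOOT
    elif [ "$2" = "yes" ]; then
        # The controller responds with no cables attached; reattach the cables.
        echo RESPONDS_WITHOUT_CABLES
    else
        # All three possible causes are ruled out; the controller is faulty.
        echo REPLACE_CONTROLLER
    fi
}
```

For example, diagnose_controller no no prints REPLACE_CONTROLLER, which corresponds to the final case in step 9.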

Replacing a Controller

To replace a RAID controller, follow these steps:

  1. Prepare the HP-UX system for shutdown. This might include moving applications to another system, informing users of the upcoming downtime, and backing up all of the user data.

  2. For a system running HP-UX 11.0, turn the system's power off; then, go to step 3.

    For a system running HP-UX 11i and HP RAID 4Si driver version B.11.11.01 or later, and which has slots that support Online Addition and Replacement (OLAR) of PCI cards—and the HP RAID 4Si controller is in one of those OLAR slots—follow the instructions in the "Managing PCI Cards with OLAR" chapter of the Configuring HP-UX Peripherals manual. (Also, “Online Addition and Replacement” of this manual contains information about OLAR.)

    For a system running HP-UX 11i, but which does not support OLAR, turn the system's power off; then, go to step 3.

  3. If you did not do so when you installed the original controller, for each cable connected to the controller, attach a label that shows (1) the controller's slot number, (2) the controller's port letter (and corresponding number: A=0, B=1, C=2, and D=3), and (3) the JBOD's BCC location (top or bottom) and its port letter (A or B).

  4. Turn off the power of each JBOD.

  5. Remove the cables from the controller (you can leave them attached to the JBOD).

  6. Remove the faulty controller from the HP-UX system.

  7. Insert the new controller into the system.

  8. Connect the cables to the new controller (be sure to connect each cable to the same controller channel it was connected to before).

  9. Turn on the power of each JBOD.

  10. If you turned off the system's power (you did not use OLAR), turn the power back on; then, go to step 11.

    If you did not turn off the system's power (you used OLAR), skip this step and go to step 11.

  11. Run IRM.

  12. Select Configure.

  13. Select View/Add Configuration.

  14. If the HP RAID 4Si controller's firmware version is earlier than U.01.04, or the replacement controller was previously configured with logical drives, you get a prompt asking if you want to view NVRAM or Disk Configuration; select Disk Configuration and then go to step 15.

    If you do not get that prompt, skip this step and go to step 15.

  15. Verify that the configuration is correct.

  16. Press Esc; you are placed in the "Configure" menu.

  17. Press Esc. You are placed in the "Management Menu."

  18. Press Esc. An exit confirmation dialog box displays.

  19. Highlight YES and press Enter; IRM ends.

    The configuration is saved to the new controller's NVRAM. This is important because the RAID configuration usually resides in the NVRAM on the controller; a backup copy is stored on each physical drive that is configured. So, when you replace a controller, the configuration must be loaded into the new controller's NVRAM.

  20. Rescan the hardware by issuing this command:

    irdisplay -f

  21. Check to make sure all of the LUNs are claimed, by issuing this command:

    ioscan -fn

  22. Issue any necessary LVM recovery commands.

You can now use the logical drives.
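
The cable labels described in step 3 follow a fixed mapping from port letter to channel number (A=0, B=1, C=2, and D=3). The sketch below generates label text from that mapping. The function names and the label layout are illustrative, not a format required by the HP RAID 4Si product.

```shell
#!/bin/sh
# Map a controller port letter to its channel number (A=0, B=1, C=2, D=3).
port_number() {
    case "$1" in
        A|a) echo 0 ;;
        B|b) echo 1 ;;
        C|c) echo 2 ;;
        D|d) echo 3 ;;
        *)   echo "unknown port letter: $1" >&2; return 1 ;;
    esac
}

# Print label text for one cable.
# $1 = controller slot number, $2 = controller port letter,
# $3 = BCC location (top or bottom), $4 = BCC port letter (A or B)
cable_label() {
    num=$(port_number "$2") || return 1
    echo "slot $1, port $2 ($num), BCC $3, port $4"
}
```

For example, cable_label 4 B top A prints "slot 4, port B (1), BCC top, port A". Recording the same text in the hardcopy configuration record makes it easier to reconnect each cable to its original channel in step 8.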

© 2002 Hewlett-Packard Development Company, L.P.