HP-UX System Administration Tasks: HP 9000, Chapter 4: Working with HP-UX File Systems

Dealing with File System Corruption


Hardware failures, accidental power loss, or improper shutdown procedures can cause corruption in otherwise reliable file systems.

CAUTION: To ensure file system integrity, always follow proper shutdown procedures as described in Chapter 2.

Never take a system offline by merely shutting its power off or by disconnecting it.

Diagnosing a Corrupt File System

If you notice any of the following symptoms, you may have a corrupt file system:

  • A file contains incorrect data (garbage).

  • A file has been truncated or has missing data.

  • Files disappear or change locations for unknown reasons.

  • Error messages suggesting the possibility of corruption appear on a user's terminal, the system console, or the system log.

  • You experience difficulty changing directories or listing the files in a directory.

  • The system fails to reboot, possibly as a result of one or more errors reported by the /sbin/bcheckrc script during bootup.

Corruption is especially likely if several users experience these problems and you cannot readily identify another cause. In that case, check the file system for inconsistencies using fsck, as described next.

Locating and Correcting Corruption Using fsck

fsck(1M), the file system checker, is the primary HP-UX tool for finding and correcting file system inconsistencies. After a system failure, you need to reboot your system and run fsck. For HFS or VxFS, the file systems listed in /etc/fstab are checked automatically when the system reboots.

You should also check a file system whenever you suspect it is corrupt, and periodically as preventive maintenance.

fsck examines the file system for a variety of inconsistencies. Refer to fsck(1M), fsck_hfs(1M), and fsck_vxfs(1M) for more information.

Checking an HFS File System

The following steps apply only to checking an HFS file system. If your file system is VxFS, see the next section "Checking a VxFS File System".

Step 1: Before running fsck, make sure that a lost+found directory is present at the root of each file system you plan to examine; it is also helpful if lost+found is empty. (fsck places any problem files or directories it finds in lost+found.)

If lost+found is no longer present, rebuild it using mklost+found(1M).
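The Step 1 check can be sketched in Python as follows. This is a minimal illustration, assuming the file system is mounted at a path you pass in; the function name check_lost_found and its return strings are hypothetical. It only reports the state of lost+found and deliberately does not create the directory, since the procedure calls for mklost+found(1M) rather than an ordinary mkdir.

```python
import os
import stat


def check_lost_found(mount_point: str) -> str:
    """Report the state of lost+found at the root of a mounted file system.

    Hypothetical helper. Returns "missing", "not-a-directory",
    "non-empty", or "ok".
    """
    path = os.path.join(mount_point, "lost+found")
    try:
        st = os.lstat(path)
    except FileNotFoundError:
        # Rebuild with mklost+found(1M), as the procedure describes.
        return "missing"
    if not stat.S_ISDIR(st.st_mode):
        return "not-a-directory"
    if os.listdir(path):
        # fsck output is easier to interpret if lost+found starts out empty.
        return "non-empty"
    return "ok"
```

Run it against each mount point you plan to check before starting fsck.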

Step 2: Terminate processes with files open on the suspect file system or shut down your system.

You need to terminate processes with files open on the file system so that you can unmount it. If it is the root file system, execute shutdown (without -h or -r) to enter the single-user state. (See Chapter 2 for more information on shutting down your system. Also see "Solving Unmounting Problems" earlier in this chapter.)

Step 3: Unmount the file system using SAM or the umount command.

NOTE: Skip Step 3 if you have brought your system to the single-user run level.

The root file system cannot be unmounted.
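The decision in Step 3 and its note can be expressed as a small predicate. This is a hypothetical helper (should_unmount is not an HP-UX command), assuming you already know whether the system is at the single-user run level; it does not terminate processes or run umount itself.

```python
import os


def should_unmount(mount_point: str, single_user: bool) -> bool:
    """Return True if Step 3 (unmounting) applies to this file system."""
    if os.path.normpath(mount_point) == "/":
        # The root file system cannot be unmounted.
        return False
    if single_user:
        # Per the note, Step 3 is skipped at the single-user run level.
        return False
    return True
```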

Step 4: Run fsck with the -p option.

The -p option lets fsck fix many file system problems non-interactively. (See fsck(1M) for information on fsck's options.) If fsck finds no errors, or finds only correctable errors, it corrects any such errors and prints information about the file system it checked. If fsck encounters a problem it cannot correct while running with the -p option, it terminates with an error message.

Use the following table to determine what to do next based on three possible outcomes:

If fsck reports...                             Then proceed to...   Followed by...
no errors                                      Step 5a              you are done
errors and corrects them all                   Step 5b              Step 7
uncorrectable errors, with an error message    Step 5c              Step 6
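The table can also be read as a lookup from outcome to next steps. The sketch below is a minimal illustration; the FsckOutcome names and the classify and next_steps helpers are hypothetical, and the two booleans classify takes stand in for your own reading of the fsck -p output rather than any parsing of real fsck messages or exit codes.

```python
from enum import Enum


class FsckOutcome(Enum):
    """The three outcomes of 'fsck -p' distinguished in the table."""
    NO_ERRORS = "no errors"
    ALL_CORRECTED = "errors and corrects them all"
    UNCORRECTABLE = "uncorrectable errors, with an error message"


# Outcome -> ordered steps to perform next, copied from the table.
NEXT_STEPS = {
    FsckOutcome.NO_ERRORS: ["Step 5a"],
    FsckOutcome.ALL_CORRECTED: ["Step 5b", "Step 7"],
    FsckOutcome.UNCORRECTABLE: ["Step 5c", "Step 6"],
}


def classify(errors_found: bool, all_corrected: bool) -> FsckOutcome:
    """Map what you observed from 'fsck -p' onto one of the table rows."""
    if not errors_found:
        return FsckOutcome.NO_ERRORS
    if all_corrected:
        return FsckOutcome.ALL_CORRECTED
    return FsckOutcome.UNCORRECTABLE


def next_steps(outcome: FsckOutcome) -> list[str]:
    return list(NEXT_STEPS[outcome])
```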

Step 5a: Check for other causes of the problem.

If fsck runs without encountering any errors, the problem is not a corrupt file system, and you should re-examine other possible causes. The most common causes of problems with files are listed below; there are others.

  • A user deleted, overwrote, moved, or truncated the file(s) in question.

  • A program/application deleted, overwrote, moved, or truncated the file(s).

  • The file system that was mounted on a particular directory when a file was created might not be the one mounted there now (if any is).

  • A file (or group of files) was placed in a directory that now has a file system mounted on it. The files that were in the directory before you mounted the current file system still exist, but are not accessible until you unmount the file system that is covering them.

  • The protection bits on the file don't permit you to access it.

  • The ownership of a file does not permit you to access it.
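Several of the causes above can be checked quickly from a script. The sketch below is a hypothetical helper, not an HP-UX tool: for a given path it reports whether the file exists, its owner and permission bits, whether the current user can read it, and whether the containing directory is a mount point (in which case files created there before the mount may be hidden underneath).

```python
import os
import stat


def diagnose(path: str) -> dict:
    """Gather facts relevant to the non-corruption causes listed in Step 5a."""
    directory = os.path.dirname(os.path.abspath(path))
    facts = {
        "exists": os.path.lexists(path),
        # If the file is missing and its directory is a mount point, the file
        # may be hidden under the mounted file system; unmount it to check.
        "dir_is_mount_point": os.path.ismount(directory),
    }
    if facts["exists"]:
        st = os.lstat(path)
        facts["owner_uid"] = st.st_uid
        facts["mode"] = stat.filemode(st.st_mode)  # e.g. "-rw-r-----"
        # os.access checks permission using the real user and group IDs.
        facts["readable"] = os.access(path, os.R_OK)
    return facts
```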

Because your file system is not corrupt, do not continue with the remaining steps in this procedure.

Step 5b: Restore any necessary files.

Because fsck found and corrected all the errors it located in the file system, it is likely these errors were the cause of the problems. Once the damage has been repaired, the file system is again structurally sound. If any of your needed files have been lost, restore them from a backup or from lost+found. Because fsck has already repaired the damage, do not run fsck again as described in Step 6; instead, proceed to Step 7.

Step 5c: Prepare to run fsck interactively.

Because fsck terminated without correcting all the errors it found, you need to continue the effort to fix problems. The UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY message means you should rerun fsck so that you can interact with it, that is, without the -p or -P option. When fsck finds an inconsistency, it prompts you for a response.

When you run fsck interactively, it may need to perform actions that could cause the loss of data or the removal of a file/file name (such as when two files claim ownership of the same data blocks). Because of this, any backups of this file system at this point are likely to fail. This is another reason you should back up your system regularly as described in Chapter 9.

If you have critical files on this file system that have not yet been backed up (and are still intact), move them to another file system or try backing up only the critical files to tape.
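One way to move critical files off the suspect file system is sketched below, assuming the destination directory is on a different, healthy file system. The function name save_critical_files is hypothetical; it compares the st_dev device numbers returned by os.stat so that it refuses to copy files onto the same file system they came from, and it uses shutil.copy2 to keep timestamps and permission bits.

```python
import os
import shutil


def save_critical_files(paths: list[str], dest_dir: str) -> list[str]:
    """Copy files to dest_dir, refusing if dest_dir is on the same file system."""
    dest_dev = os.stat(dest_dir).st_dev
    # Validate every source first so nothing is copied if one check fails.
    for src in paths:
        if os.stat(src).st_dev == dest_dev:
            raise ValueError(f"{dest_dir!r} is on the same file system as {src!r}")
    # copy2 preserves permission bits and timestamps (not ownership).
    return [shutil.copy2(src, dest_dir) for src in paths]
```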

IMPORTANT: You should empty the lost+found directory before you run fsck again.

Step 6: Once you have completed Step 5c, you are ready to run fsck interactively. To do this, run fsck again without the -p or -P option.

As fsck encounters errors, it will request permission to perform certain tasks. If you do not give fsck permission to perform the correction, it will bypass the operation, leaving the file system unrepaired.

After an interactive run, fsck often asks you to run reboot -n. If you do not run reboot -n at this time, you could re-corrupt your file system. (Do not use reboot -n for normal rebooting.)

Step 7: Examine files in the lost+found directory.

Once you've allowed fsck to repair the file system, mount the file system and check its lost+found directory for any entries that might now be present.

Any entries present, which are named by inode number, are files that have lost their association with their original directories. Examine these files, try to determine their proper names and locations, and return the "orphaned" files there.

To do this, begin by using the file command to determine what type of files these are. If they are ASCII text files, you can review them using cat or more. If they are some other type, use a utility such as xd or od to examine their contents. You can also run what or strings to help determine the origin of your lost+found files.
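The examination in Step 7 can be partly automated. The sketch below is a rough Python stand-in, not a replacement for file, what, or strings: it lists each lost+found entry, guesses whether a file is text with a crude heuristic (no NUL bytes, mostly printable), and extracts printable runs of four or more characters (similar to strings) and "@(#)" identification strings (similar to what). The function names and the 4096-byte sample size are assumptions for illustration.

```python
import os
import re

# Runs of 4+ printable ASCII characters, similar to strings(1).
_PRINTABLE_RUN = re.compile(rb"[\x20-\x7e\t]{4,}")
# Identification strings as searched for by what(1): the text after "@(#)"
# up to a double quote, '>', newline, backslash, or NUL.
_WHAT_ID = re.compile(rb'@\(#\)([^">\n\\\x00]*)')


def looks_like_text(data: bytes) -> bool:
    """Crude stand-in for file(1): no NUL bytes and over 90% printable."""
    if not data:
        return True
    if b"\x00" in data:
        return False
    printable = sum(32 <= b < 127 or b in (9, 10, 13) for b in data)
    return printable / len(data) > 0.9


def survey_lost_found(lost_found: str, sample: int = 4096) -> list[dict]:
    """Summarize each lost+found entry; only the first `sample` bytes are read."""
    report = []
    for name in sorted(os.listdir(lost_found)):
        path = os.path.join(lost_found, name)
        entry = {"name": name, "is_dir": os.path.isdir(path)}
        if os.path.isfile(path):
            with open(path, "rb") as fh:
                data = fh.read(sample)
            entry["text"] = looks_like_text(data)
            entry["what"] = [m.decode("ascii", "replace") for m in _WHAT_ID.findall(data)]
            entry["strings"] = [m.decode("ascii") for m in _PRINTABLE_RUN.findall(data)][:5]
        report.append(entry)
    return report
```

Entries flagged as text are good candidates for cat or more; the "what" and "strings" fields give hints about where binary files came from.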

Once you have returned the files in the lost+found directory to their proper locations, use your most recent backup to restore any files that are still missing.

IMPORTANT: If you encounter the following message:
CAN'T READ BLOCK ...

there may be a media problem that mediainit(1) can resolve. Otherwise, a hardware failure has probably occurred; in this case, contact your local sales and support office.

Checking a VxFS File System

After a system failure other than a disk failure, fsck needs to scan only the intent log, not the entire file system. The intent log records all pending changes to the file system structure; that is, it logs the transactions the system intends to make to the file system before actually making the changes. Replaying the intent log is very fast, and may take no longer for a large file system than for a small one, because replay time depends on file system activity rather than file system size. As a result, after a system failure the system can be up and running again very quickly.
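The replay behavior described above can be illustrated with a toy redo log. This is a simplified model under stated assumptions, not the VxFS on-disk log format: each hypothetical LogRecord either records a change belonging to a transaction or marks that transaction committed, and the structure is assumed to be updated only after the commit is logged. Replay reapplies committed transactions and ignores incomplete ones, so the cost depends on the length of the log rather than the size of the file system.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    """Toy intent-log record (hypothetical format, not VxFS's)."""
    txn: int                # transaction identifier
    op: str                 # "set" records a change; "commit" closes the txn
    key: str = ""           # name of the structure being changed
    value: object = None    # new value for that structure


def replay(state: dict, log: list[LogRecord]) -> dict:
    """Apply committed transactions from the log; skip incomplete ones.

    Work is proportional to len(log), not len(state).
    """
    pending: dict[int, list[LogRecord]] = {}
    result = dict(state)
    for rec in log:
        if rec.op == "set":
            pending.setdefault(rec.txn, []).append(rec)
        elif rec.op == "commit":
            # Complete the whole transaction at once.
            for change in pending.pop(rec.txn, []):
                result[change.key] = change.value
    # Transactions left in `pending` never committed; under this model they
    # never reached the structure, so it stays in its pre-transaction state.
    return result
```

Because replay only re-sets values recorded in the log, running it twice on the same log gives the same result, which is what makes a crash during recovery harmless in this model.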

After a disk failure, replaying the VxFS intent log is not sufficient; you need to check the entire file system. To do this, use the -o full option of fsck.
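The fsck invocations mentioned in this section can be summarized as argument lists. This is a hypothetical helper (fsck_argv is not part of HP-UX), and any device paths you pass are placeholders; it only builds the commands and does not run them. Check fsck(1M), fsck_hfs(1M), and fsck_vxfs(1M) for how the file system type and device are specified on your system.

```python
def fsck_argv(device: str, fstype: str, *, interactive: bool = False,
              disk_failure: bool = False) -> list[str]:
    """Build the fsck command line for the scenarios in this section."""
    if fstype == "hfs":
        # Step 4 runs fsck with -p; Step 6 reruns it without -p or -P.
        return ["fsck", device] if interactive else ["fsck", "-p", device]
    if fstype == "vxfs":
        # Normally fsck replays the intent log; after a disk failure,
        # -o full forces a check of the entire file system.
        return ["fsck", "-o", "full", device] if disk_failure else ["fsck", device]
    raise ValueError(f"unsupported file system type: {fstype!r}")
```

You could pass the resulting list to subprocess.run, but only after completing Steps 2 and 3 so that the file system is not in use.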

Summary of Differences between HFS and VxFS File Checking

Although the checking and correcting process using fsck is similar for both HFS and VxFS, there are some important differences which are summarized in the table below.

Table 4-2 HFS vs. VxFS File Checking Following System Failure

What needs to be checked?

  • HFS: The entire file system. This can be time consuming: as the size of the file system increases, so does the time fsck requires.

  • VxFS: The intent log only. This may take no longer for a large file system than for a small one.

What assurance is there of file system integrity?

  • HFS: There is no assurance that fsck will be able to repair a file system after a crash, although it usually can; it is sometimes unable to repair a file system that crashed before completing a file system change. Even if the file system can be repaired, there is no guarantee that its structure will be preserved: fsck might put files into the lost+found directory because it cannot determine where they belong.

  • VxFS: Complete assurance of file system integrity following a crash (except for disk failure). Because VxFS groups the steps involved in a file system operation into a single transaction, it never leaves a transaction only partially complete after a system failure. (A transaction pending at the moment the system crashed is either completed entirely or "rolled back" to its pre-transaction state.)

What do I do in the event of a disk failure?

  • HFS: The file system must be scanned from beginning to end for inconsistencies, with no assurance of file system integrity.

  • VxFS: As with HFS, the file system must be scanned from beginning to end for inconsistencies (using fsck -o full), with no assurance of file system integrity.

 

© Hewlett-Packard Development Company, L.P.