Recovery from LED 552, 554, or 556 in AIX

Technote (FAQ)

Question

Recovery from LED 552, 554, or 556 in AIX Versions 4 and 5

Answer

This document discusses the known causes of LED 552, 554, and 556. Included is a procedure for recovery from these errors. This document applies to AIX Versions 4 and 5.

Causes of an LED 552, 554, or 556

An LED code of 552, 554, or 556 during a standard disk-based boot indicates that a failure occurred during the varyon of the rootvg volume group.

Some known causes of an LED 552, 554, or 556 are:

  a corrupted file system
  a corrupted Journaled File System (JFS) log device
  a bad IPL-device record or bad IPL-device magic number; the magic number indicates the device type
  a corrupted copy of the Object Data Manager (ODM) database on the boot logical volume
  a fixed disk (hard disk) in the inactive state in the root volume group 

Recovery procedure

To diagnose and fix the problem, boot to a Service mode shell and run the fsck command (file system check) on each file system. If the file system check fails, you may need to perform other steps.

WARNING: Do not use this document if the system is a /usr client, diskless client, or dataless client.

  Boot your system into a limited function maintenance shell (Service or Maintenance mode) from bootable AIX media to use this recovery procedure.
  Refer to your system user's or installation and service guide for specific IPL procedures related to your type and model of hardware. You can also refer to the document titled "Booting in Service Mode", available at http://techsupport.services.ibm.com/server/aix.srchBroker for more information.

Step 1

  • With bootable media of the same version and level as the system, boot the system into Service mode. The bootable media can be any ONE of the following:

      Bootable CD-ROM
      mksysb
      Bootable Install Tape

Follow the screen prompts to the Welcome to Base OS menu.

Step 2

  • Choose Start Maintenance Mode for System Recovery (Option 3). The next screen displays prompts for the Maintenance menu.
  • Choose Access a Root Volume Group (Option 1).
      The next screen displays a warning that indicates you will not be able to return to the Base OS menu without rebooting.
  • Choose 0 to continue.
      The next screen displays information about all volume groups on the system.
  • Select the root volume group by number. The logical volumes in rootvg will be displayed with two options below.
  • Choose Access this volume group and start a shell before mounting the file systems (Option 2).
  If you receive errors from the preceding option, do not continue with the rest of this procedure. Correct the problem causing the error. If you need assistance correcting the problem, contact one of the following:
      your local branch office
      your point of sale
      your AIX support center

Step 3

  • Run the following commands to check and repair file systems.
  fsck -p /dev/hd4 
  fsck -p /dev/hd2 
  fsck -p /dev/hd9var 
  fsck -p /dev/hd3
  fsck -p /dev/hd1 
  NOTE: The -p option used above lets fsck fix minor problems automatically without prompting. The -y option goes further and gives the fsck command permission to repair any file system corruption it finds. Using -y avoids having to answer multiple confirmation prompts manually; however, it can cause permanent data loss in some situations.
  • If any of the following conditions occur, proceed accordingly.

      If fsck indicates that block 8 could not be read, the file system is probably unrecoverable. See the paragraph on unrecoverable file systems at the end of this step.
      If fsck indicates that a file system has an unknown log record type, or if fsck fails in the logredo process, go to Step 4 to reformat the JFS log.
      If the file system checks were successful, skip ahead to the reboot in Normal mode (exit, then sync;sync;sync;reboot).

  The easiest way to fix an unrecoverable file system is to recreate it. This involves deleting it from the system and restoring it from a very recent system backup. Note that hd9var and hd3 can be recreated, but hd4 and hd2 cannot. If hd4 or hd2 is unrecoverable, AIX must be reinstalled or restored from a system backup. For assistance with unrecoverable file systems, contact your local branch office, point of sale, or AIX support center. Do not follow the rest of the steps in this document.
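
  The checks in this step can also be run as a loop that records which logical volumes fail, so that they can be rechecked later. The following is a minimal sketch and not part of the original technote: the function name check_rootvg_filesystems and the FSCK_CMD variable are hypothetical, and it assumes that the maintenance shell is a POSIX-style shell such as ksh and that fsck returns a non-zero exit status when a file system needs attention.

```shell
# Hypothetical helper: run the file system check on each rootvg logical
# volume listed in this step and print the devices whose check failed.
# FSCK_CMD is an assumption for illustration; it defaults to "fsck -p".
check_rootvg_filesystems() {
    failed=""
    for lv in /dev/hd4 /dev/hd2 /dev/hd9var /dev/hd3 /dev/hd1; do
        # Any non-zero exit status is treated as "needs attention".
        ${FSCK_CMD:-fsck -p} "$lv" || failed="$failed $lv"
    done
    # Print the failed devices, one per line (nothing if all passed).
    for lv in $failed; do
        echo "$lv"
    done
    # Succeed only when no device failed.
    [ -z "$failed" ]
}
```

  Any device printed by the function is a candidate for the conditions listed above.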

Step 4

  • A corruption of the JFS log logical volume has been detected. Use the logform command to reformat it. For a JFS2 log device:
       /usr/sbin/logform -V jfs2 /dev/hd8
    or, for a JFS log device:
       /usr/sbin/logform /dev/hd8
    Answer yes when asked if you want to destroy the log.
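
  Which of the two logform invocations applies depends on the type of the log device. The sketch below only prints the matching command and does not run it. It is an illustration, not part of the original technote: the function name logform_command is hypothetical, and it assumes that the type string is taken from the TYPE field of the lslv hd8 output (jfslog or jfs2log).

```shell
# Hypothetical helper: print the logform invocation for a given log type.
# Assumes the type string comes from the TYPE field of "lslv hd8".
logform_command() {
    case "$1" in
        jfs2log) echo "/usr/sbin/logform -V jfs2 /dev/hd8" ;;
        jfslog)  echo "/usr/sbin/logform /dev/hd8" ;;
        *)       echo "unknown log type: $1" >&2; return 1 ;;
    esac
}
```

  Printing the command instead of executing it leaves the destructive step, and the confirmation prompt, with the operator.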

  Repeat the fsck commands in Step 3 for all file systems that did not successfully complete fsck the first time. If fsck fails a second time, the file system is almost always unrecoverable; see the paragraph on unrecoverable file systems in Step 3 for the options at this point. In most cases, the second fsck will be successful. If it is, continue with the reboot that follows.
  With the key in Normal position (for microchannel machines), run the following commands to reboot the system:
     exit
     sync;sync;sync;reboot
  As you reboot in Normal mode, notice how many times LED 551 appears. If LED 551 appears twice, fsck is probably failing because of a bad fshelper file. If this is the case and you are running AFS, see the v3fshelper instructions for AFS later in this procedure.
  The majority of instances of LED 552, 554, and 556 will be resolved at this point. If you still have an LED 552, 554, or 556, you may try the following steps.
  ATTENTION: The following steps will overwrite your Object Data Manager (ODM) database files with a very primitive, minimal ODM database. Due to the potential loss of user configuration data caused by this procedure, it should only be used as a last resort effort to gain access to your system to attempt to back up any data that you can. It is NOT recommended to use the following procedure in lieu of restoring from a system backup.
  Repeat step 1 through step 3.
  Run the following commands, which remove much of the system's configuration and save it to a backup directory.
  mount /dev/hd4 /mnt
  mount /dev/hd2 /mnt/usr
  mkdir /mnt/etc/objrepos/bak
  cp /mnt/etc/objrepos/Cu* /mnt/etc/objrepos/bak
  cp /etc/objrepos/Cu* /mnt/etc/objrepos
  umount /dev/hd2
  umount /dev/hd4
  exit
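
  Because the next copy overwrites the customized ODM classes, it can be worth confirming that the backup is complete first. The following is a minimal sketch under stated assumptions, not part of the original procedure: the function name backup_odm is hypothetical, it takes the objrepos directory (for example /mnt/etc/objrepos) as its only argument, and it uses mkdir -p rather than plain mkdir so that it can be rerun.

```shell
# Hypothetical helper: copy the customized ODM classes (Cu*) from an
# objrepos directory into its "bak" subdirectory, then confirm that
# every Cu* file is present in the backup.
backup_odm() {
    objrepos="$1"                      # for example /mnt/etc/objrepos
    bak="$objrepos/bak"
    mkdir -p "$bak" || return 1
    cp "$objrepos"/Cu* "$bak" || return 1
    for f in "$objrepos"/Cu*; do
        # ${f##*/} strips the directory part without calling basename.
        [ -f "$bak/${f##*/}" ] || return 1
    done
}
```

  A non-zero return value means the backup is incomplete and the ODM files should not be overwritten yet.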
  Determine which disk is the boot disk with the lslv command. The boot disk will be shown in the PV1 column of the lslv output.
lslv -m hd5
  Save the clean ODM database to the boot logical volume. (# is the number of the fixed disk, determined with the previous command.)
  savebase -d /dev/hdisk# 
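
  Reading the PV1 column by eye can be replaced with a small filter. This is a sketch only: the function name boot_disk_from_lslv is hypothetical, it assumes that awk is available in the shell you are working in, and it assumes that the lslv -m output has the LP, PP1, PV1 column layout shown later in this document.

```shell
# Hypothetical helper: read "lslv -m hd5" output on standard input and
# print the disk named in the PV1 column of the first logical partition.
boot_disk_from_lslv() {
    # Data rows start with a numeric LP number; the header does not.
    awk '$1 ~ /^[0-9]+$/ && $3 ~ /^hdisk/ { print $3; exit }'
}
```

  It would be used as lslv -m hd5 | boot_disk_from_lslv, and the printed name substituted for hdisk# in the savebase command.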
  If you are running AFS, continue with the v3fshelper check that follows; otherwise, skip ahead to the check of hd5.
  If you are running the Andrew File System (AFS), use the following commands to find out whether you have more than one version of the v3fshelper file.
     cd /sbin/helpers
     ls -l v3fshelper*
  If you have only one version of the v3fshelper file (for example, v3fshelper), skip ahead to the check of hd5.
  If there is a version of v3fshelper marked as original (for example, v3fshelper.orig), run the following commands:
     cp v3fshelper v3fshelper.afs
     cp v3fshelper.orig v3fshelper
  WARNING: Do not proceed further if the system is a /usr client, diskless client, or dataless client.
  Make sure that hd5 is on the edge of the drive and, if it spans more than one partition, that the partitions are contiguous. For systems at AIX 5.1 and above, make sure that hd5 is greater than 12 MB:
       lslv hd5 (check the value of PP SIZE:)
       lslv -m hd5
    LP    PP1  PV1           PP2   PV2                    PP3   PV3
  0001   0001 hdisk2
  0002   0002 hdisk2
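
  The contiguity and size checks above can be sketched as two small functions. They are illustrations rather than part of the original technote: the names hd5_is_contiguous and hd5_is_large_enough are hypothetical, awk is assumed to be available, and the check does not verify that hd5 sits on the edge of the drive.

```shell
# Hypothetical helper: read "lslv -m hd5" output on standard input and
# succeed only if every logical partition is on the same disk and the
# physical partition numbers in the PP1 column are consecutive.
hd5_is_contiguous() {
    awk '
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
            pp = $2 + 0
            if (seen && (pp != prev + 1 || $3 != disk)) bad = 1
            prev = pp; disk = $3; seen = 1
        }
        END { if (seen && !bad) exit 0; exit 1 }
    '
}

# Hypothetical helper: succeed if hd5 is larger than 12 MB, given the
# PP size in megabytes and the number of partitions from "lslv hd5".
hd5_is_large_enough() {
    [ $(( $1 * $2 )) -gt 12 ]
}
```

  For the listing above, lslv -m hd5 | hd5_is_contiguous succeeds because partitions 0001 and 0002 are adjacent on hdisk2.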
  Recreate the boot image (hdisk# is the boot disk determined with lslv -m hd5; the example below uses hdisk1):
# ln -s /mnt/usr/lib/boot/unix /unix
# chroot /mnt /bin/ksh
# /usr/sbin/bosboot -a -d /dev/hdisk1

bosboot: Boot image is 51228 512 byte blocks.
# bootlist -m normal -o
hdisk1 pathid=2
#   ipl_varyon -i
[S 2818120 2752516 03/20/19-15:00:06:458 ipl_varyon.c 1312] ipl_varyon -i
PVNAME          BOOT DEVICE     PVID                    VOLUME GROUP ID
hdisk1          YES             00f6d51b9a5ef20f0000000000000000        00f6d51b00004c00
[E 2818120 0:013 ipl_varyon.c 1453] ipl_varyon: exited with rc=0
  Make sure the bootlist is set correctly:
       bootlist -m normal -o
  Make changes, if necessary:
       bootlist -m normal hdiskX cdX
  (Substitute the devices that you want in your boot list.)
  NOTE: If you suspect an inactive or damaged disk device is causing the boot problem and the boot logical device, hd5, is mirrored onto another device, you may wish to list this other boot device first in the bootlist.
  Make sure that the disk drive that you have chosen as your bootable device has YES in the BOOT DEVICE column:
       ipl_varyon -i
  Example:
  PVNAME          BOOT DEVICE     PVID                                VOLUME GROUP ID
  hdisk1          NO              0007b53cbfd04a900000000000000000    0007b53c00004c00
  hdisk4          NO              0007b53c1244625d0000000000000000    0007b53c00004c00
  hdisk2          YES             0007b53c8ffd63120000000000000000    0007b53c00004c00
  From the above example, hdisk2 would be a bootable disk drive while hdisk1 and hdisk4 would not be.
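
  On systems with many disks, the YES rows can be picked out with a filter. This is a minimal sketch, assuming awk is available and that the output has the PVNAME and BOOT DEVICE columns shown in the example; the function name bootable_disks is hypothetical.

```shell
# Hypothetical helper: read "ipl_varyon -i" output on standard input and
# print the name of every disk whose BOOT DEVICE column reads YES.
# The header and any bracketed trace lines are ignored because they do
# not start with "hdisk".
bootable_disks() {
    awk '$1 ~ /^hdisk/ && $2 == "YES" { print $1 }'
}
```

  It would be used as ipl_varyon -i | bootable_disks.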
  If you copied the v3fshelper files earlier in this procedure, copy the AFS file system helper back to v3fshelper:
     cp v3fshelper.afs v3fshelper
  If you are working on a microchannel machine, turn the key to the Normal position. Then run:
     sync;sync;sync;reboot

If you followed all of the preceding steps and the system still stops at an LED 552, 554, or 556 during a reboot in Normal mode, you may want to consider reinstalling your system from a recent backup. Isolating the cause of the hang could be excessively time-consuming and may not be cost-effective in your operating environment. Isolating the possible cause of the hang requires a debug boot of the system. Instructions for doing this are included in the document “Capturing Boot Debug”, available at http://www14.software.ibm.com/webapp/set2/sas/f/power/helpdatabase . Even then, isolating the problem may only show that a restore or reinstall of AIX is necessary to correct it.

Resolving LED code 551, 555, or 557

If your system hangs with LED code 555, it most likely means that one of your rootvg file systems is corrupt. The following link provides information on how to resolve it:

http://www-304.ibm.com/support/docview.wss?uid=isg3T1000217

After completing the procedure, the system may still hang with LED code 555. If that happens, boot the system from media, enter service mode again, and access the volume group. Then check which disk is the boot disk:

  # lslv -m hd5 

Then also check your bootlist:

  # bootlist -m normal -o 

If the two do not match, set the boot list to the correct disk, as indicated by the lslv command above. For example, to set it to hdisk1, run:

  # bootlist -m normal hdisk1 
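
The comparison between the lslv output and the boot list described above can be sketched as a small test. This is an illustration only: the function name bootlist_starts_with is hypothetical, awk is assumed to be available, and the sketch only looks at the first entry of the boot list.

```shell
# Hypothetical helper: succeed if the first entry of the
# "bootlist -m normal -o" output (first argument) names the given
# boot disk (second argument).
bootlist_starts_with() {
    # The disk name is the first word of the first output line;
    # attributes such as "pathid=2" after it are ignored.
    first=$(echo "$1" | awk 'NR == 1 { print $1 }')
    [ "$first" = "$2" ]
}
```

For example, bootlist_starts_with "$(bootlist -m normal -o)" hdisk1 would succeed when hdisk1 is listed first.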

And then, make sure you can run the bosboot commands:

  # bosboot -ad /dev/hdisk1
  # bosboot -ad /dev/ipldevice

Note: exchange hdisk1 in the example above with the disk that was indicated by the lslv command.

If the bosboot on the ipldevice fails, you have two options: recover the system from a mksysb image, or recreate hd5. To recreate hd5, first back up your ODM and replace it with the copy from the maintenance environment:

  # mount /dev/hd4 /mnt
  # mount /dev/hd2 /mnt/usr
  # mkdir /mnt/etc/objrepos/bak
  # cp /mnt/etc/objrepos/Cu* /mnt/etc/objrepos/bak
  # cp /etc/objrepos/Cu* /mnt/etc/objrepos
  # umount /dev/hd2
  # umount /dev/hd4
  # exit

Then, recreate hd5, for example, for hdisk1:

  # rmlv hd5
  # cd /dev
  # rm ipldevice
  # rm ipl_blv 
  # mklv -y hd5 -t boot -ae rootvg 1 hdisk1
  # ln /dev/rhd5 /dev/ipl_blv
  # ln /dev/rhdisk1 /dev/ipldevice
  # bosboot -ad /dev/hdisk1

If the system still does not boot at this point, the only remaining option is to recover the system from a mksysb image.

aix/boot_error.txt · Last modified: 2021/01/01 21:21 (external edit)