AIX boot problems LED 5xx

Boot AIX in debug mode

From the open console, when the "(OK)>" prompt appears, type the following:

boot -s verbose -s debug

Or, from the SMS menu, select Restricted Open Firmware Prompt and type the following at the prompt:

BOOT_FROM_SEQ -s verbose

Recovery from LED 552, 554, or 556 in AIX

An LED code of 552, 554, or 556 during a standard disk-based boot indicates that a failure occurred during the varyon of the rootvg volume group.

Some known causes of an LED 552, 554, or 556 are:

  a corrupted file system
  a corrupted Journaled File System (JFS) log device
  a bad IPL-device record or bad IPL-device magic number; the magic number indicates the device type
  a corrupted copy of the Object Data Manager (ODM) database on the boot logical volume
  a fixed disk (hard disk) in the inactive state in the root volume group

Recovery procedure

To diagnose and fix the problem, boot to a Service mode shell and run the fsck command (file system check) on each file system. If the file system check fails, you may need to perform other steps.

WARNING: Do not use this document if the system is a /usr client, diskless client, or dataless client.

Step 1

  Boot your system into a limited function maintenance shell (Service or Maintenance mode) from bootable AIX media to use this recovery procedure.

Step 2

With bootable media of the same version and level as the system, boot the system into Service mode. The bootable media can be any ONE of the following:

Bootable CD-ROM
mksysb
Bootable Install Tape

Follow the screen prompts to the Welcome to Base OS menu.

Step 3

Choose Start Maintenance Mode for System Recovery (Option 3). The next screen displays prompts for the Maintenance menu.

  • Choose Access a Root Volume Group (Option 1). The next screen displays a warning that indicates you will not be able to return to the Base OS menu without rebooting.
  • Choose 0 to continue. The next screen displays information about all volume groups on the system.
  • Select the root volume group by number. The logical volumes in rootvg will be displayed with two options below.
  • Choose Access this volume group and start a shell before mounting the file systems (Option 2).

If you receive errors from the preceding option, do not continue with the rest of this procedure. Correct the problem causing the error.

Step 4

Run the following commands to check and repair file systems.

fsck -p /dev/hd4 
fsck -p /dev/hd2 
fsck -p /dev/hd9var 
fsck -p /dev/hd3
fsck -p /dev/hd1 

NOTE: The -p option used above lets fsck fix minor problems automatically without prompting. The -y option instead gives the fsck command permission to repair any file system corruption it finds. It avoids having to answer multiple confirmation prompts manually, but it can cause permanent data loss in some situations.

If any of the following conditions occur, proceed accordingly.

  • If fsck indicates that block 8 could not be read, the file system is probably unrecoverable. See step 5 for information on unrecoverable file systems.
  • If fsck indicates that a file system has an unknown log record type, or if fsck fails in the logredo process, then go to step 6.
  • If the file system checks were successful, skip to step 8.
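
The checks in this step can also be run as a loop that reports which logical volumes still need attention. This is only a sketch: check_filesystems and FSCK_CMD are illustrative names, and it assumes that fsck exits with a non-zero status when a check or repair fails.

```shell
# Sketch only: FSCK_CMD and check_filesystems are illustrative names.
# Assumption: the checker exits non-zero when a file system could not
# be checked or repaired.
FSCK_CMD=${FSCK_CMD:-fsck}

check_filesystems() {
    failed=""
    for lv in "$@"; do
        # Same invocation as above: fsck -p /dev/<lv>.
        # fsck messages go to stderr so stdout carries only the summary.
        if ! "$FSCK_CMD" -p "/dev/$lv" >&2; then
            failed="$failed $lv"
        fi
    done
    # Print the logical volumes whose check failed (empty if none).
    echo "${failed# }"
    [ -z "$failed" ]
}

# Usage in the maintenance shell:
#   check_filesystems hd4 hd2 hd9var hd3 hd1
```

The function returns success only when every check passed, so its result can decide between skipping to step 8 and going on to step 5 or 6.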

Step 5

  The easiest way to fix an unrecoverable file system is to recreate it. This involves deleting it from the system and restoring it from a very current system backup. Note that hd9var and hd3 can be recreated, but hd4 and hd2 cannot be recreated. If hd4 and/or hd2 is unrecoverable, AIX must be reinstalled or restored from system backup. For assistance with unrecoverable file systems, contact your local branch office, point of sale, or AIX support center. Do not follow the rest of the steps in this document.

Step 6

A corruption of the JFS2 log logical volume has been detected. Use the logform command to reformat it.

/usr/sbin/logform -V jfs2 /dev/hd8

Answer yes when asked if you want to destroy the log.

Step 7

Repeat step 4 for all file systems that did not successfully complete fsck the first time. If step 4 fails a second time, the file system is almost always unrecoverable. See step 5 for an explanation of the options at this point. In most cases, step 4 will be successful. If step 4 is successful, continue to step 8.

Step 8

Run the following commands to reboot the system:

exit
sync;sync;sync;reboot

As you reboot in Normal mode, notice how many times LED 551 appears. If LED 551 appears twice, fsck is probably failing because of a bad fshelper file. If this is the case and you are running AFS, see step 11.

  The majority of instances of LED 552, 554, and 556 will be resolved at this point. If you still have an LED 552, 554, or 556, you may try the following steps.
  ATTENTION: The following steps will overwrite your Object Data Manager (ODM) database files with a very primitive, minimal ODM database. Due to the potential loss of user configuration data caused by this procedure, it should only be used as a last resort effort to gain access to your system to attempt to back up any data that you can. It is NOT recommended to use the following procedure in lieu of restoring from a system backup.

Step 9

Repeat step 1 through step 3.

Step 10

Run the following commands, which remove much of the system's configuration and save it to a backup directory.

mount /dev/hd4 /mnt
mount /dev/hd2 /mnt/usr
mkdir -p /mnt/etc/objrepos/bak
cp /mnt/etc/objrepos/Cu* /mnt/etc/objrepos/bak
cp /etc/objrepos/Cu* /mnt/etc/objrepos
umount /dev/hd2
umount /dev/hd4
exit
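
The two copy operations above (back up the damaged Cu* files, then overwrite them with the minimal ones) can be wrapped in a small function so that the overwrite never happens if the backup failed. This is a sketch only; swap_odm and its argument order are illustrative and not part of AIX.

```shell
# Sketch only: swap_odm and its two arguments are illustrative names.
#   $1 = directory holding the minimal ODM files (here /etc/objrepos)
#   $2 = directory holding the damaged ODM files (here /mnt/etc/objrepos)
swap_odm() {
    minimal=$1
    damaged=$2
    # cp with several source files needs an existing target directory.
    mkdir -p "$damaged/bak" || return 1
    # Keep a copy of the current customized (Cu*) object classes.
    cp "$damaged"/Cu* "$damaged/bak" || return 1
    # Overwrite them with the minimal copies from the maintenance shell.
    cp "$minimal"/Cu* "$damaged"
}

# Usage, with the file systems mounted as above:
#   swap_odm /etc/objrepos /mnt/etc/objrepos
```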

Determine which disk is the boot disk with the lslv command. The boot disk will be shown in the PV1 column of the lslv output.

lslv -m hd5

Save the clean ODM database to the boot logical volume. (# is the number of the fixed disk, determined with the previous command.)

savebase -d /dev/hdisk# 
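
If you prefer not to read the PV1 column by eye, the disk name can be pulled out of the lslv output with awk. This is a minimal sketch; boot_disks is an illustrative name, and it assumes that data rows start with a numeric LP number, as in the lslv -m listing shown in step 12.

```shell
# Sketch only: boot_disks is an illustrative name.
# Reads "lslv -m hd5" output on stdin and prints each distinct PV1 value.
boot_disks() {
    # Data rows start with a numeric LP number; the PV1 name is field 3.
    awk '$1 ~ /^[0-9]+$/ { print $3 }' | sort -u
}

# Usage on the system:
#   lslv -m hd5 | boot_disks
```

If more than one disk is printed, the first copy of hd5 spans several disks, and the disk to pass to savebase needs to be chosen by hand.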

Step 11

skipped

Step 12

WARNING: Do not proceed further if the system is a /usr client, diskless client, or dataless client.

Make sure that hd5 is at the edge of the drive and, if it spans more than one partition, that the partitions are contiguous. On AIX 5.1 and later, also make sure that hd5 is larger than 12 MB:

         lslv hd5 (Check to see what the PP Size: is equal to)
         lslv -m hd5

      LP    PP1  PV1           PP2   PV2                    PP3   PV3
    0001   0001 hdisk2
    0002   0002 hdisk2
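
The contiguity check on the listing above can be sketched in awk. The function name hd5_contiguous is illustrative, and the sketch assumes that "contiguous" means consecutive PP1 numbers on a single disk, with rows listed in LP order.

```shell
# Sketch only: hd5_contiguous is an illustrative name.
# Reads "lslv -m hd5" output on stdin; succeeds when every partition of
# the first copy is on one disk and the PP1 numbers are consecutive.
hd5_contiguous() {
    awk '
        $1 ~ /^[0-9]+$/ {
            pp = $2 + 0
            if (seen && ($3 != disk || pp != last + 1)) bad = 1
            disk = $3; last = pp; seen = 1
        }
        # Fail on a gap, a disk change, or when no data rows were found.
        END { exit (bad || !seen) }
    '
}

# Usage on the system:
#   lslv -m hd5 | hd5_contiguous && echo "hd5 partitions are contiguous"
```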

Recreate the boot image (hdisk# is the fixed disk determined with lslv in step 10):

# chroot /mnt /usr/bin/ksh
# ln -s /unix /usr/lib/boot/unix_64
# bosboot -a -d /dev/hdisk# 

Make sure the bootlist is set correctly:

bootlist -m normal -o

Make changes, if necessary:

bootlist -m normal hdiskX cdX

NOTE: If you suspect an inactive or damaged disk device is causing the boot problem and the boot logical device, hd5, is mirrored onto another device, you may wish to list this other boot device first in the bootlist.

Make sure that the disk drive you have chosen as your bootable device shows YES in the BOOT DEVICE column:

ipl_varyon -i

Example:

    PVNAME     BOOT DEVICE   PVID                               VOLUME GROUP ID
    hdisk1     NO            0007b53cbfd04a900000000000000000   0007b53c00004c00
    hdisk4     NO            0007b53c1244625d0000000000000000   0007b53c00004c00
    hdisk2     YES           0007b53c8ffd63120000000000000000   0007b53c00004c00

From the above example, hdisk2 would be a bootable disk drive while hdisk1 and hdisk4 would not be.
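
On a system with many disks, the bootable ones can be filtered out of the listing with awk. A minimal sketch follows; bootable_disks is an illustrative name, and the sketch assumes the disk name and the YES/NO flag are the first two fields of a line, as in the example above.

```shell
# Sketch only: bootable_disks is an illustrative name.
# Reads "ipl_varyon -i" output on stdin and prints the disks flagged YES.
bootable_disks() {
    awk '$2 == "YES" { print $1 }'
}

# Usage on the system:
#   ipl_varyon -i | bootable_disks
```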

Step 13

skipped

Step 14

Run the following command to reboot the system:

sync;sync;sync;reboot

If you followed all of the preceding steps and the system still stops at an LED 552, 554, or 556 during a reboot in Normal mode, you may want to consider reinstalling your system from a recent backup.

Recovery from LED 553 in AIX

An LED value of 553 is a checkpoint code that indicates the system's transition to phase 3 of IPL. A halt or hang at LED 553 is often the result of a corrupted or missing /etc/inittab file. Other known causes are:

  • full / (root) or /tmp file systems
  • inconsistencies in startup configuration files, Object Data Manager (ODM) object class databases, or system library files
  • a number of other issues, such as file permission problems or invalid hard links in the root file system

Follow the earlier steps of the recovery procedure above, then continue with the following:

Step 7

Type exit to exit from the shell. The file systems should automatically mount after you type exit. If you receive error messages at this point, reboot into a limited function maintenance shell again to attempt to address the failure causes.

Use the df command to check for free space in /dev/hd3 and /dev/hd4.

df  /dev/hd3
df  /dev/hd4

If the output from the df command shows that either file system is out of space, erase some files from that file system. Three files you may want to erase are /smit.log, /smit.script and /.sh_history.
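
The free-space check can be sketched as a filter over the df output. The function name full_filesystems and the 95 percent default are illustrative choices. The sketch assumes the default AIX df layout, in which %Used is the fourth column and the mount point is the seventh.

```shell
# Sketch only: full_filesystems and the 95 percent default are illustrative.
# Assumption: input is in the default AIX df layout
#   Filesystem 512-blocks Free %Used Iused %Iused Mounted on
full_filesystems() {
    limit=${1:-95}
    awk -v limit="$limit" '
        NR > 1 {                      # skip the header line
            used = $4
            sub(/%/, "", used)        # "100%" -> "100"
            if (used + 0 >= limit + 0) print $7
        }
    '
}

# Usage on the system:
#   df /dev/hd3 /dev/hd4 | full_filesystems
```

Any mount point printed is at or above the limit and is a candidate for the cleanup described above.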

Next, check the /etc/inittab file for corruption. It may be empty or missing, or it may have an incorrect entry. For comparison, see the section “Sample /etc/inittab file” at the end of this document.
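
A first pass over /etc/inittab can be automated before comparing it against a sample file. The sketch below flags a missing or empty file and any entry with fewer than four colon-separated fields (Identifier:RunLevel:Action:Command). check_inittab is an illustrative name, and the sketch assumes that lines starting with a colon are comments. It does not validate the commands themselves.

```shell
# Sketch only: check_inittab is an illustrative name.
# Assumption: lines beginning with ":" are comments.
check_inittab() {
    file=$1
    # -s is true only for an existing file larger than zero bytes.
    if [ ! -s "$file" ]; then
        echo "missing or empty: $file"
        return 1
    fi
    awk -F: '
        /^:/ || /^[ \t]*$/ { next }            # comment or blank line
        NF < 4 { print NR ": " $0; bad = 1 }   # too few fields
        END { exit bad + 0 }
    ' "$file"
}

# Usage on the system:
#   check_inittab /etc/inittab
```

Each line printed is the line number and text of a suspect entry. A zero exit status means only that the basic structure looks sane.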

See the following IBM links for additional problems:

http://www-01.ibm.com/support/docview.wss?uid=isg3T1000132
https://www-304.ibm.com/support/docview.wss?uid=isg3T1000133

Performing the Debug Boot

Now that you have enabled logging of the console, you need to perform the debug boot.

  Boot the client in System Management Services (SMS) mode.
      For standalone systems, this is done by powering the system on and waiting for the following to be displayed:
       IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM

           1 = SMS Menu                   5 = Default Boot List
           8 = Open Firmware Prompt       6 = Stored Boot List

           Memory      Keyboard     Network     SCSI     Speaker
      Note that there is a slight pause of one or two seconds after the word "Keyboard" is displayed and before "Speaker" is displayed. You will also hear a set of tones if you are near the system. This is the point at which to press the 1 key to enter SMS. If you hear another set of tones, or see the word "Speaker" appear before you press the 1 key, you have missed the window to enter SMS and will have to power the system off and on to try again.
      Note: On some systems with a graphics console, you may need to press the F1 key instead of the 1 key. Press the key that is displayed indicating SMS.
      For IVM managed systems, you will activate the LPAR from the IVM instead of powering the system on. Otherwise, the procedure is the same, press the 1 key on the client's console after the word "Keyboard" is displayed, but before "Speaker" displays.
      For HMC managed systems, you may also use the above procedure, but there is also a way to activate the LPAR in SMS mode:
          If the LPAR is not in the "Not Activated" mode, perform a Shutdown operation on the LPAR.
          Select the client LPAR by checking the box next to it in the HMC GUI.
          Select the Operations --> Activate --> Profile option from the Tasks menu.
          In the "Activate Logical Partition" dialog, choose the correct "Logical Partition profile", then click on the "Advanced..." button. Do not check the "Open a terminal window or console session" box, since we already have a console open.
          In the "Activate Logical Partition - Advanced" dialog, choose "SMS" from the "Boot mode" menu. Click OK.
          Finally, click OK to activate the LPAR. It will automatically enter SMS.
      If you were successful, you will see a menu similar to this in the window which is now being logged:
         Version AL730_149
        SMS 1.7 (c) Copyright IBM Corp. 2000,2008 All rights reserved.
        ----------------------------------------------------------
        Main Menu
        1.   Select Language
        2.   Setup Remote IPL (Initial Program Load)
        3.   Change SCSI Settings
        4.   Select Console
        5.   Select Boot Options

        ----------------------------------------------------------
        Navigation Keys:
                X = eXit System Management Services
        ----------------------------------------------------------
         Type menu item number and press Enter or select Navigation key:
  Next, you will verify that the network information that SMS will use to perform the network boot is correct.

At the SMS Main Menu, type the hidden menu option 0, press Enter, and confirm that you want to exit SMS. This drops you into Open Firmware. At the Open Firmware prompt, start the debug boot:

0 > boot -s verbose
The debug boot will begin and be displayed to the console.
aix/aix_boot_problem.txt · Last modified: 2024/03/29 21:26 by manu