====== AIX: Create a dump on an LPAR ======
First force the parameter "always allow dump" to TRUE:
[root@labotest]/root# sysdumpdev -K
Check if a primary dump device is set to a logical volume, with type sysdump:
[root@labotest]/root# sysdumpdev -l
primary /dev/hd7
secondary /dev/sysdumpnull
copy directory /var/adm/ras
forced copy flag TRUE
always allow dump TRUE
dump compression ON
type of dump traditional
Check the estimated size needed on the dump device:
[root@labotest]/root# sysdumpdev -e
0453-041 Estimated dump size in bytes: 1763075686
Check that the primary dump device (hd7) is large enough for that estimate; if not, increase it to the right size. Its size is LPs x PP SIZE, here 60 x 32 MB = 1920 MB:
[root@labotest]/root# lslv hd7 | grep PP
MAX LPs: 512 PP SIZE: 32 megabyte(s)
LPs: 60 PPs: 60
STALE PPs: 0 BB POLICY: relocatable
INTRA-POLICY: middle UPPER BOUND: 32
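The size comparison above can be sketched as a small shell helper. This is only an illustration: the function name needed_lps is made up for this page, and the three values are copied by hand from the sysdumpdev -e and lslv hd7 output shown above. awk does the division so that the byte count never has to fit into the shell's own integer type.

```shell
# needed_lps EST_BYTES PP_SIZE_MB   (hypothetical helper, not an AIX command)
# Prints how many logical partitions of PP_SIZE_MB megabytes are needed
# to hold EST_BYTES, rounded up to a whole partition.
needed_lps() {
    awk -v bytes="$1" -v pp_mb="$2" 'BEGIN {
        n = bytes / (pp_mb * 1048576)
        lps = int(n)
        if (lps < n) lps++        # round up to a whole partition
        printf "%d\n", lps
    }'
}

# Values copied from the sysdumpdev -e and lslv hd7 output above.
est_bytes=1763075686
pp_mb=32
current_lps=60

need=$(needed_lps "$est_bytes" "$pp_mb")
if [ "$current_lps" -ge "$need" ]; then
    echo "hd7 is large enough: $current_lps LPs available, $need needed"
else
    echo "hd7 is too small: add $((need - current_lps)) LPs"
fi
```

With the values from this example, 53 LPs are needed and 60 are available, so hd7 does not have to be extended. If the check reports a shortfall, the logical volume can be grown with extendlv (for example extendlv hd7 followed by the number of LPs to add).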
Clean up old snaps:
[root@labotest]/root# snap -r
On the HMC, restart the LPAR with the Dump option (the default):
--> select the LPAR --> Operation --> Restart --> Dump
Or from the HMC command line:
hscroot@HMC:~> chsysstate -r lpar -o dumprestart -n sysh1 -m P-570
After the reboot, check the error log (errpt -a) to see whether the dump succeeded, and/or run:
[root@labotest]/tmp/ibmsupt# sysdumpdev -L
0453-039
Device name: /dev/hd7
Major device number: 10
Minor device number: 16
Size: 1307898880 bytes
Uncompressed Size: 7879498205 bytes
Date/Time: Thu Aug 4 17:15:26 GMT+02:00 2011
Dump status: 0
Type of dump: traditional
dump completed successfully
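If this check has to be scripted, the status line can be pulled out of the output. The sketch below is an assumption-laden example: dump_status is a hypothetical helper name, and the sample text is a shortened copy of the output above rather than a live call to sysdumpdev -L. It treats a status of 0 as success, as in the example above.

```shell
# dump_status (hypothetical helper): reads sysdumpdev -L output on stdin
# and prints the value from the "Dump status:" line.
dump_status() {
    awk -F': *' '/Dump status:/ { print $2; exit }'
}

# Sample input copied from the output above; on the LPAR this would be
# piped from `sysdumpdev -L` instead.
sample='Device name: /dev/hd7
Size: 1307898880 bytes
Uncompressed Size: 7879498205 bytes
Dump status: 0
Type of dump: traditional'

status=$(printf '%s\n' "$sample" | dump_status)
if [ "$status" = "0" ]; then
    echo "dump completed successfully"
else
    echo "dump problem, status=$status"
fi
```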
Once the LPAR is running again, perform a snap:
[root@labotest]/root# snap -ac
The snap is written to /tmp/ibmsupt/snap.pax.Z
You can now send it to the IBM FTP site:
rename the snap file to /tmp/ibmsupt/.snap.pax.Z
(Example: 85885.500.624.snap.pax.Z)
upload the renamed file to IBM FTP server:
ftpserver: ftp.emea.ibm.com
user: anonymous
password: (your email address)
directory: /toibm/aix/
transfer mode: binary
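The rename step could be scripted roughly as follows. This is a sketch only: snap_upload_name is a hypothetical helper, and the prefix is simply the one from the example file name above; use whatever prefix IBM support gives you for your case. The prefix is not validated.

```shell
# snap_upload_name PREFIX   (hypothetical helper)
# Prints the upload file name: the given prefix followed by ".snap.pax.Z".
snap_upload_name() {
    printf '%s.snap.pax.Z\n' "$1"
}

prefix=85885.500.624          # prefix taken from the example name above
src=/tmp/ibmsupt/snap.pax.Z
dst=/tmp/ibmsupt/$(snap_upload_name "$prefix")

# Rename only when the snap really exists, so the sketch is harmless elsewhere.
if [ -f "$src" ]; then
    mv "$src" "$dst"
fi
echo "$dst"
```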
===== Analyse system dump =====
A system dump indicates a severe problem with an AIX system. System dumps usually halt the system, necessitating a reboot.
[root@labotest]/root# errpt
67145A39 0413095315 U S SYSDUMP SYSTEM DUMP
Before copying the dump, make sure the target file system has enough free space!
Copy the system dump from the dump device to a file:
[root@labotest]/root# savecore -f -d /tmp
Next, uncompress the dump with the dmpuncompress command. The uncompressed dump can be very large!
[root@labotest]/root# dmpuncompress vmcore.0.BZ
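A quick way to check the free space before running savecore and dmpuncompress is sketched below. The helper names kb_needed and free_kb are made up for this page, the byte count is the "Uncompressed Size" reported by sysdumpdev -L in the first part, and /tmp is the target directory used above. df is called with -P and -k so that the free space is in the fourth column of the second line, in 1024-byte blocks.

```shell
# kb_needed BYTES (hypothetical helper): BYTES in 1024-byte blocks, rounded up.
kb_needed() {
    awk -v b="$1" 'BEGIN { k = int(b / 1024); if (k * 1024 < b) k++; printf "%d\n", k }'
}

# free_kb DIR (hypothetical helper): free space of the file system holding DIR,
# read from the POSIX-format df output (-P) in 1024-byte blocks (-k).
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

uncompressed_bytes=7879498205   # "Uncompressed Size" from sysdumpdev -L
target=/tmp

need=$(kb_needed "$uncompressed_bytes")
have=$(free_kb "$target")
if [ "$have" -ge "$need" ]; then
    echo "$target has room for the uncompressed dump"
else
    echo "$target is too small: $need KB needed, $have KB free"
fi
```

This only checks for the uncompressed size itself; leaving extra headroom is a sensible precaution.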
Next, verify that the dump is complete:
[root@labotest]/root# /usr/lib/ras/dmprtns/dmpfmt -c vmcore.0
This dump appears complete - The end-of-dump component was found.
Analyse the dump:
[root@labotest]/root# kdb vmcore.0 vmunix.0
kdb is used interactively, much like dbx:
(0)> stat
SYSTEM_CONFIGURATION:
CHRP_SMP_PCI POWER_PC POWER_6 machine
with 2 available CPU(s) (64-bit registers)
SYSTEM STATUS:
sysname... AIX
nodename.. lpar_name
release... 1
version... 7
… lines omitted …
time of crash: Mon Apr 13 09:52:09 2015
age of system: 13 day, 18 hr., 37 min., 28 sec.
xmalloc debug: enabled
FRRs active... 0
FRRs started.. 0
CRASH INFORMATION:
CPU -1 CSA 053A7E80
at time of crash, error code for LEDs: 70000000
This output contains key details. It tells you when the system crashed, along with the AIX version and how long the system had been up. It also gives you an LED code (70000000 in this example) that mirrors the LED display on the outside of your System p box. 70000000 indicates a program interrupt.
Now for the most telling command in this initial dump run-through. From your kdb prompt, enter "status":
(0)> status
CPU INTR TID TSLOT PID PSLOT PROC_NAME
0 15000D9 336 6A006A 106 sysdumpstart
1 1B0037 27 F001E 15 wait
2-3 Disabled
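To get a quick summary of what each CPU was running, the status output can be filtered with awk. This is a rough sketch: cpu_procs is a hypothetical helper, it assumes each CPU row sits on a single line with the process name in the last column, and the sample is the output above rather than a live kdb session. Feeding it from kdb directly (for instance by piping the status subcommand into kdb on standard input) is an assumption that has not been verified here.

```shell
# cpu_procs (hypothetical helper): reads kdb "status" output on stdin and
# prints "CPU PROC_NAME" for every CPU row that lists a thread.
# The header line and "Disabled" rows are skipped.
cpu_procs() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 6 { print $1, $NF }'
}

# Sample input copied from the status output above.
sample='CPU INTR TID TSLOT PID PSLOT PROC_NAME
0 15000D9 336 6A006A 106 sysdumpstart
1 1B0037 27 F001E 15 wait
2-3 Disabled'

printf '%s\n' "$sample" | cpu_procs
```

For this dump it lists sysdumpstart on CPU 0 and wait on CPU 1.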
https://techchannel.com/SMB/02/2017/analyzing-aix-system-dumps