The recfgct command is a script that reconfigures the RSCT subsystems, including reassigning the node ID; it essentially resets all cluster configuration on a node to defaults.
Resolving the problem
The “recfgct” command is an undocumented command that removes the node ID from an RSCT node, essentially wiping out any domain information the node had. The command is not in the default PATH, so you need to know where to find it in order to use it. Its default location is:
/usr/sbin/rsct/install/bin/
recfgct is a shell script that accepts two flags:
[root@testsrv]/root# /usr/sbin/rsct/install/bin/recfgct -n    # resets the node id file (default option)
[root@testsrv]/root# /usr/sbin/rsct/install/bin/recfgct -s    # saves the node id file
Care must be taken when using this command as it will also reset the files /var/ct/cfg/ctrmc.acls and /var/ct/cfg/ctsec_map.global which are used for cluster security. TSAMP Support recommends backing up these files before using this command.
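For example, a minimal backup sketch (the .orig names are arbitrary):

# cp -p /var/ct/cfg/ctrmc.acls /var/ct/cfg/ctrmc.acls.orig
# cp -p /var/ct/cfg/ctsec_map.global /var/ct/cfg/ctsec_map.global.orig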
After cloning with an alternate disk install, improper hostname resolution may cause HMC RSCT errors.
Dynamic LPAR relies on Service Focal Point and RSCT. This means that TCP/IP, including the hostname on the HMC, must be configured. The /etc/hosts files in the HMC and all LPARs must have entries for all entities with the hostname appearing first in any list of aliases.
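For illustration, /etc/hosts entries of the required form, with the hostname first and any aliases after it (the addresses and names below are hypothetical):

10.10.3.130      hmc1.example.com      hmc1
192.168.222.166  monitor3.example.com  monitor3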
A problem with DLPAR can exist after using the altinst_rootvg cloning procedure to create the second LPAR. The files /etc/ct_node_id and /var/ct/cfg/ct_node_id have the same contents on both LPARs, so RSCT may detect only one LPAR to manage. Development is aware of this and is working on a circumvention and correction. In the meantime, a workaround is to run the following two commands after the first boot of the new LPAR:
[root@testsrv]/root# /usr/sbin/rsct/install/bin/uncfgct -n
[root@testsrv]/root# /usr/sbin/rsct/install/bin/cfgct
A subsequent reboot will cause the partitions to be synchronized from an RSCT perspective. The duplicate /etc/ct_node_id condition is also likely to cause problems in an HACMP scenario between cloned LPARs. Cloning an LPAR with a mksysb tape/DVD will likely cause the same problem. NIM installs avoid this problem because a unique /etc/ct_node_id file is created.
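To check for the duplicate condition, display the node ID on each cloned LPAR and compare; identical values confirm the problem (the files are the ones named above):

# cat /etc/ct_node_id
# cat /var/ct/cfg/ct_node_id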
Test the connection between LPAR and HMC:
Depending on the OS/RSCT version, the resource class to query is IBM.MCP (newer releases) or IBM.ManagementServer (older releases):
[root@testsrv]/root# lsrsrc IBM.MCP
Resource Persistent Attributes for IBM.MCP
resource 1:
        MNName            = "192.168.222.166"
        NodeID            = 11973552969676986359
        KeyToken          = "fsm-pureflex"
        IPAddresses       = {"10.10.3.130"}
        ConnectivityNames = {"192.168.222.166"}
        HMCName           = "7955-01M*069582B"
        HMCIPAddr         = "10.10.3.130"
        HMCAddIPs         = "10.10.3.130"
        HMCAddIPv6s       = "fdd8:ae8f:c01f:0:6eae:8bff:fe7c:43f2,fdb1:b77c:fa7e:0:6eae:8bff:fe7c:43f2,fe80::6eae:8bff:fe7c:43f2"
        ActivePeerDomain  = ""
        NodeNameList      = {"monitor3"}
[root@testsrv]/root# lsrsrc "IBM.ManagementServer" Resource Persistent Attributes for IBM.ManagementServer resource 1: Name = "10.10.0.3" Hostname = "10.10.0.3" ManagerType = "HMC" LocalHostname = "10.19.10.81" ClusterTM = "9078-160" ClusterSNum = "" ActivePeerDomain = "" NodeNameList = {"testsrv"} resource 2: Name = "10.9.0.3" Hostname = "10.9.0.3" ManagerType = "HMC" LocalHostname = "10.19.10.81" ClusterTM = "9078-160" ClusterSNum = "" ActivePeerDomain = "" NodeNameList = {"testsrv"}
For reference, use -z to stop the daemons, and -A to add the subsystem for startup and start it:
# /usr/sbin/rsct/bin/rmcctrl -z
# /usr/sbin/rsct/bin/rmcctrl -A
Also check the RMC state on the HMC; it must be active:
hscroot@luhmc1:~> lssyscfg -r lpar -m P55A-9133-55A-SN06C1B4G -F name,lpar_id,state,rmc_state,rmc_ipaddr,os_version,dlpar_mem_capable
tsmaixbeta,11,Not Activated,inactive,,Unknown,0
Oracle1,10,Not Activated,inactive,192.168.222.158,AIX 7.1 7100-03-01-1341,0
monitor2,9,Running,active,192.168.222.157,AIX 7.1 7100-03-01-1341,1
repmon,7,Not Activated,inactive,,Unknown,0
monitor,6,Running,active,192.168.222.153,AIX 7.1 7100-03-04-1441,1
aixpowersc,5,Not Activated,inactive,192.168.222.155,AIX 7.1 7100-03-01-1341,0
Management Domain Status: Management Control Points
Indicates, in a management domain, that a communication problem has been discovered and the RMC daemon has suspended communications with the RMC daemon on the specified node. This typically results from a network configuration problem in which small heartbeat packets can be exchanged between the two RMC daemons but larger data packets cannot, usually because of a difference in the MTU sizes of the nodes' network adapters.
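The status above comes from the RSCT management domain status display; to view it on the node, run the rmcdomainstatus utility shipped with RSCT:

# /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc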
Action plan: add a line to the RMC startup script, as shown below.
Reduce the RMC data packet size in the RMC startup script, /usr/sbin/rsct/bin/rmcd_start, by adding the -S option to limit the RMC packet size.
On VIOS:
# cd /usr/sbin/rsct/bin/
# cp rmcd_start rmcd_start.orig
# vi rmcd_start
(at the end of the script, find the following lines)
# now start the RMC daemon in the current process so it is the child of the
# SRC subsystem daemon
SOPT="-S 4500"     <---- add this line
After saving the rmcd_start script, recycle RMC:
# /usr/sbin/rsct/bin/rmcctrl -z
# /usr/sbin/rsct/bin/rmcctrl -A
# /usr/sbin/rsct/bin/rmcctrl -p
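To confirm the change took effect, verify that the ctrmc subsystem is active and that the rmcd process restarted with the new option (a quick sanity check; rmcctrl -p enables remote client connections):

# lssrc -s ctrmc
# ps -ef | grep rmcd | grep -v grep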