The Network Installation Manager (NIM) server is one of the most important hosts in an environment. New machine installations, machine backups, backup restorations, software (fileset) and third-party product installations, and in some cases volume group backups are all done from the NIM server, so some best practices have to be respected. In this post I’ll give you a few tricks for NIM. First of all, a NIM server has to be part of your disaster recovery plan because it is the first server needed when you have to rebuild a crashed machine : my solution is HANIM. It has to be secured (nimsh, and nimsh authentication over SSL), and it has to be flexible and automated (DSM).
NIM High Availability : HANIM
Finding documentation and information about NIM High Availability is not easy. I recommend checking the NIM from A to Z Redbook ; it’s one of the only viable sources for HANIM. HANIM is simple to set up and simple to use, but there are a few things to know and understand about it :
HANIM Overview
The alternate NIM master is a backup NIM server built from the NIM master. Takeover operations from master to alternate are manual. PowerHA can be used to run them, but my advice is not to use it. A takeover can be performed even if the NIM master is down. HANIM does not perform any heartbeat ; it only provides a method for replicating the NIM database and resources. Both the NIM database and the resource data can be replicated from master to alternate (replicate=yes option). My advice is to run every NIM operation from the master, even though it is possible to run one from the alternate. Disks are not shared between the master and the alternate : when a sync operation is done, missing resources are copied over NFS from the master to the alternate, or from the alternate to the master. HANIM does not provide filesystem takeover. A takeover operation modifies the /etc/niminfo file on every NIM client : the alternate NIM master is moved to the first position in NIM_MASTER_HOSTNAME_LIST, and NIM_MASTER_HOSTNAME is set to the alternate NIM master’s hostname.
Initial setup
On the NIM master and on the alternate NIM master some filesets have to be installed ; check for the presence of bos.sysmgt.nim.master, bos.sysmgt.nim.spot and bos.sysmgt.nim.client. The NIM master and the alternate NIM master must be on the same AIX version :
# lslpp -l | grep -i nim
  bos.sysmgt.nim.client 7.1.2.15 COMMITTED Network Install Manager -
  bos.sysmgt.nim.master 7.1.2.15 COMMITTED Network Install Manager -
  bos.sysmgt.nim.spot 7.1.2.15 COMMITTED Network Install Manager - SPOT
  bos.sysmgt.nim.client 7.1.2.15 COMMITTED Network Install Manager -
# oslevel -s
7100-02-02-1316
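The fileset check can be scripted so that it is easy to run on both servers. This is only a minimal sketch : the function name missing_nim_filesets is hypothetical, and it simply compares the first column of whatever lslpp -l output it receives on standard input against the three filesets listed above.

```shell
# Hypothetical helper (not part of NIM): print the required NIM filesets
# that do not appear in the `lslpp -l` output read from standard input.
missing_nim_filesets() {
    mnf_input=$(cat)
    for mnf_fs in bos.sysmgt.nim.master bos.sysmgt.nim.spot bos.sysmgt.nim.client; do
        # Compare against the first field only, so the match is exact.
        printf '%s\n' "$mnf_input" |
            awk -v want="$mnf_fs" '$1 == want { found = 1 } END { exit found ? 0 : 1 }' ||
            printf '%s\n' "$mnf_fs"
    done
}

# Usage on a NIM server (commented out so the block only defines the function):
# lslpp -l | missing_nim_filesets
```

An empty result means the three filesets are present ; comparing the oslevel -s output of both servers still has to be done separately.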
Configure the NIM master
Initialize the NIM master with the nimconfig command ; you’ll need to name the first network used by NIM. The NIM daemons (nimesis and nimd) are added to the SRC and nimesis is started at this step.
# nimconfig -a pif_name=en0 -a netname=10-10-20-0-s24-net -a master_port=1058 -a verbose=3 -a cable_type=N/A
[..]
Checking input attributes.
attr_ass:
  'cpuid' => '00F359164D00'
  'pif_name' => 'en0'
  'netname' => '10-10-20-0-s24-net'
  'master_port' => '1058'
  'cable_type' => 'N/A'
  'net_addr' => '10.10.20.1'
  'snm' => '255.255.255.0'
  'adpt_addr' => '667C70F7A904'
  'adpt_name' => 'ent0'
Making sure the NIM Master package is OK.
set_state: id=1361463886; name=; state_attr=85; new_state=5;
checking the object definition of ;
checking interface info for master;
Built NIM infomation file.
10.10.20.1 is known as nim_master
Adding default route 10.10.20.254 to network object
0 - /usr/lpp/bos.sysmgt/nim/methods/m_mknet
1 - -anet_addr=10.10.20.1
2 - -asnm=255.255.255.0
3 - -tent
4 - -arouting1=default 10.10.20.254
5 - 10-10-20-0-s24-net
Connecting NIM master to master network.
0 - /usr/lpp/bos.sysmgt/nim/methods/m_chmaster
1 - -aif1=10-10-20-0-s24-net nim_master 667C70F7A904
2 - -amaster_port=1058
3 - -aregistration_port=1059
4 - -acable_type1=N/A
5 - master
Adding NIM deamons to SRC and starting....
0513-071 The nimesis Subsystem has been added.
0513-071 The nimd Subsystem has been added.
0513-059 The nimesis Subsystem has been started. Subsystem PID is 9568296.
[..]
NIM resources such as spot, lpp_source and so on can be created right now ; please refer to the NIM cheatsheet on chmod666.org. For the purpose of this post some resources (spot, lpp_source, mksysb, network) have been created ; they will be replicated later.
Configure the alternate NIM master
The alternate NIM master is configured with the niminit command. In the NIM from A to Z Redbook, page 124, a note warns you about the synchronization : “At the time of writing, only rsh/rshd communication is supported for NIM synchronization.” THIS STATEMENT IS FALSE : I’m using nimsh for the synchronization, and I recommend using it. We are in 2013, do not use rsh anymore.
# niminit -a is_alternate=yes -a master=nim_master -a pif_name=en0 -a cable_type1=N/A -a connect=nimsh -a name=nim_alternate
0513-071 The nimesis Subsystem has been added.
0513-071 The nimd Subsystem has been added.
0513-059 The nimesis Subsystem has been started. Subsystem PID is 10944522.
nimsh:2:wait:/usr/bin/startsrc -g nimclient >/dev/console 2>&1
0513-044 The nimsh Subsystem was requested to stop.
0513-059 The nimsh Subsystem has been started. Subsystem PID is 5963998.
Verification
You’re done with the configuration ; you can now start to synchronize, replicate and take over… pretty easy. Here are some points you can verify :
On the NIM master, the attribute is_alternate is set to yes :
# lsnim -l master
[..]
is_alternate = yes
[..]
On the NIM master, a new machine object typed alternate_master is created :
# lsnim -t alternate_master
nim_alternate machines alternate_master
After the first database synchronization, on the alternate NIM master, a new machine object typed alternate_master is created ; this is the NIM master :
# lsnim -t alternate_master
nim_master machines alternate_master
On the alternate NIM master, the attribute is_alternate does not exist :
# lsnim -l master | grep alternate
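These checks rely on reading single attributes out of lsnim -l output, so a small parser can make them scriptable. The sketch below is an assumption on my side, not something shipped with NIM : nim_attr is a hypothetical name, and it expects the usual "attribute = value" lines on standard input.

```shell
# Hypothetical helper: read `lsnim -l <object>` output on standard input and
# print the value of the attribute named in $1 (text after the first "=",
# with surrounding blanks removed). Prints nothing if the attribute is absent.
nim_attr() {
    awk -v attr="$1" '
        {
            eq = index($0, "=")
            if (eq == 0) next                  # not an "attribute = value" line
            name = substr($0, 1, eq - 1)
            value = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == attr) { print value; exit }
        }
    '
}

# Usage on the NIM master (commented out; requires a NIM server):
# [ "$(lsnim -l master | nim_attr is_alternate)" = "yes" ] && echo "is_alternate is set"
```

Because the function prints nothing for a missing attribute, the same call run on the alternate NIM master gives an empty string.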
Synchronization and replication
The NIM master and the alternate NIM master can now communicate with each other and some resources have been created on the master, so it’s now time to synchronize. Remember : HANIM only provides a method for replicating the NIM database and resources. You can, if you want, synchronize the NIM database only, or the NIM database and its resources (data included). Remember : never perform a NIM synchronization from the alternate NIM master.
Database synchronization only
A database-only synchronization is useful when objects are modified, for example when you change the subnet mask of a network object. It can also be useful when objects “without files” are created, for instance a machine. On the other hand, if you synchronize the database when an object “with a file” exists, such as an lpp_source, a spot or an fb_script, that object will not be created on the alternate ; you have to copy the file before synchronizing, or use the replicate attribute :
On NIM master two objects are created, an fb_script and a machine:
# nim -o define -t fb_script -a server=master -a location=/export/nim/others/postinstall/fb_script.ksh fb_script01
# ls -l /export/nim/others/postinstall/fb_script.ksh
-rw-r--r-- 1 root system 35 Mar 8 18:01 /export/nim/others/postinstall/fb_script.ksh
# lsnim ruby
ruby machines standalone
A database synchronization is performed :
# nim -o sync -a force=yes nim_alternate
[..]
The level of the NIM master fileset on this machine is: 7.1.2.15
The level of the NIM database backup is: 7.1.2.15
[..]
Checking NIM resources
Removing fb_script01
0518-307 odmdelete: 1 objects deleted. from nim_attr (serves attr)
0518-307 odmdelete: 0 objects deleted. from nim_attr (group memberships)
0518-307 odmdelete: 5 objects deleted. from nim_attr (resource attributes)
0518-307 odmdelete: 1 objects deleted. from nim_object (resource object)
Finished removing fb_script01
On the alternate NIM master, the machine object is here but the fb_script was not replicated because the file was not present on the alternate NIM master :
# lsnim ruby
ruby machines standalone
# lsnim fb_script01
0042-053 lsnim: there is no NIM object named "fb_script01"
If you copy the file before synchronizing, the resource will be created :
master# scp fb_script.ksh nim_alternate:/export/nim/others/postinstall
fb_script.ksh 100% 35 0.0KB/s 00:00
master# nim -o sync -a force=yes nim_alternate
[..]
Restoring the NIM database from /tmp/_nim_dir_13041674/mnt0
x ./etc/NIM.level, 9 bytes, 1 tape blocks
[..]
Keeping fb_script01
alternate# lsnim fb_script01
fb_script01 resources fb_script
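Since a database-only sync silently drops resources whose files are missing on the alternate, it can help to check the files first. The following is a minimal sketch under my own assumptions : missing_paths is a hypothetical helper, and the list of paths is something you build yourself (for example from the location attribute of each resource).

```shell
# Hypothetical helper: read one file path per line on standard input and
# print every path that does not exist on the machine where it runs.
missing_paths() {
    while IFS= read -r mp_path; do
        [ -n "$mp_path" ] || continue          # skip blank lines
        [ -e "$mp_path" ] || printf '%s\n' "$mp_path"
    done
}

# Usage on the alternate NIM master (commented out):
# echo /export/nim/others/postinstall/fb_script.ksh | missing_paths
```

Any path printed here is a file to copy before running the database-only sync.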
Synchronization with replication
I encourage you not to use the database-only synchronization but to combine it with replication : it does the same job and also copies the files for you. It is much easier, just add the replicate=yes attribute to the nim command ; it works like a charm :
# lsnim -q sync alternate_master
the following attributes are optional:
-a verbose=
-a replicate=
-a reset_clients=
# nim -o sync -a force=yes -a replicate=yes alternate_master
Takeover
If the NIM master is down, a takeover operation allows the alternate NIM master to become the NIM master for the clients. On the clients the /etc/niminfo file is modified (the NIM_MASTER_HOSTNAME and NIM_MASTER_HOSTNAME_LIST attributes are changed). Here are /etc/niminfo and the lsnim output before a takeover operation :
client# grep -E "NIM_MASTER_HOSTNAME_LIST|NIM_MASTER_HOSTNAME" /etc/niminfo
export NIM_MASTER_HOSTNAME=nim_master
export NIM_MASTER_HOSTNAME_LIST="nim_master nim_alternate"
master# lsnim -l client | grep current_master
current_master = nim_master
The takeover operation is initiated from the alternate NIM master :
alternate# nim -o takeover -a show_progress=yes nim_master
+-----------------------------------------------------------------------------+
Performing "reset" Operation
+-----------------------------------------------------------------------------+
+-----------------------------------------------------------------------------+
"reset" Operation Summary
+-----------------------------------------------------------------------------+
Target Result
------ ------
client RESET
client1 RESET
[..]
+-----------------------------------------------------------------------------+
Initiating "takeover" Operation
+-----------------------------------------------------------------------------+
Initiating the takeover operation on machine 1 of 240: client ...
Initiating the takeover operation on machine 2 of 240: client1...
[..]
+-----------------------------------------------------------------------------+
"takeover" Operation Summary
+-----------------------------------------------------------------------------+
Target Result
------ ------
client SUCCESS
client1 SUCCESS
[..]
alternate# lsnim -l client | grep current_master
current_master = nim_alternate
client# grep -E "NIM_MASTER_HOSTNAME_LIST|NIM_MASTER_HOSTNAME" /etc/niminfo
export NIM_MASTER_HOSTNAME=nim_alternate
export NIM_MASTER_HOSTNAME_LIST="nim_alternate nim_master"
When the NIM master is back up, initiate a takeover from it to make it the master again :
# nim -o takeover -a show_progress=yes nim_alternate
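After a takeover in either direction, each client’s /etc/niminfo should point at the server that now owns the clients. Here is a minimal sketch of that check ; niminfo_value and niminfo_points_to are hypothetical names, and the parsing assumes the "export NAME=value" lines shown in the grep output above.

```shell
# Hypothetical helper: print the value of an exported variable from a
# niminfo-style file without sourcing it.
# $1 = variable name, $2 = file (defaults to /etc/niminfo).
niminfo_value() {
    sed -n "s/^export $1=//p" "${2:-/etc/niminfo}" | sed 's/^"//; s/"$//' | head -n 1
}

# Hypothetical check: succeed only if host $1 is both NIM_MASTER_HOSTNAME
# and the first entry of NIM_MASTER_HOSTNAME_LIST in file $2.
niminfo_points_to() {
    npt_master=$(niminfo_value NIM_MASTER_HOSTNAME "${2:-/etc/niminfo}")
    npt_list=$(niminfo_value NIM_MASTER_HOSTNAME_LIST "${2:-/etc/niminfo}")
    [ "$npt_master" = "$1" ] && [ "${npt_list%% *}" = "$1" ]
}

# Usage on a client after a takeover by the alternate (commented out):
# niminfo_points_to nim_alternate || echo "niminfo was not updated"
```

With 240 clients, as in the takeover above, running such a check on every client is one way to spot machines that were unreachable during the takeover.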
Synchronization automation and other files ?
I recommend running a NIM synchronization every day ; I personally have a cron job doing it every day at 11 PM. Most of the time a NIM synchronization is not enough and you’ll need to synchronize other files : in my case my root .profile and my /etc/hosts file, in your case whatever you want. For this I’m using a little rsync-based script which synchronizes my master to my alternate every day :
# crontab -l
[..]
0 23 * * * /export/nim/others/tools/do_sync.ksh >/dev/null 2>&1
[..]
# cat /export/nim/others/tools/do_sync.ksh
[..]
nim -o sync -a force=yes -a replicate=yes -a reset_clients=yes ${alternate}
/export/nim/others/tools/sync_to_alternate.ksh
[..]
# cat /export/nim/others/tools/sync_to_alternate.ksh
[..]
/usr/bin/rsync -ave ssh ${a_filesystem} ${alternate_nim_master}:${a_filesystem}
[..]
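Because the cron entry sends all output to /dev/null, a failed sync can go unnoticed. The sketch below shows one possible way to keep a log and an exit status ; it is not the content of my do_sync.ksh, and the names NIM_CMD, SYNC_LOG, sync_log and sync_alternate are hypothetical. Only the nim -o sync attributes are taken from the script above.

```shell
# Illustration-only settings (not NIM variables): NIM_CMD allows the nim
# command to be replaced by a stub for a dry run, SYNC_LOG is the log file.
NIM_CMD=${NIM_CMD:-nim}
SYNC_LOG=${SYNC_LOG:-/tmp/nim_sync.log}

# Append a timestamped message to the log.
sync_log() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$SYNC_LOG"
}

# Run the sync with replication against the alternate named in $1 and
# return the exit status of the nim command.
sync_alternate() {
    sync_log "start sync to $1"
    if "$NIM_CMD" -o sync -a force=yes -a replicate=yes -a reset_clients=yes "$1" >> "$SYNC_LOG" 2>&1; then
        sync_log "sync OK"
    else
        sa_rc=$?
        sync_log "sync FAILED rc=$sa_rc"
        return "$sa_rc"
    fi
}

# Usage from a cron script on the NIM master (commented out):
# sync_alternate nim_alternate || exit 1
```

Setting NIM_CMD=true before calling sync_alternate gives a dry run that only exercises the logging.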
NIM Security, use nimsh and use it over SSL
nimsh over SSL
NIM master configuration for nimsh over SSL
From the NIM master, enable SSL support through the nimconfig command ; certificates will be generated in /ssl_nimsh/keys. The OpenSSL filesets have to be installed.
Check the OpenSSL filesets :
# lslpp -l | grep openssl
  openssl.base 0.9.8.2400 COMMITTED Open Secure Socket Layer
  openssl.license 0.9.8.2400 COMMITTED Open Secure Socket License
  openssl.man.en_US 0.9.8.2400 COMMITTED Open Secure Socket Layer
  openssl.base 0.9.8.2400 COMMITTED Open Secure Socket Layer
Use nimconfig to enable SSL support :
# nimconfig -c
0513-029 The tftpd Subsystem is already active.
Multiple instances are not supported.
NIM_MASTER_HOSTNAME=nim_master
x - /usr/lib/libssl.so.0.9.8
x - /usr/lib/libcrypto.so.0.9.8
Target "all" is up to date.
Generating a 1024 bit RSA private key
......++++++
.++++++
writing new private key to '/ssl_nimsh/keys/rootkey.pem'
-----
Signature ok
subject=/C=US/ST=Texas/L=Austin/O=ibm.com/CN=Root CA
Getting Private key
Generating a 1024 bit RSA private key
...............++++++
.......++++++
writing new private key to '/ssl_nimsh/keys/clientkey.pem'
-----
Signature ok
subject=/C=US/ST=Texas/L=Austin/O=ibm.com
Getting CA Private Key
Generating a 1024 bit RSA private key
......++++++
.............++++++
writing new private key to '/ssl_nimsh/keys/serverkey.pem'
-----
Signature ok
subject=/C=US/ST=Texas/L=Austin/O=ibm.com
Getting CA Private Key
Check the NIM master : the attribute ssl_support is now set to yes :
# lsnim -l master | grep ssl_support
ssl_support = yes
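Besides ssl_support, it is worth confirming that the three private keys named in the nimconfig -c output were really written. This is a small sketch with a hypothetical function name, missing_nimsh_keys ; the file names and the default directory are the ones shown in the output above, nothing more is assumed about the directory content.

```shell
# Hypothetical helper: print the key files (named as in the nimconfig -c
# output) that are missing or empty in a key directory.
# $1 = directory (defaults to /ssl_nimsh/keys).
missing_nimsh_keys() {
    mk_dir=${1:-/ssl_nimsh/keys}
    for mk_file in rootkey.pem clientkey.pem serverkey.pem; do
        # -s is true only for an existing, non-empty file
        [ -s "$mk_dir/$mk_file" ] || printf '%s\n' "$mk_dir/$mk_file"
    done
}

# Usage on the NIM master (commented out):
# missing_nimsh_keys
```

An empty result means the three key files exist and are not empty.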
NIM alternate master for nimsh over SSL
If you’re using an alternate NIM master, repeat the same operation on it (OpenSSL filesets and nimconfig -c). The alternate NIM master is also a client of the NIM master, so its client side has to be configured too :
# nimclient -c
x - /usr/lib/libssl.so.0.9.8
x - /usr/lib/libcrypto.so.0.9.8
Received 2763 Bytes in 0.0 Seconds
0513-044 The nimsh Subsystem was requested to stop.
0513-077 Subsystem has been changed.
0513-059 The nimsh Subsystem has been started. Subsystem PID is 9502954.
Client configuration
Configure all NIM clients to use SSL-encrypted authentication ; if you are using an alternate NIM master, do not forget to download the alternate’s certificates on the clients :
# rmitab nimsh 2>/dev/null
# rm -rf /etc/niminfo
# niminit -aname=$(hostname) -a master=nim_master -a master_port=1058 -a registration_port=1059 -a connect=nimsh
# nimclient -c
# nimclient -o get_cert -a master_name=nim_alternate
# stopsrc -s nimsh
# startsrc -s nimsh
On the NIM server itself, the client’s connect attribute is now set to “nimsh (secure)” :
# lsnim -l ruby | grep connect
connect = nimsh (secure)
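With many clients, checking them one by one is tedious. The sketch below lists the clients that are not yet on “nimsh (secure)” ; insecure_clients is a hypothetical name, and the parsing assumes that lsnim -l prints one stanza per object, starting with "name:" in the first column followed by indented "attribute = value" lines.

```shell
# Hypothetical helper: read `lsnim -l` stanzas on standard input and print
# the name of every object whose connect attribute is present but is not
# "nimsh (secure)".
insecure_clients() {
    awk '
        # A stanza header: no leading blank, ends with a colon.
        /^[^ \t].*:$/ { name = substr($0, 1, length($0) - 1); next }
        {
            eq = index($0, "=")
            if (eq == 0) next
            attr = substr($0, 1, eq - 1)
            value = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", attr)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (attr == "connect" && value != "nimsh (secure)") print name
        }
    '
}

# Usage on the NIM master (commented out; requires a NIM server):
# lsnim -l -t standalone | insecure_clients
```

Every name printed is a client on which the client configuration steps still have to be run.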
http://chmod666.org/index.php/nim-less-known-features-hanim-nimsh-over-ssl-dsm/