====== GPFS operations ======

===== Add a new disk to a filesystem or create a new filesystem =====

==== Create NSD (Network Shared Disk) disks ====

Before adding a disk to a GPFS filesystem, you have to define a few parameters for it:

* **name**: logical name corresponding to the filesystem. This name can contain only the following characters: 'A' through 'Z', 'a' through 'z', '0' through '9', or '_' (the underscore). All other characters are not valid.
* **failureGroup**: an important setting; each failure group corresponds to one copy of the data, and since GPFS keeps at most 3 copies you need at most 3 different IDs per filesystem. If no copy is required, a single failure group is enough. Use for example 2 for the first copy and 3 for the second copy. **mmlsdisk** will show you the failure group IDs currently in use.
* **usage**: always use dataAndMetadata, except for tuning purposes, where you could for instance place metadata on high-speed disks.
* **pool**: always use system (except for advanced users).

=== Identify the SAN disks ===

Without multipathing (for example VMware RDM devices), you can use:

[root@gpfs01 ~]# lsscsi -s
[0:2:0:0]  disk    IBM      ServeRAID M1115   2.13  /dev/sda    298GB
[0:2:1:0]  disk    IBM      ServeRAID M1115   2.13  /dev/sdb    198GB
[1:0:0:0]  cd/dvd  IBM SATA DEVICE 81Y3676    IBD1  /dev/sr0        -
[7:0:0:0]  disk    IBM      2145              0000  /dev/sdii   274GB
[7:0:0:1]  disk    IBM      2145              0000  /dev/sdc   21.4GB
[7:0:0:2]  disk    IBM      2145              0000  /dev/sdd   1.09TB
...

List the UUID / WWN serials and the devices they correspond to:

[root@gpfs01 ~]# ll /dev/disk/by-id/
...
0 lrwxrwxrwx 1 root root 10 Sep 30 01:30 dm-uuid-mpath-36005076300810163a00000000000006a -> ../../dm-4
0 lrwxrwxrwx 1 root root 10 Sep 30 01:30 dm-uuid-mpath-36005076300810163a00000000000006b -> ../../dm-5
...
0 lrwxrwxrwx 1 root root 10 Sep 30 15:05 wwn-0x60050764008181c46800000000000058 -> ../../sdml
0 lrwxrwxrwx 1 root root 10 Sep 30 15:05 wwn-0x60050764008181c46800000000000059 -> ../../sdmm

For multipathing devices, use:

[root@gpfs01 ~]# multipath -ll | egrep "mpath|size" | paste -d " " - -
mpathcu (360050764008181c46800000000000042) dm-126 IBM ,2145 size=256G features='1 queue_if_no_path' hwhandler='0' wp=rw
mpathbp (360050764008181c46800000000000030) dm-23 IBM ,2145 size=1.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
...

After a rescan, identify your disk and its device name; use the **dm-xx** name:

[root@gpfs01 scripts]# rescan-scsi-bus.sh -a
[root@gpfs01 ~]# multipath -ll | egrep "mpath|size" | paste -d " " - -
...
mpathbd (360050764008181c46800000000000023) dm-47 IBM ,2145 size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw

POOL: system (default)

=== List failure groups ===

Check the failure groups: A is 2 and B is 3.

[root@gpfs01 ~]# mmlsdisk gpfs01
disk               driver   sector     failure holds    holds                            storage
name               type       size       group metadata data  status        availability pool
------------       -------- ------ ----------- -------- ----- ------------- ------------ ------------
GPFS_NSD_DATA_B_08 nsd         512           3 Yes      Yes   ready         up           system
...
GPFS_NSD_DATA_A_13 nsd         512           2 Yes      Yes   ready         up           system

=== Identify NSDs in use and free disks (new disks) ===

NSDs in use:

[root@gpfs01 ~]# mmlsnsd -X | grep gpfs01-hb | awk '{print $3}' | sort
/dev/dm-10
/dev/dm-11
/dev/dm-12
/dev/dm-13
...

List all disks:

[root@gpfs01 ~]# multipath -ll | egrep "mpath|size" | paste -d " " - - | tr ' ' '\n' | grep 'dm-' | sed 's/^/\/dev\//' | sort
/dev/dm-50
/dev/dm-51
...
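If you need to confirm that a candidate device really is the LUN you expect, the WWN reported by the storage team can be matched directly against the multipath table (a minimal sketch; the WWN used here is the one of the 20G example disk above):

# multipath -ll | egrep "mpath|size" | paste -d " " - - | grep -i 60050764008181c46800000000000023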
**Difference** (devices not yet used as NSDs):

multipath -ll | egrep "mpath|size" | paste -d " " - - | tr ' ' '\n' | grep 'dm-' | sed 's/^/\/dev\//' | sort > /tmp/disk_all.txt
mmlsnsd -X | grep gpfs01-hb | awk '{print $3}' | sort > /tmp/disk_nsd.txt
sdiff -sw100 /tmp/disk_all.txt /tmp/disk_nsd.txt

=== Build NSD file ===

Create a text file (stanza file) listing the NSD disks to add and their characteristics.

[root@gpfs01 scripts]# cat list.disks_CESSHARE.txt
%nsd:
  device=/dev/dm-47
  nsd=GPFS_NSD_CESSHARE_A_01
  servers=gpfs01-hb,gpfs02-hb
  usage=dataAndMetadata
  failureGroup=2
  pool=system

Create the NSD (Network Shared Disk), verifying the disk:

[root@gpfs01 ~]# mmcrnsd -F list.disks_CESSHARE.txt -v yes
mmcrnsd: Processing disk dm-8
mmcrnsd: Propagating the cluster configuration data to all affected nodes.
  This is an asynchronous process.

The disk is now formatted, but still free, i.e. not attached to a filesystem:

[root@gpfs01 ~]# fdisk -l /dev/dm-47
...
Disk /dev/dm-47: 21.5 GB, 21474836480 bytes, 41943040 sectors
...
Disk label type: gpt
Disk identifier: 7A94FA63-6A6C-4001-89E8-E36D00B3F66E

#         Start          End    Size  Type            Name
 1           48     41942991     20G  IBM General Par GPFS:

[root@gpfs01 ~]# mmlsnsd -L

 File system   Disk name              NSD volume ID      NSD servers
---------------------------------------------------------------------------------------------
 cesshared01lv GPFS_NSD_CESSHARE01    0A0113A15B0BFD87   gpfs01-hb,gpfs02-hb
...
 (free disk)   GPFS_NSD_CESSHARE_A_01 0A0113A15E2EB417   gpfs01-hb,gpfs02-hb

[root@gpfs01 ~]# mmlsnsd -X

 Disk name              NSD volume ID      Device     Devtype  Node name   Remarks
---------------------------------------------------------------------------------------------------
 GPFS_NSD_CESSHARE01    0A0113A15B0BFD87   /dev/dm-9  dmm      gpfs01-hb   server node
 GPFS_NSD_CESSHARE01    0A0113A15B0BFD87   /dev/dm-2  dmm      gpfs02-hb   server node
 GPFS_NSD_CESSHARE_A_01 0A0113A15E2EB417   /dev/dm-47 dmm      gpfs01-hb   server node
 GPFS_NSD_CESSHARE_A_01 0A0113A15E2EB417   /dev/dm-47 dmm      gpfs02-hb   server node
...

==== Add the NSD disk to a filesystem ====

=== Add NSD to an existing filesystem ===

First, list the unused disks:

[root@gpfs01 ~]# mmlsnsd -F

 File system   Disk name              NSD servers
---------------------------------------------------------------------------
 (free disk)   GPFS_NSD_CESSHARE_A_01 gpfs01-hb,gpfs02-hb

Create a stanza file, as for the NSD creation:

[root@gpfs01 scripts]# cat list.disks_CESSHARE.txt
%nsd:
  device=/dev/dm-47
  nsd=GPFS_NSD_CESSHARE_A_01
  servers=gpfs01-hb,gpfs02-hb
  usage=dataAndMetadata
  failureGroup=2
  pool=system

Now add your disk to the filesystem and rebalance the blocks (if 2 copies of the data are required, the second copy will be created):

[root@gpfs01 ~]# mmadddisk /dev/cesshared01lv -F list.disks_CESSHARE.txt -r

The following disks of cesshared01lv will be formatted on node gpfs02:
    GPFS_NSD_CESSHARE_A_01: size 20480 MB
Extending Allocation Map
Checking Allocation Map for storage pool system
Completed adding disks to file system cesshared01lv.
mmadddisk: Propagating the cluster configuration data to all affected nodes.
  This is an asynchronous process.
Restriping /dev/cesshared01lv ...
Scanning file system metadata, phase 1 ...
100 % complete on Mon Jan 27 11:28:13 2020
Scan completed successfully.
Scanning file system metadata, phase 2 ...
100 % complete on Mon Jan 27 11:28:13 2020
Scan completed successfully.
Scanning file system metadata, phase 3 ...
100 % complete on Mon Jan 27 11:28:13 2020
Scan completed successfully.
Scanning file system metadata, phase 4 ...
100 % complete on Mon Jan 27 11:28:13 2020
Scan completed successfully.
Scanning user file metadata ...
100.00 % complete on Mon Jan 27 11:28:13 2020
 ( 65792 inodes with total 404 MB data processed)
Scan completed successfully.
Done

Check the number of copies of a file: there is only 1 copy of data and metadata. We will add a second copy using the **mmchfs** command, then restripe to copy the data and metadata onto the disks of the second failure group, and finally restripe again to optimize data placement:

[root@gpfs01 connections]# mmlsattr /CESshared/ha/nfs/ganesha/gpfs-epoch
replication factors
metadata(max) data(max) file [flags]
------------- --------- ---------------
      1 (  2)   1 (  2) /CESshared/ha/nfs/ganesha/gpfs-epoch

[root@gpfs01 connections]# mmchfs cesshared01lv -m 2 -r 2

[root@gpfs01 connections]# mmrestripefs cesshared01lv -R
Scanning file system metadata, phase 1 ...
100 % complete on Mon Jan 27 12:50:45 2020
Scan completed successfully.
...
100.00 % complete on Mon Jan 27 12:50:46 2020
 ( 65792 inodes with total 808 MB data processed)
Scan completed successfully.

[root@gpfs01 connections]# mmlsattr /CESshared/ha/nfs/ganesha/gpfs-epoch
replication factors
metadata(max) data(max) file [flags]
------------- --------- ---------------
      2 (  2)   2 (  2) /CESshared/ha/nfs/ganesha/gpfs-epoch [unbalanced]

Optimize data placement:

[root@gpfs01 connections]# mmrestripefs cesshared01lv -b
Scanning file system metadata, phase 1 ...
100 % complete on Mon Jan 27 12:51:56 2020
Scan completed successfully.
...
100.00 % complete on Mon Jan 27 12:51:57 2020
 ( 65792 inodes with total 808 MB data processed)
Scan completed successfully.

[root@gpfs01 connections]# mmlsattr /CESshared/ha/nfs/ganesha/gpfs-epoch
replication factors
metadata(max) data(max) file [flags]
------------- --------- ---------------
      2 (  2)   2 (  2) /CESshared/ha/nfs/ganesha/gpfs-epoch

=== Add NSD to a new filesystem ===

This is an example of creating a filesystem with the previously defined NSD disk: block size 512K, 2 copies of data and metadata, quotas enabled, mounted on /CESshared, NFSv4 locking and ACL semantics, and automatic mount at startup:

[root@gpfs01 connections]# mmcrfs cesshared01lv -F list.disks_CESSHARE.txt -B 512K -m 2 -r 2 -Q yes -T /CESshared -v yes -D nfs4 -k nfs4 -A yes
[root@gpfs01 connections]# mmmount cesshared01lv

===== Remove a disk =====

To delete GPFS_NSD_DATA01 from file system gpfs01 and rebalance the files across the remaining disks, issue this command:

[root@gpfs01 ~]# mmlsdisk gpfs01
disk            driver   sector     failure holds    holds                            storage
name            type       size       group metadata data  status        availability pool
------------    -------- ------ ----------- -------- ----- ------------- ------------ ------------
GPFS_NSD_DATA01 nsd         512           2 Yes      Yes   ready         up           system
GPFS_NSD_DATA02 nsd         512           2 Yes      Yes   ready         up           system
GPFS_NSD_DATA03 nsd         512           2 Yes      Yes   ready         up           system
GPFS_NSD_DATA04 nsd         512           2 Yes      Yes   ready         up           system
GPFS_NSD_DATA05 nsd         512           2 Yes      Yes   ready         up           system
GPFS_NSD_DATA06 nsd         512           2 Yes      Yes   ready         up           system
GPFS_NSD_DATA07 nsd         512           2 Yes      Yes   ready         up           system

[root@gpfs01 ~]# mmdeldisk gpfs01 GPFS_NSD_DATA01

Now you can delete the NSD GPFS_NSD_DATA01 from the GPFS cluster; first check that the disk is free, then issue this command:

[root@gpfs01 scripts]# mmlsnsd -F

 File system   Disk name        NSD servers
---------------------------------------------------------------------------
 (free disk)   GPFS_NSD_DATA01  gpfs01-hb,gpfs02-hb

[root@gpfs01 ~]# mmdelnsd GPFS_NSD_DATA01
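Before running **mmdeldisk** it is also worth checking that the remaining NSDs of the filesystem have enough free space to absorb the data of the disk being removed; a quick check (a sketch, using the same filesystem name as above):

# mmdf gpfs01

**mmdf** reports the used and free space per NSD and per failure group, so you can see whether the deletion (and the implied restripe) will fit on the remaining disks.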
==== Remove a node from the GPFS cluster ====

* Remove the disks that belong to the server you want to remove:

[root@labo_2_new]/root# mmlsnsd -m

 Disk name    NSD volume ID      Device            Node name   Remarks
---------------------------------------------------------------------------------------
 diskh1       AC131C344EE5D6E8   /dev/hdisk2       labo_1      server node
 diskh1       AC131C344EE5D6E8   /dev/hdisk2       labo_2      server node
 diskh2       AC131C344EE600F2   /dev/hdisk4       labo_1      server node
 diskh2       AC131C344EE600F2   /dev/hdisk4       labo_2      server node
 diskk1       AC131C364EE5D6EC   /dev/descgpfs1lv  labo_s      server node
 diskk2       AC131C364EE600F6   /dev/descgpfs2lv  labo_s      server node
 diskr1       AC131C344EE5D6EA   /dev/hdisk3       labo_1      server node
 diskr1       AC131C344EE5D6EA   /dev/hdisk3       labo_2      server node
 diskr2       AC131C344EE600F4   /dev/hdisk5       labo_1      server node
 diskr2       AC131C344EE600F4   /dev/hdisk5       labo_2      server node

[root@labo_2_new]/root# mmlspv

In this example diskk1 and diskk2 are served only by labo_s; as **mmlsdisk** shows below, they hold neither data nor metadata (they are descOnly disks carrying only a copy of the file system descriptor).

[root@labo_2_new]/root# mmlsdisk orafs1
disk         driver   sector failure holds    holds                            storage
name         type       size   group metadata data  status        availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
diskh1       nsd         512       1 yes      yes   ready         up           system
diskr1       nsd         512       2 yes      yes   ready         up           system
diskk1       nsd         512       3 no       no    ready         up           system

[root@labo_2_new]/root# mmdeldisk orafs1 diskk1
Deleting disks ...
Scanning system storage pool
Scanning file system metadata, phase 1 ...
Scan completed successfully.
Scanning file system metadata, phase 2 ...
Scan completed successfully.
Scanning file system metadata, phase 3 ...
Scan completed successfully.
Scanning file system metadata, phase 4 ...
Scan completed successfully.
Scanning user file metadata ...
100.00 % complete on Fri Jan 6 09:52:41 2012
Scan completed successfully.
Checking Allocation Map for storage pool 'system'
tsdeldisk completed.
mmdeldisk: Propagating the cluster configuration data to all affected nodes.
  This is an asynchronous process.

[root@labo_2_new]/root# mmlsdisk orafs1
disk         driver   sector failure holds    holds                            storage
name         type       size   group metadata data  status        availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
diskh1       nsd         512       1 yes      yes   ready         up           system
diskr1       nsd         512       2 yes      yes   ready         up           system

[root@labo_2_new]/root# mmlsdisk orafs2
disk         driver   sector failure holds    holds                            storage
name         type       size   group metadata data  status        availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
diskh2       nsd         512       4 yes      yes   ready         up           system
diskr2       nsd         512       5 yes      yes   ready         up           system
diskk2       nsd         512       6 no       no    ready         up           system

[root@labo_2_new]/root# mmdeldisk orafs2 diskk2
Deleting disks ...
Scanning system storage pool
Scanning file system metadata, phase 1 ...
Scan completed successfully.
Scanning file system metadata, phase 2 ...
Scan completed successfully.
Scanning file system metadata, phase 3 ...
Scan completed successfully.
Scanning file system metadata, phase 4 ...
Scan completed successfully.
Scanning user file metadata ...
100.00 % complete on Fri Jan 6 09:55:30 2012
Scan completed successfully.
Checking Allocation Map for storage pool 'system'
tsdeldisk completed.
mmdeldisk: Propagating the cluster configuration data to all affected nodes.
  This is an asynchronous process.
[root@labo_2_new]/root# mmlsdisk orafs2
disk         driver   sector failure holds    holds                            storage
name         type       size   group metadata data  status        availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
diskh2       nsd         512       4 yes      yes   ready         up           system
diskr2       nsd         512       5 yes      yes   ready         up           system

* Remove the NSDs that belong to the server you want to remove:

[root@labo_2_new]/root# mmlsnsd -m

 Disk name    NSD volume ID      Device            Node name   Remarks
---------------------------------------------------------------------------------------
 diskh1       AC131C344EE5D6E8   /dev/hdisk2       labo_1      server node
 diskh1       AC131C344EE5D6E8   /dev/hdisk2       labo_2      server node
 diskh2       AC131C344EE600F2   /dev/hdisk4       labo_1      server node
 diskh2       AC131C344EE600F2   /dev/hdisk4       labo_2      server node
 diskk1       AC131C364EE5D6EC   /dev/descgpfs1lv  labo_s      server node
 diskk2       AC131C364EE600F6   /dev/descgpfs2lv  labo_s      server node
 diskr1       AC131C344EE5D6EA   /dev/hdisk3       labo_1      server node
 diskr1       AC131C344EE5D6EA   /dev/hdisk3       labo_2      server node
 diskr2       AC131C344EE600F4   /dev/hdisk5       labo_1      server node
 diskr2       AC131C344EE600F4   /dev/hdisk5       labo_2      server node

[root@labo_2_new]/root# mmdelnsd "diskk1;diskk2"
mmdelnsd: Processing disk diskk1
mmdelnsd: Processing disk diskk2
mmdelnsd: Propagating the cluster configuration data to all affected nodes.
  This is an asynchronous process.

[root@labo_2_new]/root# mmlsnsd -m

 Disk name    NSD volume ID      Device            Node name   Remarks
---------------------------------------------------------------------------------------
 diskh1       AC131C344EE5D6E8   /dev/hdisk2       labo_1      server node
 diskh1       AC131C344EE5D6E8   /dev/hdisk2       labo_2      server node
 diskh2       AC131C344EE600F2   /dev/hdisk4       labo_1      server node
 diskh2       AC131C344EE600F2   /dev/hdisk4       labo_2      server node
 diskr1       AC131C344EE5D6EA   /dev/hdisk3       labo_1      server node
 diskr1       AC131C344EE5D6EA   /dev/hdisk3       labo_2      server node
 diskr2       AC131C344EE600F4   /dev/hdisk5       labo_1      server node
 diskr2       AC131C344EE600F4   /dev/hdisk5       labo_2      server node

* The server is now still a member of the GPFS cluster, but without any resources.
* Stop GPFS on the member to be removed:

[root@labo_s_new]/root# mmshutdown
Fri Jan 6 10:03:32 CET 2012: mmshutdown: Starting force unmount of GPFS file systems
Fri Jan 6 10:03:37 CET 2012: mmshutdown: Shutting down GPFS daemons
Shutting down!
'shutdown' command about to kill process 17816
Fri Jan 6 10:03:42 CET 2012: mmshutdown: Finished

* Remove the member from GPFS:

[root@labo_2_new]/root# mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         gpfsOracle.labo_2
  GPFS cluster id:           12399285214363632796
  GPFS UID domain:           gpfsOracle.labo_2
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp

GPFS cluster configuration servers:
-----------------------------------
  Primary server:    labo_2
  Secondary server:  labo_1

 Node  Daemon node name  IP address   Admin node name  Designation
-----------------------------------------------------------------------------------------------
   1   labo_1            10.10.10.52  labo_1           quorum
   2   labo_2            10.10.10.53  labo_2           quorum
   3   labo_s            10.10.10.54  labo_s           quorum

[root@labo_2_new]/root# mmgetstate -aLs

 Node number  Node name  Quorum  Nodes up  Total nodes  GPFS state  Remarks
------------------------------------------------------------------------------------
       1      labo_1        2       2          3        active      quorum node
       2      labo_2        2       2          3        active      quorum node
       3      labo_s        0       0          3        down        quorum node

 Summary information
---------------------
Number of nodes defined in the cluster:            3
Number of local nodes active in the cluster:       2
Number of remote nodes joined in this cluster:     0
Number of quorum nodes defined in the cluster:     3
Number of quorum nodes active in the cluster:      2
Quorum = 2, Quorum achieved

[root@labo_2_new]/root# mmdelnode -N labo_s
Verifying GPFS is stopped on all affected nodes ...
mmdelnode: Command successfully completed
mmdelnode: Propagating the cluster configuration data to all affected nodes.
  This is an asynchronous process.

[root@labo_2_new]/root# mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         gpfsOracle.labo_2
  GPFS cluster id:           12399285214363632796
  GPFS UID domain:           gpfsOracle.labo_2
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp

GPFS cluster configuration servers:
-----------------------------------
  Primary server:    labo_2
  Secondary server:  labo_1

 Node  Daemon node name  IP address   Admin node name  Designation
-----------------------------------------------------------------------------------------------
   1   labo_1            10.10.10.52  labo_1           quorum
   2   labo_2            10.10.10.53  labo_2           quorum

[root@labo_2_new]/root# mmgetstate -aLs

 Node number  Node name  Quorum  Nodes up  Total nodes  GPFS state  Remarks
------------------------------------------------------------------------------------
       1      labo_1        2       2          2        active      quorum node
       2      labo_2        2       2          2        active      quorum node

 Summary information
---------------------
Number of nodes defined in the cluster:            2
Number of local nodes active in the cluster:       2
Number of remote nodes joined in this cluster:     0
Number of quorum nodes defined in the cluster:     2
Number of quorum nodes active in the cluster:      2
Quorum = 2, Quorum achieved
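Note that **mmdelnode** will not remove a node that is still the primary or secondary cluster configuration server (as listed by **mmlscluster**). On releases that use primary/secondary configuration servers, move that role to other nodes first; a sketch, assuming the node to be removed held one of the roles:

# mmchcluster -p labo_2 -s labo_1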
==== Add a node to the GPFS cluster ====

* Add a new node to the GPFS cluster: add the node as nonquorum first, then change it to quorum (otherwise you would need to stop the whole cluster).

[root@labo_2_new]/root# mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         gpfsOracle.labo_2
  GPFS cluster id:           12399285214363632796
  GPFS UID domain:           gpfsOracle.labo_2
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp

GPFS cluster configuration servers:
-----------------------------------
  Primary server:    labo_2
  Secondary server:  labo_1

 Node  Daemon node name  IP address   Admin node name  Designation
-----------------------------------------------------------------------------------------------
   1   labo_1            10.10.10.52  labo_1           quorum
   2   labo_2            10.10.10.53  labo_2           quorum

[root@labo_2_new]/root# mmaddnode -N labo_s:nonquorum
Fri Jan 6 12:37:14 CET 2012: mmaddnode: Processing node labo_s
mmaddnode: Command successfully completed
mmaddnode: Warning: Not all nodes have proper GPFS license designations.
    Use the mmchlicense command to designate licenses as needed.
mmaddnode: Propagating the cluster configuration data to all affected nodes.
  This is an asynchronous process.

[root@labo_2_new]/root# mmlscluster

===============================================================================
| Warning:                                                                    |
|   This cluster contains nodes that do not have a proper GPFS license       |
|   designation.  This violates the terms of the GPFS licensing agreement.   |
|   Use the mmchlicense command and assign the appropriate GPFS licenses     |
|   to each of the nodes in the cluster.  For more information about GPFS    |
|   license designation, see the Concepts, Planning, and Installation Guide. |
===============================================================================

GPFS cluster information
========================
  GPFS cluster name:         gpfsOracle.labo_2
  GPFS cluster id:           12399285214363632796
  GPFS UID domain:           gpfsOracle.labo_2
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp

GPFS cluster configuration servers:
-----------------------------------
  Primary server:    labo_2
  Secondary server:  labo_1

 Node  Daemon node name  IP address   Admin node name  Designation
-----------------------------------------------------------------------------------------------
   1   labo_1            10.10.10.52  labo_1           quorum
   2   labo_2            10.10.10.53  labo_2           quorum
   3   labo_s            10.10.10.54  labo_s

[root@labo_2_new]/root# mmchlicense server --accept -N labo_s

The following nodes will be designated as possessing GPFS server licenses:
        labo_s
mmchlicense: Command successfully completed
mmchlicense: Propagating the cluster configuration data to all affected nodes.
  This is an asynchronous process.

[root@labo_2_new]/root# mmchnode --quorum -N labo_s
Fri Jan 6 12:39:26 CET 2012: mmchnode: Processing node labo_s
mmchnode: Propagating the cluster configuration data to all affected nodes.
  This is an asynchronous process.
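To double-check the license designations after **mmchlicense** (and confirm that the warning shown by **mmlscluster** is gone), a command along these lines can be used (a sketch; **mmlslicense** is available on GPFS 3.3 and later):

# mmlslicense -L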
[root@labo_2_new]/root# mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         gpfsOracle.labo_2
  GPFS cluster id:           12399285214363632796
  GPFS UID domain:           gpfsOracle.labo_2
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp

GPFS cluster configuration servers:
-----------------------------------
  Primary server:    labo_2
  Secondary server:  labo_1

 Node  Daemon node name  IP address   Admin node name  Designation
-----------------------------------------------------------------------------------------------
   1   labo_1            10.10.10.52  labo_1           quorum
   2   labo_2            10.10.10.53  labo_2           quorum
   3   labo_s            10.10.10.54  labo_s           quorum

[root@labo_2_new]/root# mmgetstate -aLs

 Node number  Node name  Quorum  Nodes up  Total nodes  GPFS state  Remarks
------------------------------------------------------------------------------------
       1      labo_1        2       2          3        active      quorum node
       2      labo_2        2       2          3        active      quorum node
       3      labo_s        0       0          3        down        quorum node

 Summary information
---------------------
Number of nodes defined in the cluster:            3
Number of local nodes active in the cluster:       2
Number of remote nodes joined in this cluster:     0
Number of quorum nodes defined in the cluster:     3
Number of quorum nodes active in the cluster:      2
Quorum = 2, Quorum achieved

* Start GPFS on the new node:

[root@labo_s_new]/root# mmstartup
Fri Jan 6 12:40:45 CET 2012: mmstartup: Starting GPFS ...
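If the node does not reach the active state after **mmstartup**, check the GPFS log on that node; a minimal sketch (the path below is the usual default log location):

# tail -f /var/adm/ras/mmfs.log.latest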
[root@labo_2_new]/root# mmgetstate -aLs

 Node number  Node name  Quorum  Nodes up  Total nodes  GPFS state  Remarks
------------------------------------------------------------------------------------
       1      labo_1        2       3          3        active      quorum node
       2      labo_2        2       3          3        active      quorum node
       3      labo_s        2       3          3        active      quorum node

 Summary information
---------------------
Number of nodes defined in the cluster:            3
Number of local nodes active in the cluster:       3
Number of remote nodes joined in this cluster:     0
Number of quorum nodes defined in the cluster:     3
Number of quorum nodes active in the cluster:      3
Quorum = 2, Quorum achieved

* Create the NSD description files and create the NSDs. These files use the old-style disk descriptor format (DiskName:ServerList::DiskUsage:FailureGroup:DesiredName); the disks are descOnly, so they will hold only a copy of the file system descriptor, no data or metadata:

[root@labo_2_new]/root# cat gpfsk_disk1
/dev/descgpfs1lv:labo_s::descOnly:3:diskk1
[root@labo_2_new]/root# mmcrnsd -F gpfsk_disk1

[root@labo_2_new]/root# cat gpfsk_disk2
/dev/descgpfs2lv:labo_s::descOnly:6:diskk2
[root@labo_2_new]/root# mmcrnsd -F gpfsk_disk2

[root@labo_2_new]/root# mmlsnsd

 File system   Disk name    NSD servers
---------------------------------------------------------------------------
 orafs1        diskh1       labo_1,labo_2
 orafs1        diskr1       labo_1,labo_2
 orafs2        diskh2       labo_1,labo_2
 orafs2        diskr2       labo_1,labo_2
 (free disk)   diskk1       labo_s
 (free disk)   diskk2       labo_s

* Add the new disks to the filesystems and restripe (-r):

[root@labo_1_new]/kondor# mmlsdisk orafs1
disk         driver   sector failure holds    holds                            storage
name         type       size   group metadata data  status        availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
diskh1       nsd         512       1 yes      yes   ready         up           system
diskr1       nsd         512       2 yes      yes   ready         up           system

[root@labo_2_new]/root# mmadddisk orafs1 -F gpfsk_disk1 -r

The following disks of orafs1 will be formatted on node labo_2:
    diskk1: size 163840 KB
Extending Allocation Map
Checking Allocation Map for storage pool 'system'
Completed adding disks to file system orafs1.
mmadddisk: Propagating the cluster configuration data to all affected nodes.
  This is an asynchronous process.
Restriping orafs1 ...
Scanning file system metadata, phase 1 ...
Scan completed successfully.
Scanning file system metadata, phase 2 ...
Scan completed successfully.
Scanning file system metadata, phase 3 ...
Scan completed successfully.
Scanning file system metadata, phase 4 ...
Scan completed successfully.
Scanning user file metadata ...
100.00 % complete on Fri Jan 6 14:19:56 2012
Scan completed successfully.
Done

[root@labo_1_new]/kondor# mmlsdisk orafs1
disk         driver   sector failure holds    holds                            storage
name         type       size   group metadata data  status        availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
diskh1       nsd         512       1 yes      yes   ready         up           system
diskr1       nsd         512       2 yes      yes   ready         up           system
diskk1       nsd         512       3 no       no    ready         up           system

[root@labo_1_new]/kondor# mmlsdisk orafs2
disk         driver   sector failure holds    holds                            storage
name         type       size   group metadata data  status        availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
diskh2       nsd         512       4 yes      yes   ready         up           system
diskr2       nsd         512       5 yes      yes   ready         up           system

[root@labo_2_new]/root# mmadddisk orafs2 -F gpfsk_disk2 -r

The following disks of orafs2 will be formatted on node labo_1:
    diskk2: size 163840 KB
Extending Allocation Map
Checking Allocation Map for storage pool 'system'
Completed adding disks to file system orafs2.
mmadddisk: Propagating the cluster configuration data to all affected nodes.
  This is an asynchronous process.
Restriping orafs2 ...
Scanning file system metadata, phase 1 ...
Scan completed successfully.
Scanning file system metadata, phase 2 ...
Scan completed successfully.
Scanning file system metadata, phase 3 ...
Scan completed successfully.
Scanning file system metadata, phase 4 ...
Scan completed successfully.
Scanning user file metadata ...
100.00 % complete on Fri Jan 6 14:21:17 2012
Scan completed successfully.
Done

[root@labo_1_new]/kondor# mmlsdisk orafs2
disk         driver   sector failure holds    holds                            storage
name         type       size   group metadata data  status        availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
diskh2       nsd         512       4 yes      yes   ready         up           system
diskr2       nsd         512       5 yes      yes   ready         up           system
diskk2       nsd         512       6 no       no    ready         up           system

* Change the **unmountOnDiskFail** parameter on the new node, so that a failure of one of its disks only unmounts the filesystem locally instead of affecting the rest of the cluster:

[root@labo_2_new]/root# mmlsconfig
Configuration data for cluster gpfsOracle.labo_2:
---------------------------------------------------
clusterName gpfsOracle.labo_2
clusterId 12399285214363632796
autoload yes
minReleaseLevel 3.3.0.2
dmapiFileHandleSize 32
unmountOnDiskFail no
maxMBpS 300
pagepool 256M
adminMode central

File systems in cluster gpfsOracle.labo_2:
--------------------------------------------
/dev/orafs1
/dev/orafs2

[root@labo_2_new]/root# mmchconfig unmountOnDiskFail=yes labo_s
mmchconfig: Command successfully completed
mmchconfig: Propagating the cluster configuration data to all affected nodes.
  This is an asynchronous process.

[root@labo_2_new]/root# mmlsconfig
Configuration data for cluster gpfsOracle.labo_2:
---------------------------------------------------
clusterName gpfsOracle.labo_2
clusterId 12399285214363632796
autoload yes
minReleaseLevel 3.3.0.2
dmapiFileHandleSize 32
unmountOnDiskFail no
[labo_s]
unmountOnDiskFail yes
[common]
maxMBpS 300
pagepool 256M
adminMode central

File systems in cluster gpfsOracle.labo_2:
--------------------------------------------
/dev/orafs1
/dev/orafs2

===== Remove a protocol (CES) node from a cluster =====

Before deleting a protocol node, hand over its roles one by one: move its NSD server role to another node (**mmchnsd**), remove the performance monitoring and CES roles, update the performance monitoring collector configuration, drop the quorum and manager designations, and finally delete the node:

# mmchnsd "GPFS_NSD_M_B_0002:prscale-b-01"
# mmchnode --noperfmon -N prscale-b-02
# mmchnode --ces-disable -N prscale-b-02
# mmperfmon config update --collectors prscale-b-02
# mmchnode --nonquorum -N prscale-b-02
# mmchnode --nomanager -N prscale-b-02
# mmdelnode -N prscale-b-02
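As in the earlier example, GPFS has to be stopped on the node (for example with **mmshutdown -N prscale-b-02**) before the final **mmdelnode**. Afterwards the cluster membership and state can be verified from any remaining node; a minimal sketch:

# mmlscluster
# mmgetstate -a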