We'll describe the following configuration:
[root@labo_2]/mnt/gpfs/3.3# installp -agcYX -d '.' 'gpfs.base gpfs.docs'
...
Name                        Level           Part        Event       Result
-------------------------------------------------------------------------------
gpfs.base                   3.3.0.0         USR         APPLY       SUCCESS
gpfs.base                   3.3.0.0         ROOT        APPLY       SUCCESS
gpfs.base                   3.3.0.18        USR         APPLY       SUCCESS
gpfs.base                   3.3.0.18        ROOT        APPLY       SUCCESS
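Before going further, it is worth confirming that the filesets were applied at the expected level on every node; a quick check with the standard AIX fileset listing (run it on each node, output omitted here):

# list the GPFS filesets and their levels on this node
lslpp -l "gpfs.*"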
[root@labo_2]/root# genkex | grep mmfs
  2e56000    1c8c1c /usr/lpp/mmfs/bin/aix32/mmfs
[root@labo_2]/root# ps -ef | grep mmfs
No mmfs process running
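The GPFS administration commands run remote commands through the shell configured at cluster creation (ssh here), so passwordless root ssh must already work between all nodes in both directions. A minimal sanity check before running mmcrcluster, using the lab hostnames:

# each command must return the date without prompting for a password
ssh labo_1 date
ssh labo_2 date
ssh labo_s date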
[root@labo_2]/root# cat gpfs_node
labo_1:quorum
labo_2:quorum
labo_s:quorum

[root@labo_2]/root# mmcrcluster -N gpfs_node -p labo_2 -s labo_1 -r /usr/bin/ssh -R /usr/bin/scp -C gpfsOracle -A
Mon Dec 12 10:08:44 CET 2011: mmcrcluster: Processing node labo_1
Mon Dec 12 10:08:45 CET 2011: mmcrcluster: Processing node labo_2
Mon Dec 12 10:08:46 CET 2011: mmcrcluster: Processing node labo_s
mmcrcluster: Command successfully completed
mmcrcluster: Warning: Not all nodes have proper GPFS license designations.
    Use the mmchlicense command to designate licenses as needed.
mmcrcluster: Propagating the cluster configuration data to all affected nodes.
This is an asynchronous process.

Process running only on primary and secondary

[root@labo_s_new]/usr# mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         gpfsOracle.labo_2
  GPFS cluster id:           12399285214363632796
  GPFS UID domain:           gpfsOracle.labo_2
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp

GPFS cluster configuration servers:
-----------------------------------
  Primary server:    labo_2
  Secondary server:  labo_1

 Node  Daemon node name  IP address   Admin node name  Designation
-----------------------------------------------------------------------------------------------
   1   labo_1            10.10.10.52  labo_1           quorum
   2   labo_2            10.10.10.53  labo_2           quorum
   3   labo_s            10.10.10.54  labo_s           quorum
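Note that the node file accepts more than the quorum keyword: a node descriptor can also flag a node as a file system manager candidate. A purely illustrative entry, not used in this lab:

labo_1:quorum-manager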
[root@labo_2]/root# mmchlicense server --accept -N labo_1,labo_2,labo_s

The following nodes will be designated as possessing GPFS server licenses:
        labo_1
        labo_2
        labo_s
mmchlicense: Command successfully completed
mmchlicense: Propagating the cluster configuration data to all affected nodes.
This is an asynchronous process.

[root@labo_2]/root# mmlslicense

 Summary information
---------------------
Number of nodes defined in the cluster:                         3
Number of nodes with server license designation:                3
Number of nodes with client license designation:                0
Number of nodes still requiring server license designation:     0
Number of nodes still requiring client license designation:     0
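To see the designation node by node rather than just the summary, mmlslicense also accepts the -L flag (output omitted here):

# per-node view of the license designations
mmlslicense -L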
On the primary and secondary nodes (only), check the process:

[root@labo_2]/root# ps -ef | grep mmfs | grep -v grep
    root 31624     1   0 10:08:53      -  0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /dev/null 128

[root@labo_2]/root# mmgetstate -aLs

 Node number  Node name  Quorum  Nodes up  Total nodes  GPFS state  Remarks
------------------------------------------------------------------------------------
       1      labo_1        0        0          3       down        quorum node
       2      labo_2        0        0          3       down        quorum node
       3      labo_s        0        0          3       down        quorum node

 Summary information
---------------------
mmgetstate: Information cannot be displayed. Either none of the nodes in the
  cluster are reachable, or GPFS is down on all of the nodes.
[root@labo_2]/usr# mmlsconfig
Configuration data for cluster gpfsOracle.labo_2:
---------------------------------------------------
clusterName gpfsOracle.labo_2
clusterId 12399285214363632796
autoload yes
minReleaseLevel 3.3.0.2
dmapiFileHandleSize 32
adminMode central

File systems in cluster gpfsOracle.labo_2:
--------------------------------------------
(None)

[root@labo_2]/# mmchconfig unmountOnDiskFail=no
mmchconfig: Command successfully completed
mmchconfig: Propagating the cluster configuration data to all affected nodes.
This is an asynchronous process.
[root@labo_2]/# mmchconfig maxMBpS=300
mmchconfig: Command successfully completed
mmchconfig: Propagating the cluster configuration data to all affected nodes.
This is an asynchronous process.
[root@labo_2]/# mmchconfig unmountOnDiskFail=yes -N labo_s
mmchconfig: Command successfully completed
mmchconfig: Propagating the cluster configuration data to all affected nodes.
This is an asynchronous process.
[root@labo_2]/# mmchconfig pagepool=256M
mmchconfig: Command successfully completed
mmchconfig: Propagating the cluster configuration data to all affected nodes.
This is an asynchronous process.

[root@labo_2]/usr# mmlsconfig
Configuration data for cluster gpfsOracle.labo_2:
---------------------------------------------------
clusterName gpfsOracle.labo_2
clusterId 12399285214363632796
autoload yes
minReleaseLevel 3.3.0.2
dmapiFileHandleSize 32
unmountOnDiskFail no
[labo_s]
unmountOnDiskFail yes
[common]
maxMBpS 300
pagepool 256M
adminMode central

File systems in cluster gpfsOracle.labo_2:
--------------------------------------------
(None)
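A parameter set this way only takes effect the next time the daemon starts (which is fine here, since GPFS is still down). If you need a change picked up by running daemons, mmchconfig also accepts -i (immediate and permanent) or -I (immediate only); a possible variant, assuming the attribute supports dynamic change at your GPFS level:

# change pagepool on the running daemons and make it persistent across restarts
mmchconfig pagepool=256M -i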
[root@labo_2]/root# mmstartup -a
Mon Dec 12 10:27:21 CET 2011: mmstartup: Starting GPFS ...

[root@labo_2]/root# ps -ef | grep mmfs | grep -v grep
    root 13672 18802   0 10:27:23      -  0:00 /usr/lpp/mmfs/bin/aix32/mmfsd
    root 18802     1   0 10:27:22      -  0:00 /bin/ksh /usr/lpp/mmfs/bin/runmmfs

[root@labo_2]/root# mmgetstate -aLs

 Node number  Node name  Quorum  Nodes up  Total nodes  GPFS state  Remarks
------------------------------------------------------------------------------------
       1      labo_1        2        3          3       active      quorum node
       2      labo_2        2        3          3       active      quorum node
       3      labo_s        2        3          3       active      quorum node

 Summary information
---------------------
Number of nodes defined in the cluster:            3
Number of local nodes active in the cluster:       3
Number of remote nodes joined in this cluster:     0
Number of quorum nodes defined in the cluster:     3
Number of quorum nodes active in the cluster:      3
Quorum = 2, Quorum achieved

[root@labo_2]/usr/lpp/mmfs/bin# mmlsmgr -c
Cluster manager node: 10.10.10.52 (labo_1)
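If a node stays down or arbitrating instead of reaching the active state, the GPFS daemon log is the first place to look; on AIX it lives under /var/adm/ras:

# follow the most recent GPFS daemon log on the node being investigated
tail -f /var/adm/ras/mmfs.log.latest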
Description file format: "disk_name:server_list::disk_usage:failuregroup:desired_name:storagepool" (an illustrative entry is shown after the field descriptions)
disk_name: disk name, or logical volume name as found in /dev
server_list: list of the NSD servers that will manage the NSD (8 maximum, separated by ",")
disk_usage: dataAndMetadata, dataOnly, metadataOnly, or descOnly (keeps only a copy of the filesystem descriptor)
failuregroup: from -1 to 4000 (-1 means no single point of failure); values > 4000 are in most cases assigned automatically by the system, and we will change them later. Very important for replication.
desired_name: label given to the disk (visible with lspv)
storagepool: default is system
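Putting the fields together, a purely illustrative entry (disk, server and NSD names are invented) for a dataOnly disk served by two NSD servers, placed in failure group 1 and in the default pool, would look like this:

hdisk9:nodeA,nodeB::dataOnly:1:data_nsd9:system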
[root@labo_2]/root# cat gpfs_disks1
hdisk2:labo_1,labo_2::dataAndMetadata:1:diskh1
hdisk3:labo_1,labo_2::dataAndMetadata:2:diskr1
/dev/descgpfs1lv:labo_s::descOnly:3:diskk1

[root@labo_2]/root# mmcrnsd -F gpfs_disks1 -v no
mmcrnsd: Processing disk hdisk2
mmcrnsd: Processing disk hdisk3
mmcrnsd: Processing disk descgpfs1lv
mmcrnsd: Propagating the cluster configuration data to all affected nodes.
This is an asynchronous process.

[root@labo_2]/root# mmlsnsd

 File system   Disk name    NSD servers
---------------------------------------------------------------------------
 (free disk)   diskh1       labo_1,labo_2
 (free disk)   diskk1       labo_s
 (free disk)   diskr1       labo_1,labo_2
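The /dev/descgpfs1lv device used above for the descOnly disk is just a small AIX logical volume local to labo_s. If it does not exist yet, it can be created beforehand with something like the following sketch (the volume group name and the size in logical partitions are assumptions for this lab):

# on labo_s: create a small raw logical volume to hold the filesystem descriptor copy
mklv -y descgpfs1lv rootvg 10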
Before going further, create the filesystem with minimum replication settings: at creation time, the failure groups are assigned with default values chosen by the system (two disks with the same characteristics, such as size and type, get the same failure group), which makes it impossible to keep the replicas apart. Each data replica must reside on a different NSD belonging to a different failure group, and this can only be changed after the filesystem has been created.
To force the filesystem creation, use the "-v no" option.
[root@labo_2]/root# cat fs1_disks
diskh1
diskr1
diskk1

[root@labo_2]/root# mmcrfs /dev/gpfslv1 -F fs1_disks -B 256K -T /oracle -v no

The following disks of gpfslv1 will be formatted on node labo_1:
    diskh1: size 104857600 KB
    diskr1: size 104857600 KB
    diskk1: size 163840 KB
Formatting file system ...
Disks up to size 1.1 TB can be added to storage pool 'system'.
Creating Inode File
Creating Allocation Maps
Clearing Inode Allocation Map
Clearing Block Allocation Map
Formatting Allocation Map for storage pool 'system'
Completed creation of file system /dev/gpfslv1.
mmcrfs: Propagating the cluster configuration data to all affected nodes.
This is an asynchronous process.

[root@labo_2]/# mmlsfs all

File system attributes for /dev/gpfslv1:
========================================
flag value                 description
---- --------------------- -----------------------------------------------------
 -f  8192                  Minimum fragment size in bytes
 -i  512                   Inode size in bytes
 -I  16384                 Indirect block size in bytes
 -m  1                     Default number of metadata replicas
 -M  2                     Maximum number of metadata replicas
 -r  1                     Default number of data replicas
 -R  2                     Maximum number of data replicas
 -j  cluster               Block allocation type
 -D  nfs4                  File locking semantics in effect
 -k  all                   ACL semantics in effect
 -a  1048576               Estimated average file size
 -n  32                    Estimated number of nodes that will mount file system
 -B  262144                Block size
 -Q  none                  Quotas enforced
     none                  Default quotas enabled
 -F  205824                Maximum number of inodes
 -V  11.05 (3.3.0.2)       File system version
 -u  yes                   Support for large LUNs?
 -z  no                    Is DMAPI enabled?
 -L  4194304               Logfile size
 -E  yes                   Exact mtime mount option
 -S  no                    Suppress atime mount option
 -K  whenpossible          Strict replica allocation option
 -P  system                Disk storage pools in file system
 -d  diskh1;diskr1;diskk1  Disks in file system
 -A  yes                   Automatic mount option
 -o  none                  Additional mount options
 -T  /oracle               Default mount point
Once the filesystem has been created, check the failure group IDs. They have to be different for each replica copy; otherwise you will end up with data and metadata on the disk flagged as descOnly, as shown below for the disk labeled "diskk1":
[root@labo_2]/root# mmlsdisk /dev/gpfslv1
disk         driver   sector failure holds    holds                            storage
name         type       size   group metadata data  status        availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
diskh1       nsd         512    4001 yes      yes   ready         up           system
diskr1       nsd         512    4001 yes      yes   ready         up           system
diskk1       nsd         512    4003 yes      yes   ready         up           system

[root@labo_2]/root# mmchdisk /dev/gpfslv1 change -d "diskh1:::dataAndMetadata:1::"
Verifying file system configuration information ...
mmchdisk: Propagating the cluster configuration data to all affected nodes.
This is an asynchronous process.
[root@labo_2]/root# mmchdisk /dev/gpfslv1 change -d "diskr1:::dataAndMetadata:2::"
Verifying file system configuration information ...
mmchdisk: Propagating the cluster configuration data to all affected nodes.
This is an asynchronous process.
[root@labo_2]/root# mmchdisk /dev/gpfslv1 change -d "diskk1:::descOnly:3::"
Verifying file system configuration information ...
mmchdisk: Propagating the cluster configuration data to all affected nodes.
This is an asynchronous process.

[root@labo_2]/root# mmlsdisk /dev/gpfslv1
disk         driver   sector failure holds    holds                            storage
name         type       size   group metadata data  status        availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
diskh1       nsd         512       1 yes      yes   ready         up           system
diskr1       nsd         512       2 yes      yes   ready         up           system
diskk1       nsd         512       3 no       no    ready         up           system
Attention: Due to an earlier configuration change the file system is no longer properly replicated.
Now you can see that the descOnly disk no longer holds data or metadata, but the filesystem still has to be resynchronized:
[root@labo_2]/root# mmrestripefs /dev/gpfslv1 -b -N mount
Scanning file system metadata, phase 1 ...
Scan completed successfully.
Scanning file system metadata, phase 2 ...
Scan completed successfully.
Scanning file system metadata, phase 3 ...
Scan completed successfully.
Scanning file system metadata, phase 4 ...
Scan completed successfully.
Scanning user file metadata ...
 100.00 % complete on Mon Dec 12 14:46:52 2011
Scan completed successfully.

[root@labo_2]/root# mmlsdisk /dev/gpfslv1
disk         driver   sector failure holds    holds                            storage
name         type       size   group metadata data  status        availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
diskh1       nsd         512       1 yes      yes   ready         up           system
diskr1       nsd         512       2 yes      yes   ready         up           system
diskk1       nsd         512       3 no       no    ready         up           system
Change the filesystem to keep 2 copies of data and metadata, then restripe it, running the restripe only on the NSD servers (faster) thanks to the "-N mount" option:
[root@labo_2]/# mmlsfs all

File system attributes for /dev/gpfslv1:
========================================
flag value                 description
---- --------------------- -----------------------------------------------------
 -f  8192                  Minimum fragment size in bytes
 -i  512                   Inode size in bytes
 -I  16384                 Indirect block size in bytes
 -m  1                     Default number of metadata replicas
 -M  2                     Maximum number of metadata replicas
 -r  1                     Default number of data replicas
 -R  2                     Maximum number of data replicas
 -j  cluster               Block allocation type
 -D  nfs4                  File locking semantics in effect
 -k  all                   ACL semantics in effect
 -a  1048576               Estimated average file size
 -n  32                    Estimated number of nodes that will mount file system
 -B  262144                Block size
 -Q  none                  Quotas enforced
     none                  Default quotas enabled
 -F  205824                Maximum number of inodes
 -V  11.05 (3.3.0.2)       File system version
 -u  yes                   Support for large LUNs?
 -z  no                    Is DMAPI enabled?
 -L  4194304               Logfile size
 -E  yes                   Exact mtime mount option
 -S  no                    Suppress atime mount option
 -K  whenpossible          Strict replica allocation option
 -P  system                Disk storage pools in file system
 -d  diskh1;diskr1;diskk1  Disks in file system
 -A  yes                   Automatic mount option
 -o  none                  Additional mount options
 -T  /oracle               Default mount point

[root@labo_2]/root# mmchfs /dev/gpfslv1 -m 2 -r 2 -Q yes
mmchfs: Propagating the cluster configuration data to all affected nodes.
This is an asynchronous process.

[root@labo_2]/root# mmrestripefs /dev/gpfslv1 -b -N mount
Scanning file system metadata, phase 1 ...
Scan completed successfully.
Scanning file system metadata, phase 2 ...
Scan completed successfully.
Scanning file system metadata, phase 3 ...
Scan completed successfully.
Scanning file system metadata, phase 4 ...
Scan completed successfully.
Scanning user file metadata ...
 100.00 % complete on Mon Dec 12 14:46:52 2011
Scan completed successfully.

[root@labo_2]/root# mmmount /dev/gpfslv1 -a
Mon Dec 12 11:31:06 CET 2011: mmmount: Mounting file systems ...

[root@labo_2]/# mmlsfs all

File system attributes for /dev/gpfslv1:
========================================
flag value                 description
---- --------------------- -----------------------------------------------------
 -f  8192                  Minimum fragment size in bytes
 -i  512                   Inode size in bytes
 -I  16384                 Indirect block size in bytes
 -m  2                     Default number of metadata replicas
 -M  2                     Maximum number of metadata replicas
 -r  2                     Default number of data replicas
 -R  2                     Maximum number of data replicas
 -j  cluster               Block allocation type
 -D  nfs4                  File locking semantics in effect
 -k  all                   ACL semantics in effect
 -a  1048576               Estimated average file size
 -n  32                    Estimated number of nodes that will mount file system
 -B  262144                Block size
 -Q  user;group;fileset    Quotas enforced
     none                  Default quotas enabled
 -F  205824                Maximum number of inodes
 -V  11.05 (3.3.0.2)       File system version
 -u  yes                   Support for large LUNs?
 -z  no                    Is DMAPI enabled?
 -L  4194304               Logfile size
 -E  yes                   Exact mtime mount option
 -S  no                    Suppress atime mount option
 -K  whenpossible          Strict replica allocation option
 -P  system                Disk storage pools in file system
 -d  diskh1;diskr1;diskk1  Disks in file system
 -A  yes                   Automatic mount option
 -o  none                  Additional mount options
 -T  /oracle               Default mount point

[root@labo_2]/# mmmount all -a
Mon Dec 12 10:52:07 CET 2011: mmmount: Mounting file systems ...
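Once the filesystem is mounted everywhere, two quick checks are useful: mmlsmount shows which nodes actually have it mounted, and, once files exist, mmlsattr reports the replication factors of an individual file (the file name below is only an example):

# list the nodes that have each GPFS filesystem mounted
mmlsmount all -L
# show current and maximum data/metadata replication of one file
mmlsattr /oracle/some_file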
[root@labo_2]/usr# mmdf /dev/gpfslv1
disk                disk size  failure holds    holds              free KB             free KB
name                    in KB    group metadata data        in full blocks        in fragments
--------------- ------------- -------- -------- ----- -------------------- -------------------
Disks in storage pool: system (Maximum disk size allowed is 1.1 TB)
diskh1              104857600        1 yes      yes       104682752 (100%)           440 ( 0%)
diskr1              104857600        2 yes      yes       104682752 (100%)           472 ( 0%)
diskk1                 163840        3 no       no                0 (  0%)             0 ( 0%)
                -------------                         -------------------- -------------------
(pool total)        209879040                             209365504 (100%)           912 ( 0%)
                =============                         ==================== ===================
(total)             209879040                             209365504 (100%)           912 ( 0%)

Inode Information
-----------------
Number of used inodes:            4025
Number of free inodes:          201799
Number of allocated inodes:     205824
Maximum number of inodes:       205824
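Since quotas were enabled with mmchfs -Q yes, quota reporting is already available even though no limits have been defined yet; a basic report (expected to be essentially empty at this stage):

# report user, group and fileset quotas for all GPFS filesystems
mmrepquota -a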