====== GPFS installation on AIX ======
We'll describe the following configuration:
{{:aix:gpfs6.jpg?nolink&|Final GPFS on SAN picture.}}
==== GPFS fileset installation: ====
[root@labo_2]/mnt/gpfs/3.3# installp -agcYX -d '.' 'gpfs.base gpfs.docs'
...
Name Level Part Event Result
-------------------------------------------------------------------------------
gpfs.base 3.3.0.0 USR APPLY SUCCESS
gpfs.base 3.3.0.0 ROOT APPLY SUCCESS
gpfs.base 3.3.0.18 USR APPLY SUCCESS
gpfs.base 3.3.0.18 ROOT APPLY SUCCESS
==== Post installation check: the kernel module is loaded ====
[root@labo_2]/root# genkex | grep mmfs
2e56000 1c8c1c /usr/lpp/mmfs/bin/aix32/mmfs
[root@labo_2]/root# ps -ef | grep mmfs
No mmfs process is running yet (the daemon will be started later with mmstartup).
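As an extra check, not shown in the capture above, the installed fileset levels can be listed with lslpp; the exact levels reported will depend on the fix pack applied:
[root@labo_2]/root# lslpp -l "gpfs.*"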
==== Cluster definition ====
* create a node file
[root@labo_2]/root# cat gpfs_node
labo_1:quorum
labo_2:quorum
labo_s:quorum
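Before creating the cluster, it is worth verifying that root can ssh between all nodes without a password prompt, since mmcrcluster and all subsequent mm* commands rely on the remote shell and copy commands given with -r/-R. A minimal check from one node (node names as used above):
[root@labo_2]/root# ssh labo_1 date
[root@labo_2]/root# ssh labo_s date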
[root@labo_2]/root# mmcrcluster -N gpfs_node -p labo_2 -s labo_1 -r /usr/bin/ssh -R /usr/bin/scp -C gpfsOracle -A
Mon Dec 12 10:08:44 CET 2011: mmcrcluster: Processing node labo_1
Mon Dec 12 10:08:45 CET 2011: mmcrcluster: Processing node labo_2
Mon Dec 12 10:08:46 CET 2011: mmcrcluster: Processing node labo_s
mmcrcluster: Command successfully completed
mmcrcluster: Warning: Not all nodes have proper GPFS license designations.
Use the mmchlicense command to designate licenses as needed.
mmcrcluster: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
At this stage, a GPFS process (mmsdrserv, the configuration server) runs only on the primary and secondary configuration servers.
[root@labo_s_new]/usr# mmlscluster
GPFS cluster information
========================
GPFS cluster name: gpfsOracle.labo_2
GPFS cluster id: 12399285214363632796
GPFS UID domain: gpfsOracle.labo_2
Remote shell command: /usr/bin/ssh
Remote file copy command: /usr/bin/scp
GPFS cluster configuration servers:
-----------------------------------
Primary server: labo_2
Secondary server: labo_1
Node Daemon node name IP address Admin node name Designation
-----------------------------------------------------------------------------------------------
1 labo_1 10.10.10.52 labo_1 quorum
2 labo_2 10.10.10.53 labo_2 quorum
3 labo_s 10.10.10.54 labo_s quorum
* register licenses:
[root@labo_2]/root# mmchlicense server --accept -N labo_1,labo_2,labo_s
The following nodes will be designated as possessing GPFS server licenses:
labo_1
labo_2
labo_s
mmchlicense: Command successfully completed
mmchlicense: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
[root@labo_2]/root# mmlslicense
Summary information
---------------------
Number of nodes defined in the cluster: 3
Number of nodes with server license designation: 3
Number of nodes with client license designation: 0
Number of nodes still requiring server license designation: 0
Number of nodes still requiring client license designation: 0
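Depending on the GPFS level, mmlslicense may also accept a -L option to display the designation node by node instead of only the summary; treat the flag as an assumption to verify against your release:
[root@labo_2]/root# mmlslicense -L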
* check the cluster state:
On the primary and secondary nodes only, check that the configuration server process is running:
[root@labo_2]/root# ps -ef | grep mmfs | grep -v grep
root 31624 1 0 10:08:53 - 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /dev/null 128
[root@labo_2]/root# mmgetstate -aLs
Node number Node name Quorum Nodes up Total nodes GPFS state Remarks
------------------------------------------------------------------------------------
1 labo_1 0 0 3 down quorum node
2 labo_2 0 0 3 down quorum node
3 labo_s 0 0 3 down quorum node
Summary information
---------------------
mmgetstate: Information cannot be displayed. Either none of the
nodes in the cluster are reachable, or GPFS is down on all of the nodes.
* Optimize cluster configuration:
[root@labo_2]/usr# mmlsconfig
Configuration data for cluster gpfsOracle.labo_2:
---------------------------------------------------
clusterName gpfsOracle.labo_2
clusterId 12399285214363632796
autoload yes
minReleaseLevel 3.3.0.2
dmapiFileHandleSize 32
adminMode central
File systems in cluster gpfsOracle.labo_2:
--------------------------------------------
(None)
[root@labo_2]/# mmchconfig unmountOnDiskFail=no
mmchconfig: Command successfully completed
mmchconfig: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
[root@labo_2]/# mmchconfig maxMBpS=300
mmchconfig: Command successfully completed
mmchconfig: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
[root@labo_2]/# mmchconfig unmountOnDiskFail=yes -N labo_s
mmchconfig: Command successfully completed
mmchconfig: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
[root@labo_2]/# mmchconfig pagepool=256M
mmchconfig: Command successfully completed
mmchconfig: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
[root@labo_2]/usr# mmlsconfig
Configuration data for cluster gpfsOracle.labo_2:
---------------------------------------------------
clusterName gpfsOracle.labo_2
clusterId 12399285214363632796
autoload yes
minReleaseLevel 3.3.0.2
dmapiFileHandleSize 32
unmountOnDiskFail no
[labo_s]
unmountOnDiskFail yes
[common]
maxMBpS 300
pagepool 256M
adminMode central
File systems in cluster gpfsOracle.labo_2:
--------------------------------------------
(None)
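Note that pagepool is one of the parameters that normally only takes effect at the next daemon restart (here the daemon has not been started yet, so this does not matter). If your level of mmchconfig supports the -i flag (immediate and permanent change, an assumption to check in the man page for your release), the same change can be applied to a running daemon:
[root@labo_2]/# mmchconfig pagepool=256M -i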
==== Starting GPFS cluster: ====
[root@labo_2]/root# mmstartup -a
Mon Dec 12 10:27:21 CET 2011: mmstartup: Starting GPFS ...
[root@labo_2]/root# ps -ef | grep mmfs | grep -v grep
root 13672 18802 0 10:27:23 - 0:00 /usr/lpp/mmfs/bin/aix32/mmfsd
root 18802 1 0 10:27:22 - 0:00 /bin/ksh /usr/lpp/mmfs/bin/runmmfs
[root@labo_2]/root# mmgetstate -aLs
Node number Node name Quorum Nodes up Total nodes GPFS state Remarks
------------------------------------------------------------------------------------
1 labo_1 2 3 3 active quorum node
2 labo_2 2 3 3 active quorum node
3 labo_s 2 3 3 active quorum node
Summary information
---------------------
Number of nodes defined in the cluster: 3
Number of local nodes active in the cluster: 3
Number of remote nodes joined in this cluster: 0
Number of quorum nodes defined in the cluster: 3
Number of quorum nodes active in the cluster: 3
Quorum = 2, Quorum achieved
[root@labo_2]/usr/lpp/mmfs/bin# mmlsmgr -c
Cluster manager node: 10.10.10.52 (labo_1)
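For reference, the opposite operation stops the daemon on every node; GPFS filesystems mounted on those nodes are unmounted as part of the shutdown:
[root@labo_2]/root# mmshutdown -a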
==== Create NSD (Network Shared Disk) ====
* create an NSD descriptor file:
Description file format: "disk_name:server_list::disk_usage:failuregroup:desired_name:storagepool"
//disk_name//: disk name or logical volume name, as it appears in /dev\\
//server_list//: list of the NSD servers that will serve the NSD (max 8, separated by ",")\\
//disk_usage//: dataAndMetadata, dataOnly, metadataOnly, or descOnly (holds only a copy of the filesystem descriptor)\\
//failuregroup//: from -1 to 4000 (-1 means the disk shares no single point of failure with any other disk); values above 4000 are usually assigned automatically by the system, and we will change them later. Very important for replication.\\
//desired_name//: label given to the disk (as shown by lspv)\\
//storagepool//: storage pool the disk belongs to; the default is system
[root@labo_2]/root# cat gpfs_disks1
hdisk2:labo_1,labo_2::dataAndMetadata:1:diskh1
hdisk3:labo_1,labo_2::dataAndMetadata:2:diskr1
/dev/descgpfs1lv:labo_s::descOnly:3:diskk1
[root@labo_2]/root# mmcrnsd -F gpfs_disks1 -v no
mmcrnsd: Processing disk hdisk2
mmcrnsd: Processing disk hdisk3
mmcrnsd: Processing disk descgpfs1lv
mmcrnsd: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
[root@labo_2]/root# mmlsnsd
File system Disk name NSD servers
---------------------------------------------------------------------------
(free disk) diskh1 labo_1,labo_2
(free disk) diskk1 labo_s
(free disk) diskr1 labo_1,labo_2
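To see which local device backs each NSD and through which server it is reachable, the NSD-to-device mapping can also be displayed:
[root@labo_2]/root# mmlsnsd -m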
==== Create a GPFS filesystem ====
Before starting, create the filesystem with minimal replication settings: at creation time the failure groups are assigned with the defaults chosen by the system (two disks with the same characteristics, such as size and type, end up in the same failure group), which does not allow the replicas to be kept apart. Each data replica must sit on a different NSD in a different failure group, and this can only be changed after the filesystem has been created.\\
To force the filesystem creation even if the disks contain remains of a previous use, add the option "-v no".
[root@labo_2]/root# cat fs1_disks
diskh1
diskr1
diskk1
[root@labo_2]/root# mmcrfs /dev/gpfslv1 -F fs1_disks -B 256K -T /oracle -v no
The following disks of gpfslv1 will be formatted on node labo_1:
diskh1: size 104857600 KB
diskr1: size 104857600 KB
diskk1: size 163840 KB
Formatting file system ...
Disks up to size 1.1 TB can be added to storage pool 'system'.
Creating Inode File
Creating Allocation Maps
Clearing Inode Allocation Map
Clearing Block Allocation Map
Formatting Allocation Map for storage pool 'system'
Completed creation of file system /dev/gpfslv1.
mmcrfs: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
[root@labo_2]/# mmlsfs all
File system attributes for /dev/gpfslv1:
========================================
flag value description
---- ---------------- -----------------------------------------------------
-f 8192 Minimum fragment size in bytes
-i 512 Inode size in bytes
-I 16384 Indirect block size in bytes
-m 1 Default number of metadata replicas
-M 2 Maximum number of metadata replicas
-r 1 Default number of data replicas
-R 2 Maximum number of data replicas
-j cluster Block allocation type
-D nfs4 File locking semantics in effect
-k all ACL semantics in effect
-a 1048576 Estimated average file size
-n 32 Estimated number of nodes that will mount file system
-B 262144 Block size
-Q none Quotas enforced
none Default quotas enabled
-F 205824 Maximum number of inodes
-V 11.05 (3.3.0.2) File system version
-u yes Support for large LUNs?
-z no Is DMAPI enabled?
-L 4194304 Logfile size
-E yes Exact mtime mount option
-S no Suppress atime mount option
-K whenpossible Strict replica allocation option
-P system Disk storage pools in file system
-d diskh1;diskr1;diskk1 Disks in file system
-A yes Automatic mount option
-o none Additional mount options
-T /oracle Default mount point
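Rather than dumping every attribute with "mmlsfs all", individual attributes of a single filesystem can be queried by passing the corresponding flags, for example the replication defaults and the block size:
[root@labo_2]/# mmlsfs /dev/gpfslv1 -m -r -B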
==== Changing failure groups ====
Once the filesystem is created, check the failure group IDs assigned by default. Each replica copy must be in a different failure group, otherwise you will end up with data and metadata on the disk flagged as descOnly, as below on the disk labeled "diskk1":
[root@labo_2]/root# mmlsdisk /dev/gpfslv1
disk driver sector failure holds holds storage
name type size group metadata data status availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
diskh1 nsd 512 4001 yes yes ready up system
diskr1 nsd 512 4001 yes yes ready up system
diskk1 nsd 512 4003 yes yes ready up system
[root@labo_2]/root# mmchdisk /dev/gpfslv1 change -d "diskh1:::dataAndMetadata:1::"
Verifying file system configuration information ...
mmchdisk: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
[root@labo_2]/root# mmchdisk /dev/gpfslv1 change -d "diskr1:::dataAndMetadata:2::"
Verifying file system configuration information ...
mmchdisk: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
[root@labo_2]/root# mmchdisk /dev/gpfslv1 change -d "diskk1:::descOnly:3::"
Verifying file system configuration information ...
mmchdisk: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
[root@labo_2]/root# mmlsdisk /dev/gpfslv1
disk driver sector failure holds holds storage
name type size group metadata data status availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
diskh1 nsd 512 1 yes yes ready up system
diskr1 nsd 512 2 yes yes ready up system
diskk1 nsd 512 3 no no ready up system
Attention: Due to an earlier configuration change the file system
is no longer properly replicated.
Now the descOnly disk no longer holds data or metadata, but the filesystem still has to be restriped to resynchronize it:
[root@labo_2]/root# mmrestripefs /dev/gpfslv1 -b -N mount
Scanning file system metadata, phase 1 ...
Scan completed successfully.
Scanning file system metadata, phase 2 ...
Scan completed successfully.
Scanning file system metadata, phase 3 ...
Scan completed successfully.
Scanning file system metadata, phase 4 ...
Scan completed successfully.
Scanning user file metadata ...
100.00 % complete on Mon Dec 12 14:46:52 2011
Scan completed successfully.
[root@labo_2]/root# mmlsdisk /dev/gpfslv1
disk driver sector failure holds holds storage
name type size group metadata data status availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
diskh1 nsd 512 1 yes yes ready up system
diskr1 nsd 512 2 yes yes ready up system
diskk1 nsd 512 3 no no ready up system
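To double-check which disks actually hold a copy of the filesystem descriptor (the only role left to diskk1), mmlsdisk can be run with -L; disks holding a descriptor copy should be tagged "desc" in the remarks column, assuming the option behaves as in other 3.x releases:
[root@labo_2]/root# mmlsdisk /dev/gpfslv1 -L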
==== Changing replication settings for the filesystem ====
Change the filesystem to keep 2 copies of data and metadata, then restripe it; the option "-N mount" restricts the restripe work to the nodes that have the filesystem mounted (here the NSD servers), which is faster. The "-Q yes" flag in the mmchfs command below also enables quota enforcement, as visible in the mmlsfs output further down.
[root@labo_2]/# mmlsfs all
File system attributes for /dev/gpfslv1:
========================================
flag value description
---- ---------------- -----------------------------------------------------
-f 8192 Minimum fragment size in bytes
-i 512 Inode size in bytes
-I 16384 Indirect block size in bytes
-m 1 Default number of metadata replicas
-M 2 Maximum number of metadata replicas
-r 1 Default number of data replicas
-R 2 Maximum number of data replicas
-j cluster Block allocation type
-D nfs4 File locking semantics in effect
-k all ACL semantics in effect
-a 1048576 Estimated average file size
-n 32 Estimated number of nodes that will mount file system
-B 262144 Block size
-Q none Quotas enforced
none Default quotas enabled
-F 205824 Maximum number of inodes
-V 11.05 (3.3.0.2) File system version
-u yes Support for large LUNs?
-z no Is DMAPI enabled?
-L 4194304 Logfile size
-E yes Exact mtime mount option
-S no Suppress atime mount option
-K whenpossible Strict replica allocation option
-P system Disk storage pools in file system
-d diskh1;diskr1;diskk1 Disks in file system
-A yes Automatic mount option
-o none Additional mount options
-T /oracle Default mount point
[root@labo_2]/root# mmchfs /dev/gpfslv1 -m 2 -r 2 -Q yes
mmchfs: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
[root@labo_2]/root# mmrestripefs /dev/gpfslv1 -b -N mount
Scanning file system metadata, phase 1 ...
Scan completed successfully.
Scanning file system metadata, phase 2 ...
Scan completed successfully.
Scanning file system metadata, phase 3 ...
Scan completed successfully.
Scanning file system metadata, phase 4 ...
Scan completed successfully.
Scanning user file metadata ...
100.00 % complete on Mon Dec 12 14:46:52 2011
Scan completed successfully.
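The -b option rebalances data across the disks. If files were written before the mmchfs call, an additional pass with -R, which re-replicates every file according to the current default settings, may be needed; the combination with -N mount below is a sketch to verify against your release:
[root@labo_2]/root# mmrestripefs /dev/gpfslv1 -R -N mount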
[root@labo_2]/root# mmmount /dev/gpfslv1 -a
Mon Dec 12 11:31:06 CET 2011: mmmount: Mounting file systems ...
[root@labo_2]/# mmlsfs all
File system attributes for /dev/gpfslv1:
========================================
flag value description
---- ---------------- -----------------------------------------------------
-f 8192 Minimum fragment size in bytes
-i 512 Inode size in bytes
-I 16384 Indirect block size in bytes
-m 2 Default number of metadata replicas
-M 2 Maximum number of metadata replicas
-r 2 Default number of data replicas
-R 2 Maximum number of data replicas
-j cluster Block allocation type
-D nfs4 File locking semantics in effect
-k all ACL semantics in effect
-a 1048576 Estimated average file size
-n 32 Estimated number of nodes that will mount file system
-B 262144 Block size
-Q user;group;fileset Quotas enforced
none Default quotas enabled
-F 205824 Maximum number of inodes
-V 11.05 (3.3.0.2) File system version
-u yes Support for large LUNs?
-z no Is DMAPI enabled?
-L 4194304 Logfile size
-E yes Exact mtime mount option
-S no Suppress atime mount option
-K whenpossible Strict replica allocation option
-P system Disk storage pools in file system
-d diskh1;diskr1;diskk1 Disks in file system
-A yes Automatic mount option
-o none Additional mount options
-T /oracle Default mount point
[root@labo_2]/# mmmount all -a
Mon Dec 12 10:52:07 CET 2011: mmmount: Mounting file systems ...
[root@labo_2]/usr# mmdf /dev/gpfslv1
disk disk size failure holds holds free KB free KB
name in KB group metadata data in full blocks in fragments
--------------- ------------- -------- -------- ----- -------------------- -------------------
Disks in storage pool: system (Maximum disk size allowed is 1.1 TB)
diskh1 104857600 1 yes yes 104682752 (100%) 440 ( 0%)
diskr1 104857600 2 yes yes 104682752 (100%) 472 ( 0%)
diskk1 163840 3 no no 0 ( 0%) 0 ( 0%)
------------- -------------------- -------------------
(pool total) 209879040 209365504 (100%) 912 ( 0%)
============= ==================== ===================
(total) 209879040 209365504 (100%) 912 ( 0%)
Inode Information
-----------------
Number of used inodes: 4025
Number of free inodes: 201799
Number of allocated inodes: 205824
Maximum number of inodes: 205824
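As a final check, the replication factors actually applied to a given file (/oracle/testfile below is just a placeholder name) and the list of nodes that have the filesystem mounted can be displayed with:
[root@labo_2]/# mmlsattr /oracle/testfile
[root@labo_2]/# mmlsmount /dev/gpfslv1 -L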