====== GPFS installation on AIX ======
We'll describe the following configuration:
{{:aix:gpfs6.jpg?nolink&|Final GPFS on SAN picture.}}
==== GPFS fileset installation: ====
[root@labo_2]/mnt/gpfs/3.3# installp -agcYX -d '.' 'gpfs.base gpfs.docs'
...
Name Level Part Event Result
-------------------------------------------------------------------------------
gpfs.base 3.3.0.0 USR APPLY SUCCESS
gpfs.base 3.3.0.0 ROOT APPLY SUCCESS
gpfs.base 3.3.0.18 USR APPLY SUCCESS
gpfs.base 3.3.0.18 ROOT APPLY SUCCESS
==== Post installation check: the kernel module is loaded ====
[root@labo_2]/root# genkex | grep mmfs
2e56000 1c8c1c /usr/lpp/mmfs/bin/aix32/mmfs
[root@labo_2]/root# ps -ef | grep mmfs
No mmfs process is running yet (the daemon will be started later with mmstartup).
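As an extra check, not shown in the capture above, the installed fileset levels can be listed with lslpp; the exact levels reported will depend on the fix pack applied:
[root@labo_2]/root# lslpp -l "gpfs.*"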
==== Cluster definition ====
* create a node file
[root@labo_2]/root# cat gpfs_node
labo_1:quorum
labo_2:quorum
labo_s:quorum
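Before creating the cluster, it is worth verifying that root can ssh between all nodes without a password prompt, since mmcrcluster and all subsequent mm* commands rely on the remote shell and copy commands given with -r/-R. A minimal check from one node (node names as used above):
[root@labo_2]/root# ssh labo_1 date
[root@labo_2]/root# ssh labo_s date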
[root@labo_2]/root# mmcrcluster -N gpfs_node -p labo_2 -s labo_1 -r /usr/bin/ssh -R /usr/bin/scp -C gpfsOracle -A
Mon Dec 12 10:08:44 CET 2011: mmcrcluster: Processing node labo_1
Mon Dec 12 10:08:45 CET 2011: mmcrcluster: Processing node labo_2
Mon Dec 12 10:08:46 CET 2011: mmcrcluster: Processing node labo_s
mmcrcluster: Command successfully completed
mmcrcluster: Warning: Not all nodes have proper GPFS license designations.
Use the mmchlicense command to designate licenses as needed.
mmcrcluster: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
At this stage, a GPFS process (mmsdrserv, the configuration server) runs only on the primary and secondary configuration servers.
[root@labo_s_new]/usr# mmlscluster
GPFS cluster information
========================
GPFS cluster name: gpfsOracle.labo_2
GPFS cluster id: 12399285214363632796
GPFS UID domain: gpfsOracle.labo_2
Remote shell command: /usr/bin/ssh
Remote file copy command: /usr/bin/scp
GPFS cluster configuration servers:
-----------------------------------
Primary server: labo_2
Secondary server: labo_1
Node Daemon node name IP address Admin node name Designation
-----------------------------------------------------------------------------------------------
1 labo_1 10.10.10.52 labo_1 quorum
2 labo_2 10.10.10.53 labo_2 quorum
3 labo_s 10.10.10.54 labo_s quorum
* register licenses:
[root@labo_2]/root# mmchlicense server --accept -N labo_1,labo_2,labo_s
The following nodes will be designated as possessing GPFS server licenses:
labo_1
labo_2
labo_s
mmchlicense: Command successfully completed
mmchlicense: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
[root@labo_2]/root# mmlslicense
Summary information
---------------------
Number of nodes defined in the cluster: 3
Number of nodes with server license designation: 3
Number of nodes with client license designation: 0
Number of nodes still requiring server license designation: 0
Number of nodes still requiring client license designation: 0
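Depending on the GPFS level, mmlslicense may also accept a -L option to display the designation node by node instead of only the summary; treat the flag as an assumption to verify against your release:
[root@labo_2]/root# mmlslicense -L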
* check the cluster state:
On the primary and secondary nodes only, check that the configuration server process is running:
[root@labo_2]/root# ps -ef | grep mmfs | grep -v grep
root 31624 1 0 10:08:53 - 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /dev/null 128
[root@labo_2]/root# mmgetstate -aLs
Node number Node name Quorum Nodes up Total nodes GPFS state Remarks
------------------------------------------------------------------------------------
1 labo_1 0 0 3 down quorum node
2 labo_2 0 0 3 down quorum node
3 labo_s 0 0 3 down quorum node
Summary information
---------------------
mmgetstate: Information cannot be displayed. Either none of the
nodes in the cluster are reachable, or GPFS is down on all of the nodes.
* Optimize cluster configuration:
[root@labo_2]/usr# mmlsconfig
Configuration data for cluster gpfsOracle.labo_2:
---------------------------------------------------
clusterName gpfsOracle.labo_2
clusterId 12399285214363632796
autoload yes
minReleaseLevel 3.3.0.2
dmapiFileHandleSize 32
adminMode central
File systems in cluster gpfsOracle.labo_2:
--------------------------------------------
(None)
[root@labo_2]/# mmchconfig unmountOnDiskFail=no
mmchconfig: Command successfully completed
mmchconfig: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
[root@labo_2]/# mmchconfig maxMBpS=300
mmchconfig: Command successfully completed
mmchconfig: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
[root@labo_2]/# mmchconfig unmountOnDiskFail=yes -N labo_s
mmchconfig: Command successfully completed
mmchconfig: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
[root@labo_2]/# mmchconfig pagepool=256M
mmchconfig: Command successfully completed
mmchconfig: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
[root@labo_2]/usr# mmlsconfig
Configuration data for cluster gpfsOracle.labo_2:
---------------------------------------------------
clusterName gpfsOracle.labo_2
clusterId 12399285214363632796
autoload yes
minReleaseLevel 3.3.0.2
dmapiFileHandleSize 32
unmountOnDiskFail no
[labo_s]
unmountOnDiskFail yes
[common]
maxMBpS 300
pagepool 256M
adminMode central
File systems in cluster gpfsOracle.labo_2:
--------------------------------------------
(None)
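Note that pagepool is one of the parameters that normally only takes effect at the next daemon restart (here the daemon has not been started yet, so this does not matter). If your level of mmchconfig supports the -i flag (immediate and permanent change, an assumption to check in the man page for your release), the same change can be applied to a running daemon:
[root@labo_2]/# mmchconfig pagepool=256M -i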
==== Starting GPFS cluster: ====
[root@labo_2]/root# mmstartup -a
Mon Dec 12 10:27:21 CET 2011: mmstartup: Starting GPFS ...
[root@labo_2]/root# ps -ef | grep mmfs | grep -v grep
root 13672 18802 0 10:27:23 - 0:00 /usr/lpp/mmfs/bin/aix32/mmfsd
root 18802 1 0 10:27:22 - 0:00 /bin/ksh /usr/lpp/mmfs/bin/runmmfs
[root@labo_2]/root# mmgetstate -aLs
Node number Node name Quorum Nodes up Total nodes GPFS state Remarks
------------------------------------------------------------------------------------
1 labo_1 2 3 3 active quorum node
2 labo_2 2 3 3 active quorum node
3 labo_s 2 3 3 active quorum node
Summary information
---------------------
Number of nodes defined in the cluster: 3
Number of local nodes active in the cluster: 3
Number of remote nodes joined in this cluster: 0
Number of quorum nodes defined in the cluster: 3
Number of quorum nodes active in the cluster: 3
Quorum = 2, Quorum achieved
[root@labo_2]/usr/lpp/mmfs/bin# mmlsmgr -c
Cluster manager node: 10.10.10.52 (labo_1)
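For reference, the opposite operation stops the daemon on every node; GPFS filesystems mounted on those nodes are unmounted as part of the shutdown:
[root@labo_2]/root# mmshutdown -a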
==== Create NSD (Network Shared Disk) ====
* create an NSD descriptor file:
Description file format: "disk_name:server_list::disk_usage:failuregroup:desired_name:storagepool"
//disk_name//: disk name or logical volume name, as it appears in /dev\\
//server_list//: list of the NSD servers that will serve the NSD (max 8, separated by ",")\\
//disk_usage//: dataAndMetadata, dataOnly, metadataOnly, or descOnly (holds only a copy of the filesystem descriptor)\\
//failuregroup//: from -1 to 4000 (-1 means the disk shares no single point of failure with any other disk); values above 4000 are usually assigned automatically by the system, and we will change them later. Very important for replication.\\
//desired_name//: label given to the disk (as shown by lspv)\\
//storagepool//: storage pool the disk belongs to; the default is system
[root@labo_2]/root# cat gpfs_disks1
hdisk2:labo_1,labo_2::dataAndMetadata:1:diskh1
hdisk3:labo_1,labo_2::dataAndMetadata:2:diskr1
/dev/descgpfs1lv:labo_s::descOnly:3:diskk1
[root@labo_2]/root# mmcrnsd -F gpfs_disks1 -v no
mmcrnsd: Processing disk hdisk2
mmcrnsd: Processing disk hdisk3
mmcrnsd: Processing disk descgpfs1lv
mmcrnsd: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
[root@labo_2]/root# mmlsnsd
File system Disk name NSD servers
---------------------------------------------------------------------------
(free disk) diskh1 labo_1,labo_2
(free disk) diskk1 labo_s
(free disk) diskr1 labo_1,labo_2
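To see which local device backs each NSD and through which server it is reachable, the NSD-to-device mapping can also be displayed:
[root@labo_2]/root# mmlsnsd -m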
==== Create a GPFS filesystem ====
Before starting, create the filesystem with minimal replication settings: at creation time the failure groups are assigned with the defaults chosen by the system (two disks with the same characteristics, such as size and type, end up in the same failure group), which does not allow the replicas to be kept apart. Each data replica must sit on a different NSD in a different failure group, and this can only be changed after the filesystem has been created.\\
To force the filesystem creation even if the disks contain remains of a previous use, add the option "-v no".
[root@labo_2]/root# cat fs1_disks
diskh1
diskr1
diskk1
[root@labo_2]/root# mmcrfs /dev/gpfslv1 -F fs1_disks -B 256K -T /oracle -v no
The following disks of gpfslv1 will be formatted on node labo_1:
diskh1: size 104857600 KB
diskr1: size 104857600 KB
diskk1: size 163840 KB
Formatting file system ...
Disks up to size 1.1 TB can be added to storage pool 'system'.
Creating Inode File
Creating Allocation Maps
Clearing Inode Allocation Map
Clearing Block Allocation Map
Formatting Allocation Map for storage pool 'system'
Completed creation of file system /dev/gpfslv1.
mmcrfs: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
[root@labo_2]/# mmlsfs all
File system attributes for /dev/gpfslv1:
========================================
flag value description
---- ---------------- -----------------------------------------------------
-f 8192 Minimum fragment size in bytes
-i 512 Inode size in bytes
-I 16384 Indirect block size in bytes
-m 1 Default number of metadata replicas
-M 2 Maximum number of metadata replicas
-r 1 Default number of data replicas
-R 2 Maximum number of data replicas
-j cluster Block allocation type
-D nfs4 File locking semantics in effect
-k all ACL semantics in effect
-a 1048576 Estimated average file size
-n 32 Estimated number of nodes that will mount file system
-B 262144 Block size
-Q none Quotas enforced
none Default quotas enabled
-F 205824 Maximum number of inodes
-V 11.05 (3.3.0.2) File system version
-u yes Support for large LUNs?
-z no Is DMAPI enabled?
-L 4194304 Logfile size
-E yes Exact mtime mount option
-S no Suppress atime mount option
-K whenpossible Strict replica allocation option
-P system Disk storage pools in file system
-d diskh1;diskr1;diskk1 Disks in file system
-A yes Automatic mount option
-o none Additional mount options
-T /oracle Default mount point
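Rather than dumping every attribute with "mmlsfs all", individual attributes of a single filesystem can be queried by passing the corresponding flags, for example the replication defaults and the block size:
[root@labo_2]/# mmlsfs /dev/gpfslv1 -m -r -B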
==== Changing failure groups ====
Once the filesystem is created, check the failure group IDs assigned by default. Each replica copy must be in a different failure group, otherwise you will end up with data and metadata on the disk flagged as descOnly, as below on the disk labeled "diskk1":
[root@labo_2]/root# mmlsdisk /dev/gpfslv1
disk driver sector failure holds holds storage
name type size group metadata data status availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
diskh1 nsd 512 4001 yes yes ready up system
diskr1 nsd 512 4001 yes yes ready up system
diskk1 nsd 512 4003 yes yes ready up system
[root@labo_2]/root# mmchdisk /dev/gpfslv1 change -d "diskh1:::dataAndMetadata:1::"
Verifying file system configuration information ...
mmchdisk: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
[root@labo_2]/root# mmchdisk /dev/gpfslv1 change -d "diskr1:::dataAndMetadata:2::"
Verifying file system configuration information ...
mmchdisk: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
[root@labo_2]/root# mmchdisk /dev/gpfslv1 change -d "diskk1:::descOnly:3::"
Verifying file system configuration information ...
mmchdisk: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
[root@labo_2]/root# mmlsdisk /dev/gpfslv1
disk driver sector failure holds holds storage
name type size group metadata data status availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
diskh1 nsd 512 1 yes yes ready up system
diskr1 nsd 512 2 yes yes ready up system
diskk1 nsd 512 3 no no ready up system
Attention: Due to an earlier configuration change the file system
is no longer properly replicated.
Now the descOnly disk no longer holds data or metadata, but the filesystem still has to be restriped to resynchronize it:
[root@labo_2]/root# mmrestripefs /dev/gpfslv1 -b -N mount
Scanning file system metadata, phase 1 ...
Scan completed successfully.
Scanning file system metadata, phase 2 ...
Scan completed successfully.
Scanning file system metadata, phase 3 ...
Scan completed successfully.
Scanning file system metadata, phase 4 ...
Scan completed successfully.
Scanning user file metadata ...
100.00 % complete on Mon Dec 12 14:46:52 2011
Scan completed successfully.
[root@labo_2]/root# mmlsdisk /dev/gpfslv1
disk driver sector failure holds holds storage
name type size group metadata data status availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
diskh1 nsd 512 1 yes yes ready up system
diskr1 nsd 512 2 yes yes ready up system
diskk1 nsd 512 3 no no ready up system
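To double-check which disks actually hold a copy of the filesystem descriptor (the only role left to diskk1), mmlsdisk can be run with -L; disks holding a descriptor copy should be tagged "desc" in the remarks column, assuming the option behaves as in other 3.x releases:
[root@labo_2]/root# mmlsdisk /dev/gpfslv1 -L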
==== Changing replication settings for the filesystem ====
Change the filesystem to keep 2 copies of data and metadata, then restripe it; the option "-N mount" restricts the restripe work to the nodes that have the filesystem mounted (here the NSD servers), which is faster. The "-Q yes" flag in the mmchfs command below also enables quota enforcement, as visible in the mmlsfs output further down.
[root@labo_2]/# mmlsfs all
File system attributes for /dev/gpfslv1:
========================================
flag value description
---- ---------------- -----------------------------------------------------
-f 8192 Minimum fragment size in bytes
-i 512 Inode size in bytes
-I 16384 Indirect block size in bytes
-m 1 Default number of metadata replicas
-M 2 Maximum number of metadata replicas
-r 1 Default number of data replicas
-R 2 Maximum number of data replicas
-j cluster Block allocation type
-D nfs4 File locking semantics in effect
-k all ACL semantics in effect
-a 1048576 Estimated average file size
-n 32 Estimated number of nodes that will mount file system
-B 262144 Block size
-Q none Quotas enforced
none Default quotas enabled
-F 205824 Maximum number of inodes
-V 11.05 (3.3.0.2) File system version
-u yes Support for large LUNs?
-z no Is DMAPI enabled?
-L 4194304 Logfile size
-E yes Exact mtime mount option
-S no Suppress atime mount option
-K whenpossible Strict replica allocation option
-P system Disk storage pools in file system
-d diskh1;diskr1;diskk1 Disks in file system
-A yes Automatic mount option
-o none Additional mount options
-T /oracle Default mount point
[root@labo_2]/root# mmchfs /dev/gpfslv1 -m 2 -r 2 -Q yes
mmchfs: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
[root@labo_2]/root# mmrestripefs /dev/gpfslv1 -b -N mount
Scanning file system metadata, phase 1 ...
Scan completed successfully.
Scanning file system metadata, phase 2 ...
Scan completed successfully.
Scanning file system metadata, phase 3 ...
Scan completed successfully.
Scanning file system metadata, phase 4 ...
Scan completed successfully.
Scanning user file metadata ...
100.00 % complete on Mon Dec 12 14:46:52 2011
Scan completed successfully.
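The -b option rebalances data across the disks. If files were written before the mmchfs call, an additional pass with -R, which re-replicates every file according to the current default settings, may be needed; the combination with -N mount below is a sketch to verify against your release:
[root@labo_2]/root# mmrestripefs /dev/gpfslv1 -R -N mount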
[root@labo_2]/root# mmmount /dev/gpfslv1 -a
Mon Dec 12 11:31:06 CET 2011: mmmount: Mounting file systems ...
[root@labo_2]/# mmlsfs all
File system attributes for /dev/gpfslv1:
========================================
flag value description
---- ---------------- -----------------------------------------------------
-f 8192 Minimum fragment size in bytes
-i 512 Inode size in bytes
-I 16384 Indirect block size in bytes
-m 2 Default number of metadata replicas
-M 2 Maximum number of metadata replicas
-r 2 Default number of data replicas
-R 2 Maximum number of data replicas
-j cluster Block allocation type
-D nfs4 File locking semantics in effect
-k all ACL semantics in effect
-a 1048576 Estimated average file size
-n 32 Estimated number of nodes that will mount file system
-B 262144 Block size
-Q user;group;fileset Quotas enforced
none Default quotas enabled
-F 205824 Maximum number of inodes
-V 11.05 (3.3.0.2) File system version
-u yes Support for large LUNs?
-z no Is DMAPI enabled?
-L 4194304 Logfile size
-E yes Exact mtime mount option
-S no Suppress atime mount option
-K whenpossible Strict replica allocation option
-P system Disk storage pools in file system
-d diskh1;diskr1;diskk1 Disks in file system
-A yes Automatic mount option
-o none Additional mount options
-T /oracle Default mount point
[root@labo_2]/# mmmount all -a
Mon Dec 12 10:52:07 CET 2011: mmmount: Mounting file systems ...
[root@labo_2]/usr# mmdf /dev/gpfslv1
disk disk size failure holds holds free KB free KB
name in KB group metadata data in full blocks in fragments
--------------- ------------- -------- -------- ----- -------------------- -------------------
Disks in storage pool: system (Maximum disk size allowed is 1.1 TB)
diskh1 104857600 1 yes yes 104682752 (100%) 440 ( 0%)
diskr1 104857600 2 yes yes 104682752 (100%) 472 ( 0%)
diskk1 163840 3 no no 0 ( 0%) 0 ( 0%)
------------- -------------------- -------------------
(pool total) 209879040 209365504 (100%) 912 ( 0%)
============= ==================== ===================
(total) 209879040 209365504 (100%) 912 ( 0%)
Inode Information
-----------------
Number of used inodes: 4025
Number of free inodes: 201799
Number of allocated inodes: 205824
Maximum number of inodes: 205824
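As a final check, the replication factors actually applied to a given file (/oracle/testfile below is just a placeholder name) and the list of nodes that have the filesystem mounted can be displayed with:
[root@labo_2]/# mmlsattr /oracle/testfile
[root@labo_2]/# mmlsmount /dev/gpfslv1 -L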