Linux Installation and setup

:!: On VMware, if the Scale cluster doesn't start, disable Secure Boot in the VM properties

Ansible-based installation roles: https://github.com/IBM/ibm-spectrum-scale-install-infra

GPFS Prechecks

Log4J disable

There is currently a security alert on Log4j (Log4Shell, CVE-2021-44228), so a workaround is to disable message lookups.

Workaround/Mitigation:

Customers are advised to edit the file /etc/sysconfig/gpfsgui on each node running the GUI to include a line like this:

LOG4J_FORMAT_MSG_NO_LOOKUPS=true
[root@gpfs02 ~]# systemctl restart gpfsgui
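
To apply this on every GUI node in one pass, a minimal sketch (node names are examples from this cluster; the grep makes it idempotent):

[root@gpfs01 ~]# for node in gpfs01 gpfs02; do \
    ssh $node 'grep -q LOG4J_FORMAT_MSG_NO_LOOKUPS /etc/sysconfig/gpfsgui || \
    echo LOG4J_FORMAT_MSG_NO_LOOKUPS=true >> /etc/sysconfig/gpfsgui; systemctl restart gpfsgui'; done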

Network / firewall

Disable SElinux in /etc/selinux/config

CES, including the CES framework as well as SMB and CES NFS, does not support SELinux in enforcing mode.
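
A minimal sketch of the change (permissive takes effect immediately, disabled after a reboot):

[root@gpfs02 ~]# sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
[root@gpfs02 ~]# setenforce 0     # permissive until the next reboot
[root@gpfs02 ~]# getenforce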

Check Firewall rules

[root@gpfs02 ~]#  firewall-cmd --permanent --add-port=1191/tcp
[root@gpfs02 ~]#  firewall-cmd --permanent --add-port=22/tcp
[root@gpfs02 ~]#  systemctl restart firewalld.service
[root@gpfs02 ~]# firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: eno4 enp12s0f0 enp12s0f1 team0
  sources:
  services: ssh dhcpv6-client
  ports: 1191/tcp 22/tcp
  protocols:
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:
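
GPFS daemon-to-daemon traffic uses port 1191/tcp, so the same rules must be present on every node; a minimal sketch (node names are examples):

[root@gpfs01 ~]# for node in gpfs01 gpfs02; do \
    ssh $node 'firewall-cmd --permanent --add-port=1191/tcp; firewall-cmd --reload'; done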

Authentication

Create SSH keys for each node and add the public keys to /root/.ssh/authorized_keys

Test a connection on every interface from every host in the cluster, including each server to itself (locally).

Put the IP addresses of every node in /etc/hosts
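
A minimal sketch of key generation and distribution (node names are examples; the last loop must complete without password prompts):

[root@gpfs01 ~]# ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa     # no passphrase
[root@gpfs01 ~]# for node in gpfs01 gpfs02; do ssh-copy-id root@$node; done
[root@gpfs01 ~]# for node in gpfs01 gpfs02; do ssh $node date; done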

Required packages

yum -y install ksh sg3_utils nfs-utils lshw net-tools telnet psmisc \
    ethtool autogen-libopts ntp elfutils-libelf-devel python3

Time server

This step is mandatory: use the same time servers for every node in the GPFS cluster. Chrony is not yet supported.

[root@gpfs01 ~]# cat /etc/ntp.conf
# Generated by Chef for gpfs01
# Local modifications will be overwritten.
tinker panic 1000 allan 1500 dispersion 15 step 0.128 stepout 900
statsdir /var/log/ntpstats/
leapfile /etc/ntp.leapseconds
driftfile /var/lib/ntp/ntp.drift

disable monitor

server timesrv1 iburst minpoll 6 maxpoll 10
restrict timesrv1 nomodify notrap noquery
server timesrv2 iburst minpoll 6 maxpoll 10
restrict timesrv2 nomodify notrap noquery

restrict default kod notrap nomodify nopeer noquery
restrict 127.0.0.1 nomodify
restrict -6 default kod notrap nomodify nopeer noquery
restrict -6 ::1 nomodify

server  127.127.1.0 # local clock
fudge   127.127.1.0 stratum 10
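
Once ntpd is enabled and running on every node, a quick check that each node has selected a peer (the chosen server is marked with '*'):

[root@gpfs01 ~]# systemctl enable ntpd && systemctl start ntpd
[root@gpfs01 ~]# ntpq -p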

Setup

Install GPFS base packages

Extract installation packages (target is /usr/lpp/mmfs/<version>)

[root@gpfs01 gpfs]# ./Spectrum_Scale_Protocols_Standard-5.0.0.0-x86_64-Linux-install --text-only --silent

Now add yum repositories for GPFS, pointing at the extracted directories

[root@gpfs01 yum.repos.d]# cat /etc/yum.repos.d/GPFS.repo 
[ganesha_5.0.0.0]
name=ganesha_5.0.0.0
baseurl=file:///usr/lpp/mmfs/5.0.0.0/ganesha_rpms/rhel7
gpgcheck=0
enabled=1

[gpfs_5.0.0.0]
name=gpfs_5.0.0.0
baseurl=file:///usr/lpp/mmfs/5.0.0.0/gpfs_rpms
gpgcheck=0
enabled=1

[zimon_5.0.0.0]
name=zimon_5.0.0.0
baseurl=file:///usr/lpp/mmfs/5.0.0.0/zimon_rpms/rhel7
gpgcheck=0
enabled=1

[smb_5.0.0.0]
name=smb_5.0.0.0
baseurl=file:///usr/lpp/mmfs/5.0.0.0/smb_rpms/rhel7
gpgcheck=0
enabled=1
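
A quick check that yum sees the new local repositories:

[root@gpfs01 ~]# yum repolist | grep 5.0.0.0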

If you are not on a supported Red Hat release (e.g. CentOS), then use: export LINUX_DISTRIBUTION=REDHAT_AS_LINUX

[root@gpfs01 ~]# yum install gpfs.base gpfs.license.std gpfs.msg.en_US gpfs.docs gpfs.gskit

Required packages:

  gpfs.base-5.0.*.rpm
  gpfs.gpl-5.0.*.noarch.rpm
  gpfs.msg.en_US-5.0.*.noarch.rpm
  gpfs.gskit-8.0.50.*.rpm
  gpfs.license*.rpm
  gpfs.ext-5.0.*.rpm
  gpfs.compression-5.0.*.rpm

Also required for compiling the GPFS kernel module:

[root@gpfs01 ~]# yum -y install kernel-devel cpp gcc gcc-c++ kernel-headers 

Compile gpfs kernel module

[root@gpfs01 ~]# export LINUX_DISTRIBUTION=REDHAT_AS_LINUX 
[root@gpfs01 ~]# mmbuildgpl
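
mmbuildgpl must be rerun after every kernel update. To check that the portability layer was built for the running kernel (module path assumes a default mmbuildgpl install):

[root@gpfs01 ~]# ls /lib/modules/$(uname -r)/extra/      # expect mmfs26.ko, mmfslinux.ko, tracedev.ko
[root@gpfs01 ~]# lsmod | egrep 'mmfs|tracedev'           # populated once GPFS is started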

Root profile

[root@gpfs01 ~]# cat .bashrc
...

######
# Specifics for GPFS testing
######
export PATH=$PATH:$HOME/bin:/usr/lpp/mmfs/bin
export WCOLL=/root/.ssh/wcoll_nodes
[root@gpfs01 ~]# cat /root/.ssh/wcoll_nodes
gpfs01
gpfs02

Always keep these files identical on all nodes in the cluster.
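
WCOLL is honored by mmdsh, which runs a command on every node listed in the file; a quick sketch (assuming the wcoll file above):

[root@gpfs01 ~]# mmdsh date                    # run on all nodes in $WCOLL
[root@gpfs01 ~]# mmdsh 'rpm -q gpfs.base'      # verify the same package level everywhere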

Initialize the cluster

Create the GPFS cluster

[root@gpfs01 ~]# cat /root/gpfs_nodes.txt
gpfs01:quorum-manager
gpfs02:quorum-manager

Basic cluster without CCR

[root@gpfs01 ~]# mmcrcluster -N gpfs_nodes.txt --ccr-disable -p gpfs01 -r /usr/bin/ssh -R /usr/bin/scp -C gpfs01.cluster -U GPFS

With CCR

[root@gpfs01 ~]# cat gpfs_nodes.txt
gpfs01:quorum-manager
gpfs02:quorum-manager
[root@gpfs01 ~]# mmcrcluster -N gpfs_nodes.txt --ccr-enable -r /usr/bin/ssh -R /usr/bin/scp -C gpfs01.cluster -U GPFS
[root@gpfs01 ~]# mmchlicense server --accept -N gpfs01,gpfs02
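
You can verify the license designation afterwards:

[root@gpfs01 ~]# mmlslicense -L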

List cluster config

[root@gpfs01 firewalld]# mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         gpfs01.cluster
  GPFS cluster id:           xxxxxxxxxxx
  GPFS UID domain:           GPFS
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp
  Repository type:           CCR

 Node  Daemon node name  IP address  Admin node name  Designation
------------------------------------------------------------------
   1   gpfs01         10.x.x.x    gpfs01        quorum-manager
   2   gpfs02         10.x.x.x    gpfs02        quorum-manager

Create NSD devices

NSDs (Network Shared Disks) are the disks, directly attached to the servers, that GPFS builds file systems on.

  • usage can be dataAndMetadata, dataOnly, metadataOnly, descOnly
  • each NSD belongs to a storage pool, a group of disks with the same characteristics; the default pool is system (metadataOnly NSDs can only belong to system)
  • failure groups allow keeping multiple copies of data (up to 3 replicas)
[root@gpfs01 ~]# cat gpfs_disks.txt
%nsd:
device=/dev/dm-11
nsd=GPFS_NSD_DATA01
servers=gpfs01,gpfs02
usage=dataAndMetadata
failureGroup=1
pool=system

%nsd:
device=/dev/dm-10
nsd=GPFS_NSD_DATA02
servers=gpfs01,gpfs02
usage=dataOnly
failureGroup=1
pool=slow

%nsd:
device=/dev/dm-12
nsd=GPFS_NSD_DATA03
servers=gpfs01,gpfs02
usage=dataAndMetadata
failureGroup=2
pool=system

Create the NSDs; add -v no to force the use of disks that contain data from a previous configuration

[root@gpfs01 ~]# mmcrnsd -F gpfs_disks.txt
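
For example, to overwrite disks that still carry an old NSD descriptor:

[root@gpfs01 ~]# mmcrnsd -F gpfs_disks.txt -v no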

List the physical mapping of NSDs (NSD-to-disk association)

[root@gpfs01 ~]# mmlsnsd -X
 Disk name    NSD volume ID      Device         Devtype  Node name                Remarks
---------------------------------------------------------------------------------------------------
 GPFS_NSD_DATA01 0A0113xxxxxF0FC1   /dev/dm-11     dmm      gpfs01                server node
 GPFS_NSD_DATA01 0A0113xxxxxF0FC1   /dev/dm-8      dmm      gpfs02
 GPFS_NSD_DATA02 0A0113xxxxxF0FC2   /dev/dm-10     dmm      gpfs01                server node
 GPFS_NSD_DATA02 0A0113xxxxxF0FC2   /dev/dm-5      dmm      gpfs02
 GPFS_NSD_DATA03 0A0113xxxxxF0FC3   /dev/dm-12     dmm      gpfs01                server node
 GPFS_NSD_DATA03 0A0113xxxxxF0FC3   /dev/dm-3      dmm      gpfs02

List file system association with NSDs (here all disks are still free)

[root@gpfs01 ~]# mmlsnsd
 File system   Disk name    NSD servers                                    
---------------------------------------------------------------------------
 (free disk)   GPFS_NSD_DATA01  gpfs01,gpfs02 
 (free disk)   GPFS_NSD_DATA02  gpfs01,gpfs02 
 (free disk)   GPFS_NSD_DATA03  gpfs01,gpfs02 

Create Filesystem

First start the GPFS service if it is not already running (-a = all nodes)

[root@gpfs01 ~]# mmstartup -a
[root@gpfs01 ~]# mmgetstate -aLs
 Node number  Node name       Quorum  Nodes up  Total nodes  GPFS state   Remarks
-------------------------------------------------------------------------------------
       1      gpfs01             2        2          2       active       quorum node
       2      gpfs02             2        2          2       active       quorum node

Each file system is created on one or more NSDs.

gpfs_disks01.txt has the same format as gpfs_disks.txt, but contains only the NSDs for the file system gpfs01. In the command below, -m/-M and -r/-R set the default/maximum number of metadata and data replicas, and -k nfs4 -D nfs4 select NFSv4 ACL and file locking semantics.

[root@gpfs01 scripts]# mmcrfs gpfs01lv -F gpfs_disks01.txt -T /gpfs01 -k nfs4 -D nfs4 -m1 -M2 -r1 -R2

The following disks of gpfs01lv will be formatted on node gpfs01:
    GPFS_NSD_DATA01: size 102400 MB
    GPFS_NSD_DATA02: size 102400 MB
    GPFS_NSD_DATA03: size 102400 MB
Formatting file system ...
Disks up to size 1.5 TB can be added to storage pool system.
Creating Inode File
Creating Allocation Maps
Creating Log Files
Clearing Inode Allocation Map
Clearing Block Allocation Map
Formatting Allocation Map for storage pool system
Completed creation of file system /dev/gpfs01lv.
mmcrfs: Propagating the cluster configuration data to all
  affected nodes.  This is an asynchronous process.

[root@gpfs01 scripts]# mmlsfs gpfs01lv
flag                value                    description
------------------- ------------------------ -----------------------------------
 -f                 8192                     Minimum fragment (subblock) size in bytes
 -i                 4096                     Inode size in bytes
 -I                 32768                    Indirect block size in bytes
 -m                 1                        Default number of metadata replicas
 -M                 2                        Maximum number of metadata replicas
 -r                 1                        Default number of data replicas
 -R                 2                        Maximum number of data replicas
 -j                 scatter                  Block allocation type
 -D                 nfs4                     File locking semantics in effect
 -k                 nfs4                     ACL semantics in effect
 -n                 32                       Estimated number of nodes that will mount file system
 -B                 4194304                  Block size
 -Q                 none                     Quotas accounting enabled
                    none                     Quotas enforced
                    none                     Default quotas enabled
 --perfileset-quota No                       Per-fileset quota enforcement
 --filesetdf        No                       Fileset df enabled?
 -V                 18.00 (5.0.0.0)          File system version
 --create-time      Tue Apr 24 15:19:36 2018 File system creation time
 -z                 No                       Is DMAPI enabled?
 -L                 33554432                 Logfile size
 -E                 Yes                      Exact mtime mount option
 -S                 relatime                 Suppress atime mount option
 -K                 whenpossible             Strict replica allocation option
 --fastea           Yes                      Fast external attributes enabled?
 --encryption       No                       Encryption enabled?
 --inode-limit      656384                   Maximum number of inodes
 --log-replicas     0                        Number of log replicas
 --is4KAligned      Yes                      is4KAligned?
 --rapid-repair     Yes                      rapidRepair enabled?
 --write-cache-threshold 0                   HAWC Threshold (max 65536)
 --subblocks-per-full-block 512              Number of subblocks per full block
 -P                 system                   Disk storage pools in file system
 --file-audit-log   No                       File Audit Logging enabled?
 -d                 GPFS_NSD_DATA01;GPFS_NSD_DATA02;GPFS_NSD_DATA03  Disks in file system
 -A                 yes                      Automatic mount option
 -o                 none                     Additional mount options
 -T                 /gpfs01                  Default mount point
 --mount-priority   0                        Mount priority
[root@gpfs01 scripts]# df
Filesystem                       1K-blocks    Used Available Use% Mounted on
/dev/mapper/rhel_gpfs01-root      20961280 5113632  15847648  25% /
devtmpfs                           8066056       0   8066056   0% /dev
tmpfs                              8082324       0   8082324   0% /dev/shm
tmpfs                              8082324    9696   8072628   1% /run
tmpfs                              8082324       0   8082324   0% /sys/fs/cgroup
/dev/sda2                          2086912  175484   1911428   9% /boot
/dev/sda1                          2093048    9964   2083084   1% /boot/efi
/dev/mapper/rhel_gpfs01-software  31441920 5624628  25817292  18% /software
/dev/mapper/rhel_gpfs01-var       20961280  904832  20056448   5% /var
/dev/mapper/rhel_gpfs01-home      20961280   37836  20923444   1% /home
tmpfs                              1616468      12   1616456   1% /run/user/42
tmpfs                              1616468       0   1616468   0% /run/user/0

Mount the filesystem on all nodes

[root@gpfs01 scripts]# mmmount gpfs01lv -a
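
To confirm the file system is mounted on every node:

[root@gpfs01 scripts]# mmlsmount gpfs01lv -L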

[root@gpfs01 scripts]# mmdf gpfs01lv
disk                disk size  failure holds    holds              free KB             free KB
name                    in KB    group metadata data        in full blocks        in fragments
--------------- ------------- -------- -------- ----- -------------------- -------------------
Disks in storage pool: system (Maximum disk size allowed is 1.5 TB)
GPFS_NSD_DATA01     104857600       -1 Yes      Yes       104460288 (100%)         11896 ( 0%)
GPFS_NSD_DATA02     104857600       -1 Yes      Yes       104464384 (100%)         11896 ( 0%)
GPFS_NSD_DATA03     104857600       -1 Yes      Yes       104460288 (100%)         11896 ( 0%)
...
                -------------                         -------------------- -------------------
(pool total)        671088640                             667193344 ( 99%)        119056 ( 0%)

                =============                         ==================== ===================
(total)             671088640                             667193344 ( 99%)        119056 ( 0%)

Inode Information
-----------------
Number of used inodes:            4038
Number of free inodes:          497722
Number of allocated inodes:     501760
Maximum number of inodes:       656384
[root@gpfs01 ~]# mmlsfs all -d

File system attributes for /dev/gpfs01:
======================================
flag                value                    description
------------------- ------------------------ -----------------------------------
 -d                 GPFS_NSD_DATA01;GPFS_NSD_DATA02;GPFS_NSD_DATA03  Disks in file system

[root@gpfs01 ~]# mmmount all
Fri Sep 30 16:16:04 CEST 2016: mmmount: Mounting file systems ...

Other examples

[root@gpfs01 ~]# mmlsdisk /dev/gpfs1
disk         driver   sector     failure holds    holds                            storage
name         type       size       group metadata data  status        availability pool
------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------
nsd_t1_h     nsd         512           1 Yes      Yes   ready         up           system
nsd_t2_h     nsd         512           1 No       Yes   ready         up           slow
nsd_t1_r     nsd         512           2 Yes      Yes   ready         up           system
nsd_t2_r     nsd         512           2 No       Yes   ready         up           slow
[root@rhlabh1 ~]# mmlsdisk /dev/gpfs1 -M

Disk name     IO performed on node     Device             Availability
------------  -----------------------  -----------------  ------------
nsd_t1_h      localhost                /dev/dm-11         up
nsd_t2_h      localhost                /dev/dm-10         up
nsd_t1_r      localhost                /dev/dm-12         up
nsd_t2_r      localhost                /dev/dm-13         up

[root@rhlabh1 ~]# df
Filesystem                  1K-blocks    Used Available Use% Mounted on
...
gpfs1                       125829120 3387392 122441728   3% /gpfs1
[root@rhlabh1 ~]# mmdf /dev/gpfs1
disk                disk size  failure holds    holds              free KB             free KB
name                    in KB    group metadata data        in full blocks        in fragments
--------------- ------------- -------- -------- ----- -------------------- -------------------
Disks in storage pool: system (Maximum disk size allowed is 261 GB)
nsd_t1_h             31457280        1 Yes      Yes        29829632 ( 95%)           584 ( 0%)
nsd_t1_r             31457280        2 Yes      Yes        29829632 ( 95%)           584 ( 0%)
                -------------                         -------------------- -------------------
(pool total)         62914560                              59659264 ( 95%)          1168 ( 0%)

Disks in storage pool: slow (Maximum disk size allowed is 261 GB)
nsd_t2_h             31457280        1 No       Yes        31391232 (100%)           376 ( 0%)
nsd_t2_r             31457280        2 No       Yes        31391232 (100%)           376 ( 0%)
                -------------                         -------------------- -------------------
(pool total)         62914560                              62782464 (100%)           752 ( 0%)

                =============                         ==================== ===================
(data)              125829120                             122441728 ( 97%)          1920 ( 0%)
(metadata)           62914560                              59659264 ( 95%)          1168 ( 0%)
                =============                         ==================== ===================
(total)             125829120                             122441728 ( 97%)          1920 ( 0%)

Inode Information
-----------------
Number of used inodes:            4038
Number of free inodes:          118970
Number of allocated inodes:     123008
Maximum number of inodes:       123008

If the cluster has only 2 nodes, you will need tiebreaker/quorum disks (an odd number). This operation must be done offline, with GPFS stopped on all nodes.

[root@gpfs01 scripts]# mmchconfig tiebreakerDisks="GPFS_NSD_DATA01;GPFS_NSD_DATA02;GPFS_NSD_DATA03"
mmchconfig: Command successfully completed
mmchconfig: Propagating the cluster configuration data to all
  affected nodes.  This is an asynchronous process.
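
Since the change must be made offline, wrap it with a shutdown and startup; a minimal sketch:

[root@gpfs01 ~]# mmshutdown -a                 # stop GPFS on all nodes first
[root@gpfs01 ~]# mmchconfig tiebreakerDisks="GPFS_NSD_DATA01;GPFS_NSD_DATA02;GPFS_NSD_DATA03"
[root@gpfs01 ~]# mmstartup -a                  # restart the cluster
[root@gpfs01 ~]# mmlsconfig tiebreakerDisks    # verify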