===== Linux Installation and setup =====
:!: On VMware, if the Scale cluster doesn't start, disable **secure boot** in the VM properties
https://github.com/IBM/ibm-spectrum-scale-install-infra
===== GPFS Prechecks =====
==== Log4J disable ====
There is currently a security alert on Log4J, so a workaround is to disable the vulnerable lookups.
Workaround/Mitigation:
Customers are advised to edit the file /etc/sysconfig/gpfsgui on each node running the GUI to include a line like this
LOG4J_FORMAT_MSG_NO_LOOKUPS=true
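For example, on each GUI node you can append the setting and then restart the GUI service (a minimal sketch):
[root@gpfs02 ~]# echo "LOG4J_FORMAT_MSG_NO_LOOKUPS=true" >> /etc/sysconfig/gpfsgui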
[root@gpfs02 ~]# systemctl restart gpfsgui
==== Network / firewall ====
Disable SELinux in /etc/selinux/config
CES, including the CES framework as well as SMB and CES NFS, does not support SELinux in enforcing mode.
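A minimal sketch to disable it (the permanent change only takes effect at the next reboot):
[root@gpfs02 ~]# sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
[root@gpfs02 ~]# setenforce 0      # switches to permissive immediately; a full disable requires a reboot
[root@gpfs02 ~]# getenforce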
Check the firewall rules and open the required ports (1191/tcp for the GPFS daemon, 22/tcp for SSH)
[root@gpfs02 ~]# firewall-cmd --permanent --add-port=1191/tcp
[root@gpfs02 ~]# firewall-cmd --permanent --add-port=22/tcp
[root@gpfs02 ~]# systemctl restart firewalld.service
[root@gpfs02 ~]# firewall-cmd --list-all
public (active)
target: default
icmp-block-inversion: no
interfaces: eno4 enp12s0f0 enp12s0f1 team0
sources:
services: ssh dhcpv6-client
ports: 1191/tcp 22/tcp
protocols:
masquerade: no
forward-ports:
source-ports:
icmp-blocks:
rich rules:
==== Authentication ====
Create SSH keys for each node and add the public keys to /root/.ssh/authorized_keys on every node.
Test a connection on every interface from every host in the cluster, including each server locally.
Put the IP addresses of every node in /etc/hosts
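A minimal sketch with the two lab nodes used on this page (gpfs01/gpfs02); the host names and addresses are placeholders, adapt them to your cluster:
[root@gpfs01 ~]# ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa
[root@gpfs01 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@gpfs02
[root@gpfs01 ~]# ssh gpfs02 date         # must not prompt for a password
[root@gpfs01 ~]# cat /etc/hosts
10.x.x.1   gpfs01
10.x.x.2   gpfs02
Repeat from every node so that each node can reach all the others (and itself) without a password.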
==== Required packages ====
yum -y install ksh
yum -y install sg3_utils
yum -y install nfs-utils
yum -y install lshw
yum -y install net-tools
yum -y install telnet
yum -y install psmisc
yum -y install ethtool
yum -y install autogen-libopts ntp
yum -y install elfutils-libelf-devel
yum -y install python3
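The same packages can also be installed in a single command:
[root@gpfs01 ~]# yum -y install ksh sg3_utils nfs-utils lshw net-tools telnet psmisc ethtool autogen-libopts ntp elfutils-libelf-devel python3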
==== Time server ====
This step is mandatory: use the same time server for each node in the GPFS cluster. Chrony is not yet supported.
[root@gpfs01 ~]# cat /etc/ntp.conf
# Generated by Chef for gpfs01
# Local modifications will be overwritten.
tinker panic 1000 allan 1500 dispersion 15 step 0.128 stepout 900
statsdir /var/log/ntpstats/
leapfile /etc/ntp.leapseconds
driftfile /var/lib/ntp/ntp.drift
disable monitor
server timesrv1 iburst minpoll 6 maxpoll 10
restrict timesrv1 nomodify notrap noquery
server timesrv2 iburst minpoll 6 maxpoll 10
restrict timesrv2 nomodify notrap noquery
restrict default kod notrap nomodify nopeer noquery
restrict 127.0.0.1 nomodify
restrict -6 default kod notrap nomodify nopeer noquery
restrict -6 ::1 nomodify
server 127.127.1.0 # local clock
fudge 127.127.1.0 stratum 10
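Then enable and start ntpd on every node and check that it synchronizes against the configured servers (the output will of course differ in your environment):
[root@gpfs01 ~]# systemctl enable --now ntpd
[root@gpfs01 ~]# ntpq -p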
===== Setup =====
==== Install GPFS base packages ====
Extract installation packages (target is /usr/lpp/mmfs/)
[root@gpfs01 gpfs]# ./Spectrum_Scale_Protocols_Standard-5.0.0.0-x86_64-Linux-install --text-only --silent
Now add a yum repository file for the GPFS packages
[root@gpfs01 yum.repos.d]# cat /etc/yum.repos.d/GPFS.repo
[ganesha_5.0.0.0]
name=ganesha_5.0.0.0
baseurl=file:///usr/lpp/mmfs/5.0.0.0/ganesha_rpms/rhel7
gpgcheck=0
enabled=1
[gpfs_5.0.0.0]
name=gpfs_5.0.0.0
baseurl=file:///usr/lpp/mmfs/5.0.0.0/gpfs_rpms
gpgcheck=0
enabled=1
[zimon_5.0.0.0]
name=zimon_5.0.0.0
baseurl=file:///usr/lpp/mmfs/5.0.0.0/zimon_rpms/rhel7
gpgcheck=0
enabled=1
[smb_5.0.0.0]
name=smb_5.0.0.0
baseurl=file:///usr/lpp/mmfs/5.0.0.0/smb_rpms/rhel7
gpgcheck=0
enabled=1
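You can then verify that the new repositories are visible to yum:
[root@gpfs01 ~]# yum clean all
[root@gpfs01 ~]# yum repolist | grep 5.0.0.0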
If you are not on a supported Red Hat release (e.g. CentOS), then use: export LINUX_DISTRIBUTION=REDHAT_AS_LINUX
[root@gpfs01 ~]# yum install gpfs.base gpfs.license.std gpfs.msg.en_US gpfs.docs gpfs.gskit
Required packages:
gpfs.base-5.0.*.rpm
gpfs.gpl-5.0.*.noarch.rpm
gpfs.msg.en_US-5.0.*.noarch.rpm
gpfs.gskit-8.0.50.*.rpm
gpfs.license*.rpm
gpfs.ext-5.0.*.rpm
gpfs.compression-5.0.*.rpm
The following packages are also required to compile the GPFS kernel module
[root@gpfs01 ~]# yum -y install kernel-devel cpp gcc gcc-c++ kernel-headers
Compile gpfs kernel module
[root@gpfs01 ~]# export LINUX_DISTRIBUTION=REDHAT_AS_LINUX
[root@gpfs01 ~]# mmbuildgpl
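If all nodes run the same kernel, you can optionally build the portability layer once as an RPM and install it on the other nodes instead of compiling everywhere (a sketch; the exact package name and output path depend on your kernel and GPFS levels):
[root@gpfs01 ~]# mmbuildgpl --build-package
[root@gpfs01 ~]# # mmbuildgpl prints the location of the generated gpfs.gplbin-<kernel>.rpm, typically under /root/rpmbuild/RPMS/x86_64/
[root@gpfs01 ~]# scp /root/rpmbuild/RPMS/x86_64/gpfs.gplbin-*.rpm gpfs02:/tmp/ && ssh gpfs02 yum -y localinstall /tmp/gpfs.gplbin-*.rpm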
==== Root profile ====
[root@gpfs01 ~]# cat .bashrc
...
######
# Specifics for GPFS testing
######
export PATH=$PATH:$HOME/bin:/usr/lpp/mmfs/bin
export WCOLL=/root/.ssh/wcoll_nodes
[root@gpfs01 ~]# cat /root/.ssh/wcoll_nodes
gpfs01
gpfs02
Always keep these files identical and synchronized on all nodes of the cluster
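With WCOLL set, mmdsh can be used to run a command on all nodes listed in that file, which helps keep them in sync (a quick sketch, assuming mmdsh picks up the node list from WCOLL when no -N option is given):
[root@gpfs01 ~]# mmdsh date
[root@gpfs01 ~]# mmdsh "rpm -qa | grep gpfs"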
==== Initialize the cluster ====
Create the GPFS cluster
[root@gpfs01 ~]# cat /root/gpfs_nodes.txt
gpfs01:quorum-manager
gpfs02:quorum-manager
Basic cluster without CCR
[root@gpfs01 ~]# mmcrcluster -N gpfs_nodes.txt --ccr-disable -p gpfs01 -r /usr/bin/ssh -R /usr/bin/scp -C gpfs01.cluster -U GPFS
With CCR
[root@gpfs01 ~]# cat gpfs_nodes.txt
gpfs01:quorum-manager
gpfs02:quorum-manager
[root@gpfs01 ~]# mmcrcluster -N gpfs_nodes.txt --ccr-enable -r /usr/bin/ssh -R /usr/bin/scp -C gpfs01.cluster -U GPFS
[root@gpfs01 ~]# mmchlicense server --accept -N gpfs01,gpfs02
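You can check the license assignment afterwards:
[root@gpfs01 ~]# mmlslicense -L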
List cluster config
[root@gpfs01 firewalld]# mmlscluster
GPFS cluster information
========================
GPFS cluster name: gpfs01.cluster
GPFS cluster id: xxxxxxxxxxx
GPFS UID domain: GPFS
Remote shell command: /usr/bin/ssh
Remote file copy command: /usr/bin/scp
Repository type: CCR
Node Daemon node name IP address Admin node name Designation
------------------------------------------------------------------
1 gpfs01 10.x.x.x gpfs01 quorum-manager
2 gpfs02 10.x.x.x gpfs02 quorum-manager
==== Create NSD devices ====
NSDs (Network Shared Disks) are the disks directly attached to the NSD servers.
  * usage can be **dataAndMetadata, dataOnly, metadataOnly, descOnly**
  * each NSD belongs to a pool (a group of disks with the same characteristics); the default pool is **system** (metadataOnly NSDs can only belong to the system pool)
  * a failure group allows keeping multiple copies of the data (up to 3 replicas)
[root@gpfs01 ~]# cat gpfs_disks.txt
%nsd:
device=/dev/dm-11
nsd=GPFS_NSD_DATA01
servers=gpfs01,gpfs02
usage=dataAndMetadata
failureGroup=1
pool=system
%nsd:
device=/dev/dm-10
nsd=GPFS_NSD_DATA02
servers=gpfs01,gpfs02
usage=dataOnly
failureGroup=1
pool=slow
%nsd:
device=/dev/dm-12
nsd=GPFS_NSD_DATA03
servers=gpfs01,gpfs02
usage=dataAndMetadata
failureGroup=2
pool=system
Create the NSD definitions. You can force the operation with **-v no** if you want to reuse disks that were previously used.
[root@gpfs01 ~]# mmcrnsd -F gpfs_disks.txt
List the physical mapping of the NSDs (NSD-to-disk association)
[root@gpfs01 ~]# mmlsnsd -X
Disk name NSD volume ID Device Devtype Node name Remarks
---------------------------------------------------------------------------------------------------
GPFS_NSD_DATA01 0A0113xxxxxF0FC1 /dev/dm-11 dmm gpfs01 server node
GPFS_NSD_DATA01 0A0113xxxxxF0FC1 /dev/dm-8 dmm gpfs02
GPFS_NSD_DATA02 0A0113xxxxxF0FC2 /dev/dm-10 dmm gpfs01 server node
GPFS_NSD_DATA02 0A0113xxxxxF0FC2 /dev/dm-5 dmm gpfs02
GPFS_NSD_DATA03 0A0113xxxxxF0FC3 /dev/dm-12 dmm gpfs01 server node
GPFS_NSD_DATA03 0A0113xxxxxF0FC3 /dev/dm-3 dmm gpfs02
List the filesystem associated with each NSD (here the disks are still free)
[root@gpfs01 ~]# mmlsnsd
File system Disk name NSD servers
---------------------------------------------------------------------------
(free disk) GPFS_NSD_DATA01 gpfs01,gpfs02
(free disk) GPFS_NSD_DATA02 gpfs01,gpfs02
(free disk) GPFS_NSD_DATA03 gpfs01,gpfs02
==== Create Filesystem ====
First start the GPFS service if it is not already started (**-a** = all nodes)
[root@gpfs01 ~]# mmstartup -a
[root@gpfs01 ~]# mmgetstate -aLs
Node number Node name Quorum Nodes up Total nodes GPFS state Remarks
-------------------------------------------------------------------------------------
1 gpfs01 2 2 2 active quorum node
2 gpfs02 2 2 2 active quorum node
Each filesystem is created on one or more NSDs.
gpfs_disks01.txt has the same format as gpfs_disks.txt, but contains only the NSDs for the gpfs01 filesystem
[root@gpfs01 scripts]# mmcrfs gpfs01lv -F gpfs_disks01.txt -T /gpfs01 -k nfs4 -D nfs4 -m1 -M2 -r1 -R2
The following disks of gpfs01lv will be formatted on node gpfs01:
GPFS_NSD_DATA01: size 102400 MB
GPFS_NSD_DATA02: size 102400 MB
GPFS_NSD_DATA03: size 102400 MB
Formatting file system ...
Disks up to size 1.5 TB can be added to storage pool system.
Creating Inode File
Creating Allocation Maps
Creating Log Files
Clearing Inode Allocation Map
Clearing Block Allocation Map
Formatting Allocation Map for storage pool system
Completed creation of file system /dev/gpfs01lv.
mmcrfs: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
[root@gpfs01 scripts]# mmlsfs gpfs01lv
flag value description
------------------- ------------------------ -----------------------------------
-f 8192 Minimum fragment (subblock) size in bytes
-i 4096 Inode size in bytes
-I 32768 Indirect block size in bytes
-m 1 Default number of metadata replicas
-M 2 Maximum number of metadata replicas
-r 1 Default number of data replicas
-R 2 Maximum number of data replicas
-j scatter Block allocation type
-D nfs4 File locking semantics in effect
-k nfs4 ACL semantics in effect
-n 32 Estimated number of nodes that will mount file system
-B 4194304 Block size
-Q none Quotas accounting enabled
none Quotas enforced
none Default quotas enabled
--perfileset-quota No Per-fileset quota enforcement
--filesetdf No Fileset df enabled?
-V 18.00 (5.0.0.0) File system version
--create-time Tue Apr 24 15:19:36 2018 File system creation time
-z No Is DMAPI enabled?
-L 33554432 Logfile size
-E Yes Exact mtime mount option
-S relatime Suppress atime mount option
-K whenpossible Strict replica allocation option
--fastea Yes Fast external attributes enabled?
--encryption No Encryption enabled?
--inode-limit 656384 Maximum number of inodes
--log-replicas 0 Number of log replicas
--is4KAligned Yes is4KAligned?
--rapid-repair Yes rapidRepair enabled?
--write-cache-threshold 0 HAWC Threshold (max 65536)
--subblocks-per-full-block 512 Number of subblocks per full block
-P system Disk storage pools in file system
--file-audit-log No File Audit Logging enabled?
-d GPFS_NSD_DATA01;GPFS_NSD_DATA02;GPFS_NSD_DATA03 Disks in file system
-A yes Automatic mount option
-o none Additional mount options
-T /gpfs01 Default mount point
--mount-priority 0 Mount priority
[root@gpfs01 scripts]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/rhel_gpfs01-root 20961280 5113632 15847648 25% /
devtmpfs 8066056 0 8066056 0% /dev
tmpfs 8082324 0 8082324 0% /dev/shm
tmpfs 8082324 9696 8072628 1% /run
tmpfs 8082324 0 8082324 0% /sys/fs/cgroup
/dev/sda2 2086912 175484 1911428 9% /boot
/dev/sda1 2093048 9964 2083084 1% /boot/efi
/dev/mapper/rhel_gpfs01-software 31441920 5624628 25817292 18% /software
/dev/mapper/rhel_gpfs01-var 20961280 904832 20056448 5% /var
/dev/mapper/rhel_gpfs01-home 20961280 37836 20923444 1% /home
tmpfs 1616468 12 1616456 1% /run/user/42
tmpfs 1616468 0 1616468 0% /run/user/0
Mount the filesystem on all nodes
[root@gpfs01 scripts]# mmmount gpfs01lv -a
[root@gpfs01 scripts]# mmdf gpfs01lv
disk disk size failure holds holds free KB free KB
name in KB group metadata data in full blocks in fragments
--------------- ------------- -------- -------- ----- -------------------- -------------------
Disks in storage pool: system (Maximum disk size allowed is 1.5 TB)
GPFS_NSD_DATA01 104857600 -1 Yes Yes 104460288 (100%) 11896 ( 0%)
GPFS_NSD_DATA02 104857600 -1 Yes Yes 104464384 (100%) 11896 ( 0%)
GPFS_NSD_DATA03 104857600 -1 Yes Yes 104460288 (100%) 11896 ( 0%)
...
------------- -------------------- -------------------
(pool total) 671088640 667193344 ( 99%) 119056 ( 0%)
============= ==================== ===================
(total) 671088640 667193344 ( 99%) 119056 ( 0%)
Inode Information
-----------------
Number of used inodes: 4038
Number of free inodes: 497722
Number of allocated inodes: 501760
Maximum number of inodes: 656384
[root@gpfs01 ~]# mmlsfs all -d
File system attributes for /dev/gpfs01:
======================================
flag value description
------------------- ------------------------ -----------------------------------
-d GPFS_NSD_DATA01;GPFS_NSD_DATA02;GPFS_NSD_DATA03 Disks in file system
[root@gpfs01 ~]# mmmount all
Fri Sep 30 16:16:04 CEST 2016: mmmount: Mounting file systems ...
Other examples
[root@gpfs01 ~]# mmlsdisk /dev/gpfs1
disk driver sector failure holds holds storage
name type size group metadata data status availability pool
------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------
nsd_t1_h nsd 512 1 Yes Yes ready up system
nsd_t2_h nsd 512 1 No Yes ready up slow
nsd_t1_r nsd 512 2 Yes Yes ready up system
nsd_t2_r nsd 512 2 No Yes ready up slow
[root@rhlabh1 ~]# mmlsdisk /dev/gpfs1 -M
Disk name IO performed on node Device Availability
------------ ----------------------- ----------------- ------------
nsd_t1_h localhost /dev/dm-11 up
nsd_t2_h localhost /dev/dm-10 up
nsd_t1_r localhost /dev/dm-12 up
nsd_t2_r localhost /dev/dm-13 up
[root@rhlabh1 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
...
gpfs1 125829120 3387392 122441728 3% /gpfs1
[root@rhlabh1 ~]# mmdf /dev/gpfs1
disk disk size failure holds holds free KB free KB
name in KB group metadata data in full blocks in fragments
--------------- ------------- -------- -------- ----- -------------------- -------------------
Disks in storage pool: system (Maximum disk size allowed is 261 GB)
nsd_t1_h 31457280 1 Yes Yes 29829632 ( 95%) 584 ( 0%)
nsd_t1_r 31457280 2 Yes Yes 29829632 ( 95%) 584 ( 0%)
------------- -------------------- -------------------
(pool total) 62914560 59659264 ( 95%) 1168 ( 0%)
Disks in storage pool: slow (Maximum disk size allowed is 261 GB)
nsd_t2_h 31457280 1 No Yes 31391232 (100%) 376 ( 0%)
nsd_t2_r 31457280 2 No Yes 31391232 (100%) 376 ( 0%)
------------- -------------------- -------------------
(pool total) 62914560 62782464 (100%) 752 ( 0%)
============= ==================== ===================
(data) 125829120 122441728 ( 97%) 1920 ( 0%)
(metadata) 62914560 59659264 ( 95%) 1168 ( 0%)
============= ==================== ===================
(total) 125829120 122441728 ( 97%) 1920 ( 0%)
Inode Information
-----------------
Number of used inodes: 4038
Number of free inodes: 118970
Number of allocated inodes: 123008
Maximum number of inodes: 123008
If you have only 2 nodes, you'll need tiebreaker disks (an odd number of them) or additional quorum nodes. This operation must be done **offline**
[root@gpfs01 scripts]# mmchconfig tiebreakerDisks="GPFS_NSD_DATA01;GPFS_NSD_DATA02;GPFS_NSD_DATA03"
mmchconfig: Command successfully completed
mmchconfig: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
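A possible sequence for a 2-node cluster, with GPFS stopped during the change (a sketch using the NSDs created above):
[root@gpfs01 ~]# mmshutdown -a
[root@gpfs01 ~]# mmchconfig tiebreakerDisks="GPFS_NSD_DATA01;GPFS_NSD_DATA02;GPFS_NSD_DATA03"
[root@gpfs01 ~]# mmlsconfig tiebreakerDisks
[root@gpfs01 ~]# mmstartup -a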