GPFS health

mmhealth thresholds (also available through the GUI and the REST API) monitors the usage:

$ mmhealth thresholds list
### Threshold Rules ###
rule_name metric                error  warn    direction filterBy groupBy sensitivity
----------------------------------------------------------------------------------------------------------------------------
InodeCapUtil_Rule Fileset_inode 90.0   80.0    high      gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name 300
DataCapUtil_Rule DataPool_capUtil 97.0   90.0    high      gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300
..
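To cross-check these thresholds against the actual usage, the underlying capacity and inode figures can be pulled straight from the filesystem. A minimal sketch (the filesystem name gpfs01lv is only an example from this cluster; the -i option of mmlsfileset is assumed to be available on your level):

# Pool capacity, to compare against DataCapUtil_Rule (warn 90%, error 97%)
mmdf gpfs01lv --block-size auto
# Inode usage per fileset, to compare against InodeCapUtil_Rule (warn 80%, error 90%)
mmlsfileset gpfs01lv -L -i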

List all events

[root@prscale-a-01 ~]# mmhealth node eventlog
2021-06-09 14:47:43.228189 CEST       unmounted_fs_check        WARNING    The filesystem gpfs02lv is probably needed, but not mounted
2021-06-09 14:47:48.256313 CEST       disk_found                INFO       The disk GPFS_NSD_A_D1_0001 was found
2021-06-09 14:52:28.475693 CEST       fs_remount_mount          INFO       The filesystem gpfs02lv was mounted normal
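To keep only the interesting entries, the log can simply be filtered on the severity column shown above (plain grep, nothing GPFS-specific):

mmhealth node eventlog | grep -E 'WARNING|ERROR'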

Clear the events:

[root@prscale-a-01 ~]# mmhealth node eventlog --clear

Clear ALL messages on the Web (GUI) interface

[root@prscale-a-01 ~]# /usr/lpp/mmfs/gui/cli/lshealth --reset
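Running the same GUI CLI without --reset should list the current entries first, so you can see what will be cleared (a sketch; behaviour may vary by GUI level):

/usr/lpp/mmfs/gui/cli/lshealth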

Errors that have already been resolved but continue to be displayed in mmhealth and the GUI:

How to remove them (including the persistent TIPS state):
mmdsh -N <NODE or all> mmsysmonc clearDB
mmdsh -N <NODE or all> mmsysmoncontrol restart
mmhealth event hide <EventName> 
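For example, to hide the three TIPS events reported further down this page for the GPFS component and verify the result (event names taken from the mmhealth node show output below):

mmhealth event hide callhome_not_enabled
mmhealth event hide gpfs_maxfilestocache_small
mmhealth event hide total_memory_small
mmhealth node show GPFS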

Check cluster health

[root@prscale-a-01 ~]# mmhealth cluster show

Component           Total         Failed       Degraded        Healthy          Other
----------------------------------------------------------------------------------------------------------------
NODE                    3              0              0              0              3
GPFS                    3              0              0              0              3
NETWORK                 3              0              0              3              0
FILESYSTEM              3              0              0              3              0
DISK                   21              0              0             21              0
CES                     2              0              0              2              0
CESIP                   1              0              0              1              0
FILESYSMGR              2              0              0              2              0
GUI                     2              0              0              2              0
PERFMON                 3              0              0              3              0
THRESHOLD               3              0              0              3              0

More compact, machine-readable output (-Y):

[root@prscale-a-01 ~]# mmhealth cluster show -Y
mmhealth:Summary:HEADER:version:reserved:reserved:component:entityname:total:failed:degraded:healthy:other:
mmhealth:Summary:0:1:::NODE:NODE:3:0:0:0:3:
mmhealth:Summary:0:1:::GPFS:GPFS:3:0:0:0:3:
mmhealth:Summary:0:1:::NETWORK:NETWORK:3:0:0:3:0:
mmhealth:Summary:0:1:::FILESYSTEM:FILESYSTEM:3:0:0:3:0:
mmhealth:Summary:0:1:::DISK:DISK:43:0:0:43:0:
mmhealth:Summary:0:1:::CES:CES:2:0:0:2:0:
mmhealth:Summary:0:1:::CESIP:CESIP:1:0:0:1:0:
mmhealth:Summary:0:1:::CLOUDGATEWAY:CLOUDGATEWAY:2:0:0:2:0:
mmhealth:Summary:0:1:::FILESYSMGR:FILESYSMGR:2:0:0:2:0:
mmhealth:Summary:0:1:::GUI:GUI:2:0:0:2:0:
mmhealth:Summary:0:1:::PERFMON:PERFMON:3:0:0:3:0:
mmhealth:Summary:0:1:::THRESHOLD:THRESHOLD:3:0:0:3:0:
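The colon-delimited -Y output is easy to script against; note that colons inside values (timestamps) are encoded as %3A. A minimal parsing sketch (field positions follow the HEADER line above):

# Show only components that are not fully healthy
mmhealth cluster show -Y | awk -F: '$3!="HEADER" && ($10>0 || $11>0 || $13>0) {print $7": failed="$10" degraded="$11" other="$13}'
# Decode the %3A back into colons when reading timestamps
mmhealth node show -Y | sed 's/%3A/:/g'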

Per node

[root@prscale-a-01 ~]# mmhealth node show

Node name:      prscale-a-01
Node status:    TIPS
Status Change:  2 days ago

Component        Status        Status Change     Reasons
----------------------------------------------------------------------------------------------------------------------------------
GPFS             TIPS          2 days ago        callhome_not_enabled, gpfs_maxfilestocache_small, total_memory_small
NETWORK          HEALTHY       4 days ago        -
FILESYSTEM       HEALTHY       4 days ago        -
DISK             HEALTHY       4 days ago        -
CES              HEALTHY       4 days ago        -
CESIP            HEALTHY       4 days ago        -
CLOUDGATEWAY     HEALTHY       4 days ago        -
FILESYSMGR       HEALTHY       4 days ago        -
GUI              HEALTHY       4 days ago        -
PERFMON          HEALTHY       4 days ago        -
THRESHOLD        HEALTHY       4 days ago        -
Note: the machine-readable option is an uppercase -Y; lowercase -y is rejected:

[root@prscale-a-01 ~]#  mmhealth node show -y
Option with missing argument

Additional messages:
  invalid option:  -y
[root@prscale-a-01 ~]#  mmhealth node show -Y
mmhealth:Event:HEADER:version:reserved:reserved:node:component:entityname:entitytype:event:arguments:activesince:identifier:ishidden:
mmhealth:State:HEADER:version:reserved:reserved:node:component:entityname:entitytype:status:laststatuschange:
mmhealth:State:0:1:::prscale-a-01:NODE:prscale-a-01:NODE:TIPS:2021-10-03 03%3A55%3A43.152676 CEST:
mmhealth:State:0:1:::prscale-a-01:CES:prscale-a-01:NODE:HEALTHY:2021-10-01 09%3A50%3A34.180949 CEST:
mmhealth:State:0:1:::prscale-a-01:BLOCK:prscale-a-01:NODE:DISABLED:2021-10-01 09%3A23%3A59.486199 CEST:
mmhealth:State:0:1:::prscale-a-01:NFS:prscale-a-01:NODE:HEALTHY:2021-10-01 09%3A50%3A34.180949 CEST:
mmhealth:State:0:1:::prscale-a-01:AUTH_OBJ:prscale-a-01:NODE:DISABLED:2021-10-01 09%3A24%3A17.174618 CEST:
mmhealth:State:0:1:::prscale-a-01:CESNETWORK:prscale-a-01:NODE:HEALTHY:2021-10-01 09%3A49%3A03.938605 CEST:
mmhealth:State:0:1:::prscale-a-01:CESNETWORK:ens192:NIC:HEALTHY:2021-10-01 09%3A24%3A05.346180 CEST:
mmhealth:State:0:1:::prscale-a-01:OBJECT:prscale-a-01:NODE:DISABLED:2021-10-01 09%3A24%3A05.421280 CEST:
mmhealth:State:0:1:::prscale-a-01:SMB:prscale-a-01:NODE:HEALTHY:2021-10-01 09%3A48%3A18.924751 CEST:
mmhealth:State:0:1:::prscale-a-01:AUTH:prscale-a-01:NODE:DISABLED:2021-10-01 09%3A24%3A02.017860 CEST:
mmhealth:State:0:1:::prscale-a-01:HDFS_NAMENODE:prscale-a-01:NODE:DISABLED:2021-10-01 09%3A24%3A17.295829 CEST:
mmhealth:State:0:1:::prscale-a-01:CLOUDGATEWAY:prscale-a-01:NODE:HEALTHY:2021-10-01 09%3A25%3A08.337661 CEST:
mmhealth:State:0:1:::prscale-a-01:CLOUDGATEWAY:tct_tiering1-vault_backup_01:TCT_SERVICE:HEALTHY:2021-10-01 09%3A24%3A33.013963 CEST:
mmhealth:State:0:1:::prscale-a-01:CLOUDGATEWAY:vault_backup_01/:TCT_CSAP:HEALTHY:2021-10-01 09%3A24%3A33.023072 CEST:
mmhealth:State:0:1:::prscale-a-01:CESIP:prscale-a-01:NODE:HEALTHY:2021-10-01 09%3A35%3A32.969821 CEST:
mmhealth:State:0:1:::prscale-a-01:DISK:prscale-a-01:NODE:HEALTHY:2021-10-01 09%3A52%3A49.707704 CEST:
mmhealth:State:0:1:::prscale-a-01:DISK:GPFS_NSD_A_CES_0001:NSD:HEALTHY:2021-10-01 09%3A52%3A49.715357 CEST:
mmhealth:State:0:1:::prscale-a-01:DISK:GPFS_NSD_M_A_0001:NSD:HEALTHY:2021-10-01 09%3A24%3A14.576509 CEST:
mmhealth:State:0:1:::prscale-a-01:DISK:GPFS_NSD_M_A_0002:NSD:HEALTHY:2021-10-01 09%3A24%3A14.596580 CEST:
mmhealth:State:0:1:::prscale-a-01:DISK:GPFS_NSD_M_A_0003:NSD:HEALTHY:2021-10-01 09%3A24%3A14.618590 CEST:
mmhealth:State:0:1:::prscale-a-01:DISK:GPFS_NSD_D_A_0001:NSD:HEALTHY:2021-10-01 09%3A24%3A14.637984 CEST:
mmhealth:State:0:1:::prscale-a-01:DISK:GPFS_NSD_M_A_0004:NSD:HEALTHY:2021-10-01 09%3A24%3A14.656978 CEST:
mmhealth:State:0:1:::prscale-a-01:DISK:GPFS_NSD_D_A_0002:NSD:HEALTHY:2021-10-01 09%3A24%3A14.673522 CEST:
mmhealth:State:0:1:::prscale-a-01:DISK:GPFS_NSD_D_A_0003:NSD:HEALTHY:2021-10-01 09%3A24%3A14.692509 CEST:
mmhealth:State:0:1:::prscale-a-01:DISK:GPFS_NSD_D_A_0004:NSD:HEALTHY:2021-10-01 09%3A24%3A14.711926 CEST:
mmhealth:State:0:1:::prscale-a-01:DISK:GPFS_NSD_D_A_0005:NSD:HEALTHY:2021-10-01 09%3A24%3A14.742578 CEST:
mmhealth:State:0:1:::prscale-a-01:DISK:GPFS_NSD_D_A_0006:NSD:HEALTHY:2021-10-01 09%3A24%3A14.761914 CEST:
mmhealth:State:0:1:::prscale-a-01:DISK:GPFS_NSD_D_A_0007:NSD:HEALTHY:2021-10-01 09%3A24%3A14.788854 CEST:
mmhealth:State:0:1:::prscale-a-01:DISK:GPFS_NSD_D_A_0008:NSD:HEALTHY:2021-10-01 09%3A24%3A14.808564 CEST:
mmhealth:State:0:1:::prscale-a-01:DISK:GPFS_NSD_D_A_0009:NSD:HEALTHY:2021-10-01 09%3A24%3A14.830882 CEST:
mmhealth:State:0:1:::prscale-a-01:DISK:GPFS_NSD_D_A_0010:NSD:HEALTHY:2021-10-01 09%3A24%3A14.852833 CEST:
mmhealth:State:0:1:::prscale-a-01:DISK:GPFS_NSD_D_A_0011:NSD:HEALTHY:2021-10-01 09%3A24%3A14.876191 CEST:
mmhealth:State:0:1:::prscale-a-01:DISK:GPFS_NSD_D_A_0012:NSD:HEALTHY:2021-10-01 09%3A24%3A14.888040 CEST:
mmhealth:State:0:1:::prscale-a-01:DISK:GPFS03_NSD_M_A_0005:NSD:HEALTHY:2021-10-01 09%3A24%3A14.915370 CEST:
mmhealth:State:0:1:::prscale-a-01:DISK:GPFS03_NSD_M_A_0006:NSD:HEALTHY:2021-10-01 09%3A24%3A14.931229 CEST:
mmhealth:State:0:1:::prscale-a-01:DISK:GPFS03_NSD_M_A_0007:NSD:HEALTHY:2021-10-01 09%3A24%3A14.942819 CEST:
mmhealth:State:0:1:::prscale-a-01:DISK:GPFS03_NSD_M_A_0008:NSD:HEALTHY:2021-10-01 09%3A24%3A14.956576 CEST:
mmhealth:State:0:1:::prscale-a-01:DISK:GPFS03_NSD_M_A_0009:NSD:HEALTHY:2021-10-01 09%3A24%3A14.970095 CEST:
mmhealth:State:0:1:::prscale-a-01:DISK:GPFS03_NSD_M_A_0010:NSD:HEALTHY:2021-10-01 09%3A24%3A14.993296 CEST:
mmhealth:State:0:1:::prscale-a-01:DISK:GPFS_NSD_A_D1_0001:NSD:HEALTHY:2021-10-01 09%3A24%3A15.019350 CEST:
mmhealth:State:0:1:::prscale-a-01:GUI:prscale-a-01:NODE:HEALTHY:2021-10-01 09%3A53%3A32.099884 CEST:
mmhealth:State:0:1:::prscale-a-01:THRESHOLD:prscale-a-01:NODE:HEALTHY:2021-10-01 09%3A24%3A01.090924 CEST:
mmhealth:State:0:1:::prscale-a-01:THRESHOLD:MemFree_Rule:THRESHOLD_RULE:HEALTHY:2021-10-01 09%3A24%3A32.160416 CEST:
mmhealth:State:0:1:::prscale-a-01:THRESHOLD:SMBConnPerNode_Rule:THRESHOLD_RULE:HEALTHY:2021-10-01 09%3A29%3A32.474178 CEST:
mmhealth:State:0:1:::prscale-a-01:THRESHOLD:active_thresh_monitor:THRESHOLD_MONITOR:HEALTHY:2021-10-01 09%3A35%3A32.982353 CEST:
mmhealth:State:0:1:::prscale-a-01:THRESHOLD:SMBConnTotal_Rule:THRESHOLD_RULE:HEALTHY:2021-10-01 09%3A38%3A48.166426 CEST:
mmhealth:State:0:1:::prscale-a-01:PERFMON:prscale-a-01:NODE:HEALTHY:2021-10-01 09%3A24%3A17.345148 CEST:
mmhealth:State:0:1:::prscale-a-01:FILESYSTEM:prscale-a-01:NODE:HEALTHY:2021-10-01 09%3A48%3A17.866719 CEST:
mmhealth:State:0:1:::prscale-a-01:FILESYSTEM:cesSharedRootlv:FILESYSTEM:HEALTHY:2021-10-01 09%3A48%3A17.879658 CEST:
mmhealth:State:0:1:::prscale-a-01:FILESYSTEM:gpfs01lv:FILESYSTEM:HEALTHY:2021-10-01 09%3A48%3A17.893977 CEST:
mmhealth:State:0:1:::prscale-a-01:FILESYSTEM:gpfs02lv:FILESYSTEM:HEALTHY:2021-10-01 09%3A24%3A17.503234 CEST:
mmhealth:State:0:1:::prscale-a-01:GPFS:prscale-a-01:NODE:TIPS:2021-10-03 03%3A55%3A43.146453 CEST:
mmhealth:Event:0:1:::prscale-a-01:GPFS:prscale-a-01:NODE:callhome_not_enabled::2021-10-01 09%3A24%3A16.102766 CEST::no:
mmhealth:Event:0:1:::prscale-a-01:GPFS:prscale-a-01:NODE:gpfs_maxfilestocache_small::2021-10-01 09%3A24%3A16.139998 CEST::no:
mmhealth:Event:0:1:::prscale-a-01:GPFS:prscale-a-01:NODE:total_memory_small::2021-10-01 09%3A24%3A16.169878 CEST::no:
mmhealth:State:0:1:::prscale-a-01:NETWORK:prscale-a-01:NODE:HEALTHY:2021-10-01 09%3A24%3A17.308930 CEST:
mmhealth:State:0:1:::prscale-a-01:NETWORK:ens192:NIC:HEALTHY:2021-10-01 09%3A24%3A17.323203 CEST:
mmhealth:State:0:1:::prscale-a-01:FILESYSMGR:prscale-a-01:NODE:HEALTHY:2021-10-01 09%3A24%3A19.114252 CEST:
mmhealth:State:0:1:::prscale-a-01:FILESYSMGR:gpfs02lv:FILESYSTEMMGMT:HEALTHY:2021-10-01 09%3A24%3A19.137516 CEST:

Check the protocol (CES) components

[root@prscale-a-01 ~]#  mmces state show -a -Y
mmces:stateShow:HEADER:version:reserved:reserved:NODE:AUTH:BLOCK:NETWORK:HDFS_NAMENODE:AUTH_OBJ:NFS:OBJ:SMB:CES:
mmces:stateShow:0:1:::prscale-a-01:DISABLED:DISABLED:HEALTHY:DISABLED:DISABLED:HEALTHY:DISABLED:HEALTHY:HEALTHY:
mmces:stateShow:0:1:::prscale-b-01:DISABLED:DISABLED:HEALTHY:DISABLED:DISABLED:HEALTHY:DISABLED:HEALTHY:HEALTHY:
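To go one step further than the state, list which protocol services are enabled on each CES node and where the CES addresses currently sit (standard mmces subcommands; a sketch):

mmces service list -a
mmces address list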

Check a particular event

[root@prscale-a-02 ~]# mmhealth cluster show

Component           Total         Failed       Degraded        Healthy          Other
-------------------------------------------------------------------------------------
NODE                    3              0              0              0              3
GPFS                    3              0              0              0              3
NETWORK                 3              0              0              3              0
FILESYSTEM              3              0              0              3              0
DISK                   31              0              0             31              0
CES                     2              0              0              2              0
CESIP                   1              0              0              1              0
FILESYSMGR              1              0              0              1              0
GUI                     3              0              1              2              0
PERFMON                 3              0              0              3              0
THRESHOLD               3              0              0              3              0
A specific component can also be queried:

mmhealth cluster show [ NODE | GPFS | NETWORK [ UserDefinedSubComponent ] 
                     | FILESYSTEM  [UserDefinedSubComponent ]| DISK [UserDefinedSubComponent ]
                     | CES |AUTH | AUTH_OBJ | BLOCK | CESNETWORK | NFS | OBJECT | SMB 
                     | HADOOP |CLOUDGATEWAY | GUI | PERFMON | THRESHOLD
                     | AFM [UserDefinedSubComponent]  ]
                     [-Y] [--verbose]
[root@prscale-a-02 ~]# mmhealth cluster show GUI

Component     Node                               Status            Reasons
------------------------------------------------------------------------------------------
GUI           prscale-q-b-01                     HEALTHY           -
GUI           prscale-a-02                       HEALTHY           -
GUI           prscale-a-01                       DEGRADED          gui_refresh_task_failed
List the available GUI refresh tasks, then re-run the failing one with --debug:

[root@prscale-a-01 ~]# /usr/lpp/mmfs/gui/cli/runtask help --debug
[AFM_FILESET_STATE, AFM_NODE_MAPPING, ALTER_HOST_NAME, CALLBACK, CALLHOME, CALLHOME_STATUS, CAPACITY_LICENSE, CES_ADDRESS, CES_STATE, CES_SERVICE_STATE, CES_USER_AUTH_SERVICE, CLUSTER_CONFIG, CONNECTION_STATUS, DAEMON_CONFIGURATION, DF, DISK_USAGE, DISKS, FILESETS, FILESYSTEM_MOUNT, FILESYSTEMS, FILE_AUDIT_LOG_CONFIG, GUI_CONFIG_CHECK, GPFS_JOBS, DIGEST_NOTIFICATION_TASK, HEALTH_STATES, HEALTH_TRIGGERED, HOST_STATES, HOST_STATES_CLIENTS, INODES, KEYSTORE, LOG_REMOVER, MASTER_GUI_ELECTION, MOUNT_CONFIG, NFS_EXPORTS, NFS_EXPORTS_DEFAULTS, NFS_SERVICE, NODE_LICENSE, NODECLASS, OBJECT_STORAGE_POLICY, OS_DETECT, PM_MONITOR, PM_SENSORS, PM_TOPOLOGY, POLICIES, QUOTA, QUOTA_DEFAULTS, QUOTA_ID_RESOLVE, QUOTA_MAIL, RDMA_INTERFACES, REMOTE_CONFIG, REMOTE_CLUSTER, REMOTE_FILESETS, REMOTE_GPFS_CONFIG, REMOTE_HEALTH_STATES, SMB_GLOBALS, SMB_SHARES, SNAPSHOTS, SNAPSHOTS_FS_USAGE, SNAPSHOT_MANAGER, SQL_STATISTICS, STATE_MAIL, STORAGE_POOL, SYSTEMUTIL_DF, TCT_ACCOUNT, TCT_CLOUD_SERVICE, TCT_NODECLASS, THRESHOLDS, WATCHFOLDER, WATCHFOLDER_STATUS, TASK_CHAIN]
[root@prscale-a-01 ~]# /usr/lpp/mmfs/gui/cli/runtask CLUSTER_CONFIG --debug
debug: locale=en_US
debug: Running 'mmsdrquery 'sdrq_cluster_info' all ' on node localhost
debug: Running 'mmsdrquery 'sdrq_nsd_info' all ' on node localhost
debug: Running 'mmlscluster -Y ' on node localhost
debug: Running 'mmsdrquery 'sdrq_node_info' all ' on node localhost
debug: Running 'mmlsnodeclass 'GUI_MGMT_SERVERS' -Y ' on node localhost
debug: Running 'mmlsnodeclass 'GUI_SERVERS' -Y ' on node localhost
EFSSG1000I The command completed successfully.
[root@prscale-a-01 ~]# mmhealth event show gui_refresh_task_failed
Event Name:              gui_refresh_task_failed
Event ID:                998254
Description:             One or more GUI refresh tasks failed. This could mean that data in the GUI is outdated.
Cause:                   There can be several reasons.
User Action:             1.) Check if there is additional information available by executing '/usr/lpp/mmfs/gui/cli/lstasklog [taskname]'. 2.) Run the specified task manually on the CLI by executing '/usr/lpp/mmfs/gui/cli/runtask [taskname] --debug'. 3.) Check the GUI logs under /var/log/cnlog/mgtsrv. 4.) Contact IBM Support if this error persists or occurs more often.
Severity:                WARNING
State:                   DEGRADED
[root@prscale-a-01 ~]# mmhealth event resolve 998254
The specified event gui_refresh_task_failed is not manually resolvable.
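Since the event is not manually resolvable, follow the User Action above: check the task log of the failing refresh task, re-run it, and the DEGRADED state clears on the next monitoring cycle once the task succeeds. A sketch, reusing CLUSTER_CONFIG as the example task:

/usr/lpp/mmfs/gui/cli/lstasklog CLUSTER_CONFIG
/usr/lpp/mmfs/gui/cli/runtask CLUSTER_CONFIG --debug
mmhealth node show GUI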
A. Objective

Check the state of a GPFS filesystem and correct any anomalies

B. Main entities involved

System

C. General description and data flow



D. Definition of terms

Node names
GPFS-to-AIX name mapping

GPFS	        AIX
p5-gpfs-h	oragpfh
p5-gpfs-r	oragpfr
p5-gpfs-k	oragpfk



E. Specific instructions

1) Display the information

mmlsnsd displays the NSD disks of the GPFS filesystems known on the machine (can be run on any of the 3 machines).

root@oragpfh:/home/root>mmlsnsd

 File system   Disk name    Primary node             Backup node           
---------------------------------------------------------------------------
 orafs        DiskR        p5-gpfs-r                p5-gpfs-h
 orafs        DiskH        p5-gpfs-r                p5-gpfs-h
 orafs        DiskK        gpfs-k                   
 orafs2       DiskR2       p5-gpfs-r                p5-gpfs-h
 orafs2       DiskH2       p5-gpfs-r                p5-gpfs-h
 orafs2       DiskK2       gpfs-k                   

root@oragpfh:/home/root>mmlsnsd -L

The -L option additionally displays the NSD volume IDs:

 File system   Disk name    NSD volume ID      Primary node             Backup node          
--------------------------------------------------------------------------------------------
 orafs        DiskR        0A040120452A20B4   p5-gpfs-r                p5-gpfs-h
 orafs        DiskH        0A040120452A20B6   p5-gpfs-r                p5-gpfs-h
 orafs        DiskK        AC13131D48008499   gpfs-k                   
 orafs2       DiskR2       0A04012046827A9E   p5-gpfs-r                p5-gpfs-h
 orafs2       DiskH2       0A04012046827AA0   p5-gpfs-r                p5-gpfs-h
 orafs2       DiskK2       AC13131D480084BD   gpfs-k                   
 

root@oragpfh:/home/root>mmlsnsd -M

The -M option displays the mapping between GPFS disks and OS devices per machine,
e.g. DiskR = hdisk0 on p5-gpfs-r (primary node) = hdisk3 on p5-gpfs-h (backup node)

 Disk name    NSD volume ID      Device         Node name                Remarks       
---------------------------------------------------------------------------------------
 DiskR        0A040120452A20B4   /dev/hdisk0    p5-gpfs-r                primary node
 DiskR        0A040120452A20B4   /dev/hdisk3    p5-gpfs-h                backup node
 DiskH        0A040120452A20B6   /dev/hdisk3    p5-gpfs-r                primary node
 DiskH        0A040120452A20B6   /dev/hdisk0    p5-gpfs-h                backup node
 DiskK        AC13131D48008499   /dev/descgpfslv gpfs-k                   primary node
 DiskR2       0A04012046827A9E   /dev/hdisk4    p5-gpfs-r                primary node
 DiskR2       0A04012046827A9E   /dev/hdisk5    p5-gpfs-h                backup node
 DiskH2       0A04012046827AA0   /dev/hdisk5    p5-gpfs-r                primary node
 DiskH2       0A04012046827AA0   /dev/hdisk4    p5-gpfs-h                backup node
 DiskK2       AC13131D480084BD   /dev/descgpfslv2 gpfs-k                   primary node
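On recent GPFS/Spectrum Scale levels, mmlsnsd -X combines both views (NSD volume ID, local device, device type and node) in a single listing; a sketch, assuming the option exists on your level:

mmlsnsd -X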

mmlsdisk displays the disks of a GPFS filesystem (can be run on any of the 3 machines).

root@oragpfh:/home/root>mmlsdisk orafs
disk         driver   sector failure holds    holds
name         type       size   group metadata data  status        availability
------------ -------- ------ ------- -------- ----- ------------- ------------
DiskR        nsd         512       1 yes      yes   ready         up           
DiskH        nsd         512       2 yes      yes   ready         up           
DiskK        nsd         512       3 no       no    ready         up   

The normal status of the disks is ready and the availability is up.

If there is an inconsistency, a warning message is displayed after the disk information.

e.g.: Attention: Due to an earlier configuration change, the file system
may contain data that is at risk of being lost
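On larger filesystems it is easier to list only the problem disks; mmlsdisk can filter out everything that is ready and up (a sketch, assuming the -e option is available on your level):

mmlsdisk orafs -e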


Other commands:
mmlscluster
root@oragpfh:/home/root>mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         p5-gpfs-r
  GPFS cluster id:           12399281700916488274
  GPFS UID domain:           p5-gpfs-r
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp

GPFS cluster configuration servers:
-----------------------------------
  Primary server:    p5-gpfs-r
  Secondary server:  p5-gpfs-h

 Node number  Node name    IP address       Full node name              Remarks    
-----------------------------------------------------------------------------------
       1      p5-gpfs-h    10.11.10.2      p5-gpfs-h                   quorum node
       2      p5-gpfs-r    10.11.10.3      p5-gpfs-r                   quorum node
       3      gpfs-k       10.11.10.4      gpfs-k                      quorum node

mmlsfs "Fs-name"                (Fs-name = {orafs|orafs2})
root@oragpfh:/home/root>mmlsfs orafs
flag value          description
---- -------------- -----------------------------------------------------
 -s  roundRobin     Stripe method
 -f  8192           Minimum fragment size in bytes
 -i  512            Inode size in bytes
 -I  16384          Indirect block size in bytes
 -m  2              Default number of metadata replicas
 -M  2              Maximum number of metadata replicas
 -r  2              Default number of data replicas
 -R  2              Maximum number of data replicas
 -j  cluster        Block allocation type
 -D  posix          File locking semantics in effect
 -k  posix          ACL semantics in effect
 -a  1048576        Estimated average file size
 -n  32             Estimated number of nodes that will mount file system
 -B  262144         Block size
 -Q  user;group     Quotas enforced
     none           Default quotas enabled
 -F  185344         Maximum number of inodes
 -V  8.01           File system version. Highest supported version: 8.02
 -u  yes            Support for large LUNs?
 -z  no             Is DMAPI enabled?
 -E  yes            Exact mtime mount option
 -S  no             Suppress atime mount option
 -d  DiskR;DiskH;DiskK  Disks in file system
 -A  yes            Automatic mount option
 -o  none           Additional mount options
 -T  /kora        Default mount point
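Individual attributes can also be queried by passing only the flag of interest, for example:

mmlsfs orafs -B        # block size only
mmlsfs orafs -m -r     # default metadata and data replication only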




2) Reactivate / resynchronize a disk


If a disk's state is anything other than ready and up, use the mmchdisk command to bring the disk back online and mmrestripefs to resynchronize it.
These commands take some time (up to 30 minutes). It is strongly recommended to wait for one command to finish before launching the next.

If disk DiskH is down       ==> mmchdisk orafs start -d DiskH
If disk DiskH is suspended  ==> mmchdisk orafs resume -d DiskH

If the mmlsdisk command reports the following warning:
Attention: Due to an earlier configuration change, the file system
may contain data that is at risk of being lost
 ==> mmrestripefs "Fs-name" -r -N mount    (Fs-name = {orafs|orafs2})
then re-run mmlsdisk "Fs-name" to check the result

Attention: Due to an earlier configuration change, the file system
is no longer properly balanced.
 ==> mmrestripefs "Fs-name" -b -N mount    (Fs-name = {orafs|orafs2})
then re-run mmlsdisk "Fs-name" to check the result
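Putting it together, a typical recovery sequence for a down disk could look like this (a sketch using the example names from this page; wait for each command to complete before starting the next):

mmlsdisk orafs                   # identify the disk that is not ready/up
mmchdisk orafs start -d DiskH    # bring the disk back online
mmrestripefs orafs -r            # restore replication of data at risk
mmlsdisk orafs                   # verify: status ready, availability up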
 