Are you following the best practices for the server?
The servermon component is automatically installed and configured as part of the IBM Spectrum Protect Version 8.1.7 server installation.
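A quick sanity check that the monitor is actually active on an instance (this assumes the process name contains "servermon", which may vary by platform) is:

ps -ef | grep -i servermon | grep -v grep

If nothing is listed, the servermon.log described below is the next place to look.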
The servermon logs are located in the srvmon directory under the instance home directory ($HOMEDIR):
[root@tsm01]/isp01/srvmon # ls -lsa
  0 drwx------ 5 isptest1 ispsrv     55 Jan 14 00:00 .20210114T0000-ISPTEST1
 40 -rw-r----- 1 isptest1 ispsrv  38012 Jan  7 14:37 commands.ini
  4 -rw-r----- 1 isptest1 ispsrv     11 Jan  7 14:59 lock
  4 -rw-r----- 1 isptest1 ispsrv   2505 Sep  5  2019 servermon.ini
 12 -rw------- 1 isptest1 ispsrv   9436 Jan 14 16:40 servermon.log
 20 -rw------- 1 isptest1 ispsrv  18139 Jan 14 00:00 servermon.log.1
 20 -rw------- 1 isptest1 ispsrv  18071 Jan 13 00:00 servermon.log.2
264 -rw-r----- 1 isptest1 ispsrv 268380 Jan 14 16:41 srvmon_10min_done.txt
248 -rw-r----- 1 isptest1 ispsrv 252280 Jan 14 16:40 srvmon_20min_done.txt
The example file used for this analysis is located at:
/isp01/srvmon/.20210114T0000-ISP01/results/20210114T1629-00000099-20min-show.txt
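One convenient way to pull the instrumentation block for a single thread out of such a show file is a simple grep with trailing context; the thread name and line count here are only examples:

grep -A 25 "SdProtectBatchWork" /isp01/srvmon/.20210114T0000-ISP01/results/20210114T1629-00000099-20min-show.txt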
The servermon data has been reviewed, and on the source side (TSM01) the bottleneck is Network Send. The following performance data shows that “Network Send” uses 756.937 seconds of a total of 802.980 seconds, so about 94% of the time is spent trying to send data over the network:
Thread 480789 SdProtectBatchWork parent=480767 05:40:47.048-->05:54:10.028
Operation        Count   Tottime  Avgtime  Mintime  Maxtime   InstTput  Total KB
--------------------------------------------------------------------------------
Disk Read        49149    17.090    0.000    0.000    0.108    46823.4    800216
Data Copy        27872     0.069    0.000    0.000    0.001  6008771.4    418559
Network Recv       533    23.410    0.044    0.000    1.540      147.2      3447
Network Send     95494   756.937    0.008    0.000    5.778     1061.6    803554
SSL Receive        822     0.010    0.000    0.000    0.000   333494.9      3440
SSL Send        190982     1.657    0.000    0.000    0.000   483221.6    800849
DB2 Fetch Exec      28     0.008    0.000    0.000    0.000
DB2 Inser Exec   55902     3.272    0.000    0.000    0.006
DB2 Fetch           28     0.000    0.000    0.000    0.000
DB2 Commit          28     0.051    0.002    0.000    0.011
Unknown                    0.471
--------------------------------------------------------------------------------
Total                    802.980                                3524.5    2830067
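For reference, the 94% figure comes straight from the two totals in this table: 756.937 seconds of Network Send Tottime divided by the 802.980-second thread total gives roughly 0.94, i.e. about 94%.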
Basically, the above data tells us that the source server spends most of its time waiting on the target server (TSM02). Looking at the target side (TSM02), the data indicates that 388.169 seconds of a total of 417.728 seconds is spent in Thread Wait, meaning roughly 93% of the time is spent in “Thread Wait”:
Thread 42064 psSessionThread parent=408 05:39:43.639-->05:46:41.368
Operation        Count   Tottime  Avgtime  Mintime  Maxtime   InstTput  Total KB
--------------------------------------------------------------------------------
Disk Write          28     1.334    0.048    0.001    0.157     4132.8      5516
Disk Commit         43     1.248    0.029    0.000    0.099
Data Copy            2     0.000    0.000    0.000    0.000                   26
Network Recv     96003     8.632    0.000    0.000    5.131    66225.3    571658
Network Send       172     0.005    0.000    0.000    0.000   458119.7      2412
SSL Receive     320737     1.740    0.000    0.000    0.002   327374.0    569741
SSL Send           344     0.005    0.000    0.000    0.000   403588.2      2407
DB2 Fetch Prep      51     0.010    0.000    0.000    0.000
DB2 Fetch Exec     119     0.019    0.000    0.000    0.000
DB2 Inser Exec   65964     4.752    0.000    0.000    0.010
DB2 Updat Exec      57     0.008    0.000    0.000    0.000
DB2 Fetch          119     0.000    0.000    0.000    0.000
DB2 Commit          95     0.391    0.004    0.000    0.051
DB2 Reg Prep        29     0.009    0.000    0.000    0.003
DB2 Reg Exec     64133    10.159    0.000    0.000    0.076
DB2 Reg Fetch    64113     0.168    0.000    0.000    0.000
Thread Wait       6754   388.169    0.057    0.000    4.485
Unknown                    1.072
--------------------------------------------------------------------------------
Total                    417.728                                2757.2    1151763
The “Thread Wait” entry means that most of the time on the target server is spent waiting for the downstream AsyncWriteThreads to write data to disk. The instrumentation reports a disk write throughput of about 4.1 MB/second, which is a very poor rate for disk writes.
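That rate can be read directly from the Disk Write line above: 5516 KB written in 1.334 seconds is about 4135 KB/s, which matches the reported InstTput of 4132.8 KB/s, i.e. roughly 4.1 MB/second across only 28 writes.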
The activity log reports that the PROTECT STGPOOL process is running with MAXSESSion=10, so first try increasing this value to improve performance. Here is my recommendation:
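As an illustration only (the storage pool name and session count below are assumptions, not values taken from this environment), the protection run could be restarted with more sessions and the resulting instrumentation compared against the numbers above:

PROTECT STGPOOL CONTPOOL MAXSESSIONS=20 WAIT=YES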
If this does not show any improvement, the primary issue is likely the disk write performance on the target server, TSM02.
In addition, capture any replication failures for review:

QUERY REPLFAILURES > qreplfail.txt
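If it is more convenient to capture this from the operating system shell rather than from an interactive administrative session, one possible invocation (the administrator ID and password below are placeholders) is:

dsmadmc -id=admin -password=xxxxxxxx -dataonly=yes "QUERY REPLFAILURES" > qreplfail.txt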