Are you following the best practices for the server?
  * have a look at the Spectrum Protect Blueprints (on Google)

===== Spectrum Protect Server block size considerations =====

DB block size used: 8 KB\\
Storage pool DISK and FILE: reads and writes to storage pools are predominantly in **256 KB** blocks\\
Deduplication extents range in size from 50 KB to 4 MB, with an average of 256 KB. Data smaller than 2 KB, and data that cannot be deduplicated (for example encrypted or already-compressed data), is not deduplicated.
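As a rough worked example (illustrative arithmetic only, not a product formula), the average extent size gives an order of magnitude for how many extents a large object produces:

```python
# Illustrative arithmetic: with an average deduplication extent size of
# 256 KB, a 1 GB object yields on the order of 4096 extents.
AVG_EXTENT_KB = 256
object_kb = 1 * 1024 * 1024   # 1 GB expressed in KB

extents = object_kb // AVG_EXTENT_KB
print(extents)  # 4096
```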

S3: by default, cloud container files are 1 GB in size, configurable with the CloudTransferContainerSize server option (specified in dsmserv.opt or set with the "setopt" server command). These files are transferred to Access using S3 multipart upload. With the default 1 GB file size, the default part size that a file is broken into is 100 MB; this is configurable with the CloudMinUploadPartSize server option. For restores, IBM Spectrum Protect does range reads in smaller sizes of 10 KB - 100 KB.
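For example, both sizes can be adjusted with the setopt server command (the values below are illustrative; check the server documentation for the exact units and valid ranges of these options):

<cli>
setopt CloudTransferContainerSize 1024
setopt CloudMinUploadPartSize 100
</cli>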
===== Check performance on Spectrum Protect Server =====
The servermon component is automatically installed and configured as part of the IBM Spectrum Protect Version 8.1.7 server installation.
Servermon logs are located in the instance $HOMEDIR, in the srvmon directory:
<cli prompt='#'>
[root@tsm01]/isp01/srvmon # ls -lsa
  0 drwx------ 5 isptest1 ispsrv     55 Jan 14 00:00 .20210114T0000-ISPTEST1
 40 -rw-r----- 1 isptest1 ispsrv  38012 Jan  7 14:37 commands.ini
  4 -rw-r----- 1 isptest1 ispsrv     11 Jan  7 14:59 lock
  4 -rw-r----- 1 isptest1 ispsrv   2505 Sep  5  2019 servermon.ini
 12 -rw------- 1 isptest1 ispsrv   9436 Jan 14 16:40 servermon.log
 20 -rw------- 1 isptest1 ispsrv  18139 Jan 14 00:00 servermon.log.1
 20 -rw------- 1 isptest1 ispsrv  18071 Jan 13 00:00 servermon.log.2
264 -rw-r----- 1 isptest1 ispsrv 268380 Jan 14 16:41 srvmon_10min_done.txt
248 -rw-r----- 1 isptest1 ispsrv 252280 Jan 14 16:40 srvmon_20min_done.txt
</cli>

The example file for analysis is located at:
/isp01/srvmon/.20210114T0000-ISP01/results/20210114T1629-00000099-20min-show.txt
=== Example of disk performance problems with replication ===
The servermon data has been reviewed, and on the source side (TSM01) the bottleneck is Network Send. The following performance data shows that "Network Send" uses 756.937 seconds of a total of 802.980 seconds, so about 94% of the time is spent trying to send data over the network:
<code>
Thread 480789 SdProtectBatchWork parent=480767 05:40:47.048-->05:54:10.028
Operation        Count  Tottime  Avgtime  Mintime  Maxtime  InstTput  Total KB
...
----------------------------------------------------------------------------
Total                   802.980                              3524.5   2830067
</code>
Basically, the above data tells us that the source server spends most of its time waiting on the target server (TSM02). Looking at the target side (TSM02), the data indicates that 388.169 seconds of a total of 417.728 seconds is spent in Thread Wait, meaning about 93% of the time is spent in "Thread Wait".
<code>
Thread 42064 psSessionThread parent=408 05:39:43.639-->05:46:41.368
Operation        Count  Tottime  Avgtime  Mintime  Maxtime  InstTput  Total KB
...
----------------------------------------------------------------------------
Total                   417.728                              2757.2   1151763
</code>
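The two percentages quoted above can be checked directly from the instrumentation totals (a quick sanity-check script using the figures from this page):

```python
# Sanity-check the bottleneck ratios quoted from the servermon data.
def pct(part_sec: float, total_sec: float) -> float:
    """Return part as a percentage of total."""
    return 100.0 * part_sec / total_sec

# Source server TSM01: time in "Network Send" vs. thread total.
print(f"TSM01 Network Send: {pct(756.937, 802.980):.1f}%")  # 94.3%
# Target server TSM02: time in "Thread Wait" vs. thread total.
print(f"TSM02 Thread Wait:  {pct(388.169, 417.728):.1f}%")  # 92.9%
```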
"Thread Wait" means that most of the time the target server is waiting for downstream AsyncWriteThreads to write data to disk. The instrumentation reports disk write performance of 4.1 MB/second, which is a very poor rate for disk writes.
The activity log reports that the PROTECT STGPOOL process is using MAXSESSion=10, so first try increasing this to improve performance. Here is my recommendation:
  - Change the PROTECT STGPOOL to run with MAXSESSion=20
  - Let it run to completion, and then collect the servermon data from both TSM01 and TSM02
If this does not show any improvement, then the primary issue is the disk write performance on the target server TSM02.
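The first recommendation could be applied like this (POOLNAME is a placeholder for your directory-container pool):

<cli>
protect stgpool POOLNAME maxsessions=20 wait=yes
</cli>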
QUERY REPLFAILURES > qreplfail.txt

===== Check performance on Spectrum Protect Client =====

Use the following option in dsm.opt (or in dsm.sys on Linux):
<cli>
enableinstrumentation yes
</cli>

Or directly on the command line:
<cli>
dsmc sel c:\mydir\* -subdir=yes -enableinstrumentation=yes
</cli>

On client versions older than 8.1, use the -TESTFLAG=instrument:detail, -TESTFLAG=instrument:API, or -TESTFLAG=instrument:detail/API options instead.

The **dsminstr.log** file is located in the directory specified by the DSM_LOG environment variable. You can also change the file name, location, and maximum size with the **instrlogname** and **instrlogmax** options.
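For example, in the client options file (the path is illustrative; instrlogmax is specified in MB):

<cli>
enableinstrumentation yes
instrlogname /var/log/tsm/dsminstr.log
instrlogmax 25
</cli>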

Here is an example of statistics from dsminstr.log:
<code>
Detailed Instrumentation statistics for

Thread: 5076  Elapsed time = 510.979 sec

Section            Actual(sec)  Average(msec)  Frequency used
-----------------------------------------------------------------------------------
Compute                  0.218            0.0           27535
BeginTxn Verb            0.000            0.0              32
Transaction              0.374           11.7              32
File I/O                 2.668            0.1           20702
Compression             32.105            1.2           27520
Data Verb              445.225           64.3            6927
Confirm Verb             0.000            0.0               1
EndTxn Verb              0.000            0.0              32
TCP Read                29.422          198.8             148
Thread Wait              0.905          904.8               1
Other                    0.062            0.0               0

-----------------------------------------------------------------------------------

Detailed Instrumentation statistics for

Thread: 5532  Elapsed time = 438.018 sec

Section            Actual(sec)  Average(msec)  Frequency used
-----------------------------------------------------------------------------------
Process Dirs             0.140            9.4              15
Solve Tree               0.000            0.0               1
Sleep                    0.062           62.4               1
TCP Read                 0.546           39.0              14
Thread Wait            437.206          950.4             460
Other                    0.062            0.0               0

-----------------------------------------------------------------------------------
</code>
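To find the dominant cost in such a block quickly, the section table can be parsed and sorted by elapsed time. A minimal sketch (the sample string reuses the figures from thread 5076 above):

```python
# Rank the sections of a dsminstr.log statistics block by elapsed time.
import re

SAMPLE = """\
Compute                  0.218            0.0           27535
File I/O                 2.668            0.1           20702
Compression             32.105            1.2           27520
Data Verb              445.225           64.3            6927
TCP Read                29.422          198.8             148
Thread Wait              0.905          904.8               1
"""

def top_sections(text: str, n: int = 3):
    """Return the n (section, seconds) pairs with the largest Actual(sec)."""
    rows = []
    for line in text.splitlines():
        # section name, Actual(sec), Average(msec), Frequency
        m = re.match(r"\s*(.+?)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+)\s*$", line)
        if m:
            rows.append((m.group(1), float(m.group(2))))
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]

for name, secs in top_sections(SAMPLE):
    print(f"{name}: {secs:.3f} s")   # Data Verb dominates this thread
```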

==== Categories ====

^Category^Activity^
|Query Server Dirs|Receiving the server inventory directories for incremental backup|
|Query Server Files|Receiving the server inventory files for incremental backup|
|Process Dirs|Scanning for files to back up|
|Cache Examine|Scanning the local disk cache database for files to expire|
|Solve Tree|Determining directory structure|
|Compute|Computing throughput and compression ratio|
|BeginTxn Verb|Building transactions|
|Transaction|File open, close, and other miscellaneous operations|
|File I/O|File read and write|
|Compression|Compressing and uncompressing data|
|Encryption|Encrypting and decrypting data|
|CRC|Computing and comparing CRC values|
|Data Verb|Sending and receiving data to and from the server (points to the network or the IBM Spectrum Protect server)|
|Confirm Verb|Response time during backup for server confirm verb|
|EndTxn Verb|Server transaction commit and tape synchronization (points to the IBM Spectrum Protect server)|
|Other|Everything else that is not tracked already|