User Tools

Site Tools


tsm:tsm_perf

Check performance on client and server / bottleneck

Are you in the best practice for the server ?

  • have a look on Spectum Blueprint (on google)

Spectrum Protect Server block size considerations

DB block size used: 8kb
Storage pool DISK and FILE: reads and writes to storage pools predominantly in 256 KB blocks
The extents can range in size from 50 KB to 4 MB with an average of 256 KB. Any data smaller than 2 KB or data that cannot be deduplicated, such as encrypted or compressed, are not deduplicated.

S3: By default, these files are 1 GB in size and are configurable using the parameter CloudTransferContainerSize server option (i.e. specifying in dsmserv.opt or using the “setopt” server command). These files are transferred to Access using S3 multipart-upload. With 1 GB default file size, the default part size that the file is broken up to is 100 MB. This value is configurable using the server parameter, CloudMinUploadPartSize. For restores, IBM Spectrum Protect does range reads in smaller sizes of 10 KB – 100KB

Check performances on Spectrum Protect Server

The servermon component is automatically installed and configured as part of the IBM Spectrum Protect Version 8.1.7 server installation.

Servermon logs are located in $HOMEDIR of instance into srvmon directory

[root@tsm01]/isp01/srvmon # ls -lsa
  0 drwx------  5 isptest1 ispsrv     55 Jan 14 00:00 .20210114T0000-ISPTEST1
 40 -rw-r-----  1 isptest1 ispsrv  38012 Jan  7 14:37 commands.ini
  4 -rw-r-----  1 isptest1 ispsrv     11 Jan  7 14:59 lock
  4 -rw-r-----  1 isptest1 ispsrv   2505 Sep  5  2019 servermon.ini
 12 -rw-------  1 isptest1 ispsrv   9436 Jan 14 16:40 servermon.log
 20 -rw-------  1 isptest1 ispsrv  18139 Jan 14 00:00 servermon.log.1
 20 -rw-------  1 isptest1 ispsrv  18071 Jan 13 00:00 servermon.log.2
264 -rw-r-----  1 isptest1 ispsrv 268380 Jan 14 16:41 srvmon_10min_done.txt
248 -rw-r-----  1 isptest1 ispsrv 252280 Jan 14 16:40 srvmon_20min_done.txt

The example file for analysing is located into:

/isp01/srvmon/.20210114T0000-ISP01/results/20210114T1629-00000099-20min-show.txt

Example with disk perf problems with replication

The servermon data has been reviewed, and from the source side (TSM01), the bottle neck is Network Send. The following performance data shows that “Network Send” uses 756.937 seconds of a total of 802.980, this is about 94% of the time is spent trying to send data over the network:

Thread 480789 SdProtectBatchWork parent=480767  05:40:47.048-->05:54:10.028
Operation       Count  Tottime  Avgtime  Mintime  Maxtime InstTput  Total KB
----------------------------------------------------------------------------
Disk Read       49149   17.090    0.000    0.000    0.108  46823.4    800216
Data Copy       27872    0.069    0.000    0.000    0.001 6008771.4    418559
Network Recv      533   23.410    0.044    0.000    1.540    147.2      3447
Network Send    95494  756.937    0.008    0.000    5.778   1061.6    803554
SSL Receive       822    0.010    0.000    0.000    0.000 333494.9      3440
SSL Send       190982    1.657    0.000    0.000    0.000 483221.6    800849
DB2 Fetch Exec     28    0.008    0.000    0.000    0.000
DB2 Inser Exec  55902    3.272    0.000    0.000    0.006
DB2 Fetch          28    0.000    0.000    0.000    0.000
DB2 Commit         28    0.051    0.002    0.000    0.011
Unknown                  0.471
----------------------------------------------------------------------------
Total                  802.980                              3524.5   2830067

Basically the above data tells us that the source server is spending most the time waiting on the target server (TSM02). Looking at the target side (TSM02), the data indicates that 388.169 seconds of a total of 417.728, is spent in Thread Wait. Meaning 92% of the time is spent in “Thread Wait”.

Thread 42064 psSessionThread parent=408  05:39:43.639-->05:46:41.368
Operation       Count  Tottime  Avgtime  Mintime  Maxtime InstTput  Total KB
----------------------------------------------------------------------------
Disk Write         28    1.334    0.048    0.001    0.157   4132.8      5516
Disk Commit        43    1.248    0.029    0.000    0.099
Data Copy           2    0.000    0.000    0.000    0.000                 26
Network Recv    96003    8.632    0.000    0.000    5.131  66225.3    571658
Network Send      172    0.005    0.000    0.000    0.000 458119.7      2412
SSL Receive    320737    1.740    0.000    0.000    0.002 327374.0    569741
SSL Send          344    0.005    0.000    0.000    0.000 403588.2      2407
DB2 Fetch Prep     51    0.010    0.000    0.000    0.000
DB2 Fetch Exec    119    0.019    0.000    0.000    0.000
DB2 Inser Exec  65964    4.752    0.000    0.000    0.010
DB2 Updat Exec     57    0.008    0.000    0.000    0.000
DB2 Fetch         119    0.000    0.000    0.000    0.000
DB2 Commit         95    0.391    0.004    0.000    0.051
DB2 Reg Prep       29    0.009    0.000    0.000    0.003
DB2 Reg Exec    64133   10.159    0.000    0.000    0.076
DB2 Reg Fetch   64113    0.168    0.000    0.000    0.000
Thread Wait      6754  388.169    0.057    0.000    4.485
Unknown                  1.072
----------------------------------------------------------------------------
Total                  417.728                              2757.2   1151763

The “Thread Wait” means that most the time on the target server is waiting for downstream AsyncWriteThreads to write data to disk. The instrumentation disk write performance reports 4.1MB/second, and this seems to be a very poor rate for disk writes.

The activity log reports the PROTECT STGPOOL process is using MAXSESSion=10, so first try increasing this to improve performance. Here is my recommendation:

  1. Change the PROTECT STGPOOL to run with MAXSESSion=20
  2. Let it run to completion, and then collect the servermon data from both TSM01, and TSM02.

If this does not show any improvement then the primary issue will be the disk write performance on target server TSM02.

Detect failure on replication

QUERY REPLFAILURES > qreplfail.txt

Check performances on Spectrum Protect Client

Use the following parameter into dsm.opt or dsm.sys on linux

enableinstrumentation yes

Or directly using command line:

dsmc sel c:\mydir\* -subdir=yes -enableinstrumentation=yes

On older version of client, before 8.1 use the -TESTFLAG=instrument:detail,-TESTFLAG=instrument:API, and -TESTFLAG=instrumentation:detail/API options.

dsminstr.log file is located in the directory that is specified by the DSM_LOG environment variable. You can also change the file name, location and size using instrlogname and instrlogmax options

Here is an example of statistics from dsminstr.log

Detailed Instrumentation statistics for

Thread: 5076  Elapsed time =   510.979 sec

Section                  Actual(sec)	Average(msec)	Frequency used
-----------------------------------------------------------------------------------
Compute                    0.218	      0.0	    27535
BeginTxn Verb              0.000	      0.0	       32
Transaction                0.374	     11.7	       32
File I/O                   2.668	      0.1	    20702
Compression               32.105	      1.2	    27520
Data Verb                445.225	     64.3	     6927
Confirm Verb               0.000	      0.0	        1
EndTxn Verb                0.000	      0.0	       32
TCP Read                  29.422	    198.8	      148
Thread Wait                0.905	    904.8	        1
Other                      0.062	      0.0	        0

-----------------------------------------------------------------------------------

Detailed Instrumentation statistics for

Thread: 5532  Elapsed time =   438.018 sec

Section                  Actual(sec)	Average(msec)	Frequency used
-----------------------------------------------------------------------------------
Process Dirs               0.140	      9.4	       15
Solve Tree                 0.000	      0.0	        1
Sleep                      0.062	     62.4	        1
TCP Read                   0.546	     39.0	       14
Thread Wait              437.206	    950.4	      460
Other                      0.062	      0.0	        0

-----------------------------------------------------------------------------------

Categories

CategoryActivity
Query Server DirsReceiving the server inventory directories for incremental backup
Query Server FilesReceiving the server inventory files for incremental backup
Process DirsScanning for files to back up
Cache ExamineScanning the local disk cache database for files to expire
Solve TreeDetermining directory structure
ComputeComputing throughput and compression ratio
BeginTxn VerbBuilding transactions Transaction File open, close, and other miscellaneous operations
File IOFile read and write
CompressionCompressing and uncompressing data
EncryptionEncrypting and decrypting data
CRCComputing and comparing CRC values
Data VerbSending and receiving data to and from the server (points to the network or IBM Spectrum Protect server)
Confirm VerbResponse time during backup for server confirm verb
EndTxn VerbServer transaction commit and tape synchronization (points to the IBM Spectrum Protect server)
OtherEverything else that is not tracked already
tsm/tsm_perf.txt · Last modified: 2023/11/06 10:14 by manu