tsm:replace_volume

Differences

This shows you the differences between two versions of the page.

tsm:replace_volume [2021/11/25 17:54]
manu [Recovering from a lost or damaged FILE volume in a TSM deduplicated storage pool]
tsm:replace_volume [2021/11/25 17:59] (current)
manu
Line 6: Line 6:
  
   * First, determine which volume has errors   * First, determine which volume has errors
-<code> +<cli prompt='>'>
- TSM> select volume_name,write_errors,read_errors,access,STGPOOL_NAME from volumes where error_state<>'No' and ( WRITE_ERRORS>0 or READ_ERRORS>0 ) +Protect> select volume_name,write_errors,read_errors,access,STGPOOL_NAME from volumes where error_state<>'No' and ( WRITE_ERRORS>0 or READ_ERRORS>0 )
-</code> +</cli>
  
   * Set the volume access to destroyed   * Set the volume access to destroyed
-<code> +<cli prompt='>'>
- TSM> update vol <volume_name> access=destroy +Protect> update vol <volume_name> access=destroy
-</code> +</cli>
  
   * Check out the volume from the library inventory   * Check out the volume from the library inventory
-<code> +<cli prompt='>'>
- TSM> checkout libvol <library_name> <volume_name> +Protect> checkout libvol <library_name> <volume_name>
-</code> +</cli>
  
   * Find the tapes needed to restore the defective volume   * Find the tapes needed to restore the defective volume
-<code> +<cli prompt='>'>
- TSM> restore vol <volume_name> preview=yes +Protect> restore vol <volume_name> preview=yes
-</code> +</cli>
  
   * Check the activity log to confirm that all required tapes are available in the library and that enough scratch volumes are available   * Check the activity log to confirm that all required tapes are available in the library and that enough scratch volumes are available
-<code> +<cli prompt='>'>
- TSM> restore vol <volume_name> +Protect> restore vol <volume_name>
-</code> +</cli>
  
   * Delete the old defective volume   * Delete the old defective volume
-<code> +<cli prompt='>'>
- TSM> delete vol <volume_name> discard=yes +Protect> delete vol <volume_name> discard=yes
-</code> +</cli>
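
The tape-volume steps above can be collected into an ordered command list. The following is only an illustrative Python sketch: the function name `tape_recovery_commands` and the example volume and library names are hypothetical, and the command strings are copied from the steps above. It does not contact a server; it only formats the strings so they can be reviewed before being entered by hand.

```python
def tape_recovery_commands(volume_name, library_name):
    """Return the admin commands from the steps above, in order.

    Hypothetical helper: it only formats strings and runs nothing.
    """
    return [
        f"update vol {volume_name} access=destroy",
        f"checkout libvol {library_name} {volume_name}",
        f"restore vol {volume_name} preview=yes",
        # Review the activity log at this point before the real restore.
        f"restore vol {volume_name}",
        f"delete vol {volume_name} discard=yes",
    ]


if __name__ == "__main__":
    # "A00001L5" and "LIB1" are made-up example names.
    for command in tape_recovery_commands("A00001L5", "LIB1"):
        print(command)
```

Keeping the preview and the real restore as separate entries mirrors the procedure: the activity log check sits between them and is a manual step.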
  
 ==== In a primary stgpool using replicated data ==== ==== In a primary stgpool using replicated data ====
Line 50: Line 50:
     For example, to run the full node replication process and recover damaged files for client nodes in the PAYROLL group, issue the following command:     For example, to run the full node replication process and recover damaged files for client nodes in the PAYROLL group, issue the following command:
  
-    replicate node payroll recoverdamaged=yes +<cli prompt='>'>
 +Protect> replicate node payroll recoverdamaged=yes
 +</cli>
  
     To run the node replication process only to recover damaged files, specify the name of the node or node group, and the RECOVERDAMAGED parameter with a value of ONLY.     To run the node replication process only to recover damaged files, specify the name of the node or node group, and the RECOVERDAMAGED parameter with a value of ONLY.
     For example, to recover damaged files for client nodes in the PAYROLL group without running the full node replication process, issue the following command:     For example, to recover damaged files for client nodes in the PAYROLL group without running the full node replication process, issue the following command:
- +<cli prompt='>'>
-    replicate node payroll recoverdamaged=only +Protect> replicate node payroll recoverdamaged=only
-    +</cli>
 +   
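
The two RECOVERDAMAGED modes described above differ only in the parameter value. As a rough illustration, a small Python helper (hypothetical name `replicate_command`, not part of any IBM tooling) can make the choice explicit; the `wait=yes` suffix is taken from the REPLICATE NODE example later in this page.

```python
def replicate_command(target, recover_only=False, wait=False):
    """Format a REPLICATE NODE command for a node or node group.

    recover_only=False -> full replication plus damaged-file recovery
    recover_only=True  -> damaged-file recovery only
    Hypothetical helper: it only builds the command string.
    """
    mode = "only" if recover_only else "yes"
    command = f"replicate node {target} recoverdamaged={mode}"
    if wait:
        command += " wait=yes"
    return command


if __name__ == "__main__":
    print(replicate_command("payroll"))
    print(replicate_command("payroll", recover_only=True))
```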
 ==== In a secondary stgpool ==== ==== In a secondary stgpool ====
  
   * First, delete the volume and check it out from the library   * First, delete the volume and check it out from the library
-<code> +<cli prompt='>'>
- TSM> delete vol <volume_name> discard=yes +Protect> delete vol <volume_name> discard=yes
- TSM> checkout libvol <library_name> <volume_name> remove=bulk +Protect> checkout libvol <library_name> <volume_name> remove=bulk
-</code> +</cli>
  
   * Restore the volume from the primary stgpool   * Restore the volume from the primary stgpool
-<code> +<cli prompt='>'>
- TSM> backup stgpool <primary_stgpool> <secondary_stgpool> +Protect> backup stgpool <primary_stgpool> <secondary_stgpool>
-</code> +</cli>
  
 ===== Recovering from a lost or damaged FILE volume in a TSM deduplicated storage pool ===== ===== Recovering from a lost or damaged FILE volume in a TSM deduplicated storage pool =====
Line 92: Line 95:
     IMPORTANT: You must identify and update all known missing volumes at this step.     IMPORTANT: You must identify and update all known missing volumes at this step.
  
-    UPDATE VOLUME <volume name> ACCESS=DESTROYED +<cli prompt='>'>
 +Protect> UPDATE VOLUME <volume name> ACCESS=DESTROYED
 +</cli>
  
 2. For any file volume that still exists and is mountable (READONLY), but may contain damaged objects, update the volume(s) to read-only and audit them (for example, the file volume still exists but some objects on that volume cannot be accessed during read operations):​ 2. For any file volume that still exists and is mountable (READONLY), but may contain damaged objects, update the volume(s) to read-only and audit them (for example, the file volume still exists but some objects on that volume cannot be accessed during read operations):​
Line 100: Line 104:
     IMPORTANT: You must identify, update and audit all known damaged volumes at this step.     IMPORTANT: You must identify, update and audit all known damaged volumes at this step.
  
-    UPDATE VOLUME <volume name> ACCESS=READONLY +<cli prompt='>'>
-    AUDIT VOLUME <volume name> +Protect> UPDATE VOLUME <volume name> ACCESS=READONLY
 +Protect> AUDIT VOLUME <volume name>
 +</cli>
  
 3. Manually initiate client backups, or wait for all normally scheduled clients to run a complete backup cycle, in an attempt to recover any damaged data that may still exist on the client filesystems. 3. Manually initiate client backups, or wait for all normally scheduled clients to run a complete backup cycle, in an attempt to recover any damaged data that may still exist on the client filesystems.
Line 109: Line 114:
  
  
-    RESTORE STGPOOL <stgpool name> PREVIEW=NO MAXPROCESS=<n> +<cli prompt='>'>
 +Protect> RESTORE STGPOOL <stgpool name> PREVIEW=NO MAXPROCESS=<n>
 +</cli>
     This process also initiates a silent background re-linker process that attempts to locate a valid copy of the damaged chunk somewhere else in the pool to relink the data. This has the opportunity to reduce the overall scope of the damage, which is why it is recommended regardless of whether a copy storage pool exists or not.     This process also initiates a silent background re-linker process that attempts to locate a valid copy of the damaged chunk somewhere else in the pool to relink the data. This has the opportunity to reduce the overall scope of the damage, which is why it is recommended regardless of whether a copy storage pool exists or not.
  
  
 5. If the data is (or might be) replicated, attempt to use node replication to recover the affected data on the missing or damaged volume(s) by issuing the following command and waiting for the process to end (monitor the process on the source and target servers): 5. If the data is (or might be) replicated, attempt to use node replication to recover the affected data on the missing or damaged volume(s) by issuing the following command and waiting for the process to end (monitor the process on the source and target servers):
- +<cli prompt='>'>
- +Protect> REPLICATE NODE * RECOVERDAMAGED=ONLY WAIT=YES
-    REPLICATE NODE * RECOVERDAMAGED=ONLY WAIT=YES +</cli>
  
 6. For any file volume(s) identified in step 2 above (READONLY), attempt to move any existing valid data from those volumes to other new volumes in the same storage pool (and wait for the process to end). Do not move the data to a different storage pool, and do not issue this command for missing volume(s) identified in step 1: 6. For any file volume(s) identified in step 2 above (READONLY), attempt to move any existing valid data from those volumes to other new volumes in the same storage pool (and wait for the process to end). Do not move the data to a different storage pool, and do not issue this command for missing volume(s) identified in step 1:
- +<cli prompt='>'>
- +Protect> MOVE DATA <volume name>
-    MOVE DATA <volume name> +</cli>
  
 7. Issue the following commands to determine if there are any objects or referenced deduplicated base chunks remaining on any of the volumes identified in steps 1 (DESTROYED) or 2 (READONLY) above: 7. Issue the following commands to determine if there are any objects or referenced deduplicated base chunks remaining on any of the volumes identified in steps 1 (DESTROYED) or 2 (READONLY) above:
- +<cli prompt='>'>
- +Protect> QUERY CONTENT <volume name> FOLLOWLINKS=NO
-    QUERY CONTENT <volume name> FOLLOWLINKS=NO +</cli>
     If this command lists objects, you have experienced irrecoverable backup data loss. The list of files returned is a list of irrecoverable objects and their owners (node names). If there are no objects listed or the volume can no longer be found, every object on this volume was recovered successfully using either a copy storage pool or node replication.     If this command lists objects, you have experienced irrecoverable backup data loss. The list of files returned is a list of irrecoverable objects and their owners (node names). If there are no objects listed or the volume can no longer be found, every object on this volume was recovered successfully using either a copy storage pool or node replication.
- +<cli prompt='>'>
-    QUERY CONTENT <volume name> FOLLOWLINKS=JUSTLINKS +Protect> QUERY CONTENT <volume name> FOLLOWLINKS=JUSTLINKS
 +</cli>
     If this command lists objects, then objects stored on other volumes need to be recovered due to damaged deduplicated base chunks on this volume. The list of files returned is a list of affected objects on other volumes that will need to be recovered.     If this command lists objects, then objects stored on other volumes need to be recovered due to damaged deduplicated base chunks on this volume. The list of files returned is a list of affected objects on other volumes that will need to be recovered.
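
The two QUERY CONTENT results in step 7 answer different questions, and it can help to record them separately. The sketch below is a hypothetical Python helper (`classify_volume` is not an IBM command or API); it assumes the object lists from each query have already been copied into Python lists by hand or by some other means.

```python
def classify_volume(objects_no_links, objects_just_links):
    """Interpret the two step-7 query results for one volume.

    objects_no_links:   objects listed by FOLLOWLINKS=NO
    objects_just_links: objects listed by FOLLOWLINKS=JUSTLINKS
    Hypothetical helper; the lists are assumed to be collected manually.
    """
    return {
        # Objects still on the volume are irrecoverable backup data.
        "irrecoverable_on_volume": list(objects_no_links),
        # Objects elsewhere that reference damaged base chunks here.
        "needs_recovery_elsewhere": list(objects_just_links),
        # Nothing listed by either query means no further work for this volume.
        "fully_recovered": not objects_no_links and not objects_just_links,
    }


if __name__ == "__main__":
    # The object name below is a made-up example.
    print(classify_volume([], ["/home/payroll/report.dat"]))
```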
  
Line 145: Line 148:
  
 8. For any file volume(s) identified in step 2 (READONLY) above, ensure that all unreadable data remains marked as damaged by initiating an audit (and wait for the process to end): 8. For any file volume(s) identified in step 2 (READONLY) above, ensure that all unreadable data remains marked as damaged by initiating an audit (and wait for the process to end):
- +<cli prompt='>'>
- +Protect> AUDIT VOLUME <volume name> FIX=YES
-    AUDIT VOLUME <volume name> FIX=YES +</cli>
  
 9. For any file volume(s) identified in step 2 (READONLY) above, attempt to move any remaining valid data from those volumes to other volumes in the same storage pool and wait for the process to end (do not move the data to a different storage pool): 9. For any file volume(s) identified in step 2 (READONLY) above, attempt to move any remaining valid data from those volumes to other volumes in the same storage pool and wait for the process to end (do not move the data to a different storage pool):
- +<cli prompt='>'>
- +Protect> MOVE DATA <volume name>
-    MOVE DATA <volume name> +</cli>
    IMPORTANT: Ensure that the MOVE DATA processes end with success. If they end with failure, review the activity log to determine why they failed. If the processes ended with failure because of a resource contention issue (for example, a lock conflict), re-issue the command at a later time. Otherwise, stop and contact IBM support for further review of the failure.     IMPORTANT: Ensure that the MOVE DATA processes end with success. If they end with failure, review the activity log to determine why they failed. If the processes ended with failure because of a resource contention issue (for example, a lock conflict), re-issue the command at a later time. Otherwise, stop and contact IBM support for further review of the failure.
  
Line 166: Line 167:
  
     NOTE: Deleting the volume will remove any of the remaining and irrecoverable objects on that volume. Record the object names and owners (node name) before deleting the volume and attempt to back them up from the owning node again later if possible. This information was previously collected with the first QUERY CONTENT command in step 7 above.     NOTE: Deleting the volume will remove any of the remaining and irrecoverable objects on that volume. Record the object names and owners (node name) before deleting the volume and attempt to back them up from the owning node again later if possible. This information was previously collected with the first QUERY CONTENT command in step 7 above.
- +<cli prompt='>'>
-    DELETE VOLUME <volume name> DISCARDDATA=YES +Protect> DELETE VOLUME <volume name> DISCARDDATA=YES
 +</cli>
    This command will create "invalid links" for the objects referencing the data on this volume. If this command returns that the volume no longer exists (ANR2401E), then recovery completed at an earlier step, but you should still continue with the remaining steps in this document.     This command will create "invalid links" for the objects referencing the data on this volume. If this command returns that the volume no longer exists (ANR2401E), then recovery completed at an earlier step, but you should still continue with the remaining steps in this document.
  
Line 177: Line 178:
  
 11. Scan and validate the deduplicated storage pool to determine if deleting the volume invalidated any links to base data: 11. Scan and validate the deduplicated storage pool to determine if deleting the volume invalidated any links to base data:
- +<cli prompt='>'>
- +Protect> VALIDATE EXTENTS <deduplicated stgpool> ACTION=MARKDAMAGED PREVIEW=NO
-    VALIDATE EXTENTS <deduplicated stgpool> ACTION=MARKDAMAGED PREVIEW=NO +</cli>
  
 12. Review the activity log to determine the results of the above step. The results will look similar to the following: 12. Review the activity log to determine the results of the above step. The results will look similar to the following:
Line 198: Line 198:
  
 13. If a copy storage pool exists, attempt to restore any affected data on the missing or damaged volume(s) by issuing the following command (and wait for the process to end): 13. If a copy storage pool exists, attempt to restore any affected data on the missing or damaged volume(s) by issuing the following command (and wait for the process to end):
- +<cli prompt='>'>
- +Protect> RESTORE STGPOOL <stgpool name> PREVIEW=NO MAXPROCESS=<n>
-    RESTORE STGPOOL <stgpool name> PREVIEW=NO MAXPROCESS=<n> +</cli>
  
 14. If the data is (or might be) replicated, attempt to use node replication to recover the affected data on the missing or damaged volume(s) by issuing the following command and waiting for the process to end (monitor the process on the source and target servers): 14. If the data is (or might be) replicated, attempt to use node replication to recover the affected data on the missing or damaged volume(s) by issuing the following command and waiting for the process to end (monitor the process on the source and target servers):
- +<cli prompt='>'>
- +Protect> REPLICATE NODE * RECOVERDAMAGED=ONLY WAIT=YES
-    REPLICATE NODE * RECOVERDAMAGED=ONLY WAIT=YES +</cli>
  
 15. Repeat steps 11 and 12 to verify that no further issues are reported after the RESTORE STGPOOL or REPLICATE NODE RECOVERDAMAGED recovery attempt. If no further objects are invalid/damaged, then recovery is complete. If problems are still reported, then continue with the below step. 15. Repeat steps 11 and 12 to verify that no further issues are reported after the RESTORE STGPOOL or REPLICATE NODE RECOVERDAMAGED recovery attempt. If no further objects are invalid/damaged, then recovery is complete. If problems are still reported, then continue with the below step.
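
Steps 11 to 15 form a validate, recover, validate-again sequence whose middle part depends on whether a copy storage pool or node replication exists. The following Python sketch is one possible way to lay that out; `final_recovery_commands`, its parameters, and the pool name in the example are hypothetical, and the command strings are the ones quoted in the steps above. It only builds the list and runs nothing.

```python
def final_recovery_commands(stgpool, has_copy_pool, replicated, max_process=1):
    """List the commands for steps 11 to 15 in the order they would be issued.

    Hypothetical helper: the activity log review in step 12 stays manual.
    """
    validate = f"VALIDATE EXTENTS {stgpool} ACTION=MARKDAMAGED PREVIEW=NO"
    commands = [validate]  # step 11
    if has_copy_pool:  # step 13
        commands.append(
            f"RESTORE STGPOOL {stgpool} PREVIEW=NO MAXPROCESS={max_process}"
        )
    if replicated:  # step 14
        commands.append("REPLICATE NODE * RECOVERDAMAGED=ONLY WAIT=YES")
    if len(commands) > 1:  # step 15: validate again after a recovery attempt
        commands.append(validate)
    return commands


if __name__ == "__main__":
    # "DEDUPPOOL" is a made-up storage pool name.
    for command in final_recovery_commands("DEDUPPOOL", True, True, max_process=4):
        print(command)
```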
tsm/replace_volume.1637859250.txt.gz · Last modified: 2021/11/25 17:54 by manu