====== Brocade problem determination ======
//For info:// support may ask for a supportsave when CRA (Challenge Response Authentication) is answered "NO" or "N" and the issue is still seen, and/or when root access is not available.
===== SAN switch cleanup files =====
FLEX-A1-BLUE:root> df -h
Filesystem Size Used Avail Use% Mounted on
/dev/root 377M 247M 110M 69% /
/dev/hda3 193M 23M 161M 12% /core_files
/dev/hda1 377M 277M 80M 78% /mnt
FLEX-A1-BLUE:root>
FLEX-A1-BLUE:root> supportsave -R
Removing all core and FFDC files!
SupportSave completed (Duration : 0 minutes 1 seconds).
FLEX-A1-BLUE:root> cleanup
This utility will remove obsoleted files on the local CP.
Be aware the tool will remove all unauthorized code under
following directories on BOTH partitions:
/bin
/sbin
/lib
/fabos
/root
/usr
/core_files
Note that all the core files will be removed as well.
In case you want to save any of your private code or core,
please copy them before running this command.
Do you want to continue [Y]: Y
Checking //bin, please wait ...
Checking //sbin, please wait ...
Checking //usr, please wait ...
--Remove //usr/share/zoneinfo/Etc/GMT
--Remove //usr/share/zoneinfo/Etc/GMT+0
--Remove //usr/share/zoneinfo/Etc/UTC
--Remove //usr/share/zoneinfo/Europe/Luxembourg
--Remove //usr/share/zoneinfo/UTC
--Remove //usr/apache/bin/httpd.0
--Remove //usr/local/mib_indexes/0
--Remove //usr/local/snmpd.conf
Checking //fabos, please wait ...
--Remove //fabos/lib/libconfig_pharos.so.1.0
--Remove //fabos/man/cat7/AN-1001.7m.gz
--Remove //fabos/man/cat7/AN-1002.7m.gz
...
--Remove /mnt/fabos/users/admin/.ssh/authorized_keys
--Remove /mnt/fabos/users/admin/.ssh/authorized_keys.admin
--Remove /mnt/fabos/users/admin/.ssh/authorizedKeys.tar
--Remove /mnt/fabos/webtools/bin/web.conf.0
--Remove /mnt/fabos/webtools/bin/httpd.conf.0
--Remove /mnt/fabos/webtools/htdocs/serverstatus.html
--Remove /mnt/fabos/webtools/htdocs/0.weblinker.fcg
Checking /mnt/lib, please wait ...
--Remove /mnt/lib/modules/default/modules.ieee1394map
--Remove /mnt/lib/modules/default/modules.pcimap
--Remove /mnt/lib/modules/default/modules.usbmap
--Remove /mnt/lib/modules/default/modules.ccwmap
--Remove /mnt/lib/modules/default/modules.isapnpmap
--Remove /mnt/lib/modules/default/modules.inputmap
--Remove /mnt/lib/modules/default/modules.ofmap
--Remove /mnt/lib/modules/default/modules.seriomap
--Remove /mnt/lib/modules/default/modules.alias
--Remove /mnt/lib/modules/default/modules.symbols
Checking /mnt/root, please wait ...
--Remove /mnt/root/.ssh/id_rsa
--Remove /mnt/root/.ssh/id_rsa.pub
Checking /mnt/core_files, please wait ...
Checking /mnt/etc/fabos/rbac, please wait ...
--Remove /mnt/etc/fabos/rbac/dynamic.tmp
Finish cleanup of /mnt
FLEX-A1-BLUE:root> df -h
Filesystem Size Used Avail Use% Mounted on
/dev/root 377M 235M 123M 66% /
/dev/hda3 193M 16M 167M 9% /core_files
/dev/hda1 377M 264M 93M 74% /mnt
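To see what the cleanup reclaimed, the two ''df -h'' listings above can be compared. This is a minimal Python sketch, assuming the output has been captured as text on an admin workstation; ''parse_df'' and ''reclaimed_points'' are hypothetical helper names, not Fabric OS commands.

```python
# Sample text copied from the two df -h listings on this page.
BEFORE = """\
Filesystem Size Used Avail Use% Mounted on
/dev/root 377M 247M 110M 69% /
/dev/hda3 193M 23M 161M 12% /core_files
/dev/hda1 377M 277M 80M 78% /mnt
"""

AFTER = """\
Filesystem Size Used Avail Use% Mounted on
/dev/root 377M 235M 123M 66% /
/dev/hda3 193M 16M 167M 9% /core_files
/dev/hda1 377M 264M 93M 74% /mnt
"""


def parse_df(output):
    """Return {mount_point: use_percent} from `df -h` text."""
    usage = {}
    for line in output.strip().splitlines():
        fields = line.split()
        # Data rows have exactly 6 columns; the header row splits into 7.
        if len(fields) != 6 or not fields[4].endswith("%"):
            continue
        usage[fields[5]] = int(fields[4].rstrip("%"))
    return usage


def reclaimed_points(before, after):
    """Percentage points of usage freed on each mount point."""
    b, a = parse_df(before), parse_df(after)
    return {mount: b[mount] - a[mount] for mount in b if mount in a}


if __name__ == "__main__":
    for mount, points in reclaimed_points(BEFORE, AFTER).items():
        print(f"{mount}: {points} percentage points freed")
```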
===== SSH not working, unable to firmwaredownload =====
sansw01:admin> firmwaredownload -p sftp 10.10.10.10,root,/export/software/firmwares/san/v8.2.3c1_pha,Mypasswd
Server IP: 10.10.10.10, Protocol IPv4
Checking system settings for firmwaredownload...
Failed to access sftp://root:************@10.10.10.10//export/software/firmwares/san/v8.2.3c1_pha/release.plist
The server is inaccessible or firmware path is invalid. Please make sure the server name/IP address and the firmware path are valid, the protocol and authentication are supported. It is also possible that the RSA host key could have been changed and please contact the System Administrator for adding the correct host key.
sansw01:admin> seccryptocfg --default -type SSH -force
Terminating all SSH/SCP sessions running
Then retry the download.
Additionally, you can disable IPsec:
ipsecConfig --disable
===== Defective GBIC module =====
porterrshow
sfpshow --> check the RX and TX power values at the end of the output (uW values)
* **RX** value is low (compared with other GBICs at the same speed): the defective GBIC is on the **host side**, or the cable is faulty (rarely, it is the GBIC on the SAN switch)
* **TX** value is low: the problem comes from the **Brocade switch GBIC** (or potentially from the GBIC slot), or the cable
**Example**: port 12/34 below shows a lower RX Power than port 12/33, which has no problems, so the problem is on the host side or the cable
sansw01:FID128:admin> sfpshow -all
Or
sansw01:FID128:admin> sfpshow 12/34
Identifier: 3 SFP
Connector: 7 LC
Transceiver: 7004404000000000 4,8,16_Gbps M5 sw Short_dist
RX Power: -7.1 dBm (193.7uW) 31.6 uW 1258.9 uW 31.6 uW 794.3 uW
TX Power: -3.1 dBm (486.5 uW) 125.9 uW 1258.9 uW 251.2 uW 794.3 uW
sansw01:FID128:admin> sfpshow 12/33
Identifier: 3 SFP
Connector: 7 LC
Transceiver: 7004404000000000 4,8,16_Gbps M5 sw Short_dist
RX Power: -3.4 dBm (453.3uW) 31.6 uW 1258.9 uW 31.6 uW 794.3 uW
TX Power: -3.1 dBm (488.8 uW) 125.9 uW 1258.9 uW 251.2 uW 794.3 uW
Reset the error statistics:\\
**statsclear** --> useful for seeing only new errors
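The RX comparison in the example above can be put into numbers. This is a minimal Python sketch: the dBm to uW conversion is the standard one (0 dBm = 1 mW), but the 3 dB alert threshold and the function names are assumptions for illustration, not values from Brocade documentation.

```python
def dbm_to_uw(dbm):
    """Convert optical power from dBm to microwatts (0 dBm = 1 mW = 1000 uW)."""
    return 1000.0 * 10 ** (dbm / 10.0)


def rx_gap_db(suspect_dbm, reference_dbm):
    """How many dB weaker the suspect port receives than a healthy reference."""
    return reference_dbm - suspect_dbm


def looks_weak(suspect_dbm, reference_dbm, threshold_db=3.0):
    """Flag the port when the gap exceeds threshold_db.

    threshold_db is a hypothetical value: 3 dB means roughly half the power.
    """
    return rx_gap_db(suspect_dbm, reference_dbm) > threshold_db


if __name__ == "__main__":
    # RX Power values taken from the sfpshow output of ports 12/34 and 12/33.
    suspect, reference = -7.1, -3.4
    print(f"12/34 RX: {dbm_to_uw(suspect):.1f} uW")
    print(f"12/33 RX: {dbm_to_uw(reference):.1f} uW")
    print(f"gap: {rx_gap_db(suspect, reference):.1f} dB, weak: {looks_weak(suspect, reference)}")
```

The converted values differ slightly from the uW figures printed by sfpshow because the dBm values in the output are rounded to one decimal.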
===== POD license not assigned or reserved yet =====
Use the following commands to change the license port assignment
licenseport --release
licenseport --reserve
licenseport --show
===== How to find the source of CRCs in a Brocade SAN =====
http://www-01.ibm.com/support/docview.wss?uid=ssg1S1009263
Use the command **porterrshow** (or portstatsshow).
If the **CRC err** counter increases on a port while **CRC g_eof** stays at zero, the frames were already corrupted when they arrived: this port is connected to a switch that is passing on the errors, so check that switch to find the problem.
If the **CRC err** and **CRC g_eof** counters are both incrementing on a port, the root cause is the attached device's transmitter or the path from the sending device.
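The two rules above can be written as a small decision function. This is a Python sketch, assuming the counter increases have been read by hand from two successive porterrshow outputs; ''crc_source'' is a hypothetical name and the returned strings are only labels.

```python
def crc_source(crc_err_delta, crc_g_eof_delta):
    """Classify the likely origin of CRC errors on one port.

    Both arguments are counter increases between two porterrshow samples.
    """
    if crc_err_delta <= 0:
        return "no new CRC errors"
    if crc_g_eof_delta > 0:
        # CRC err and CRC g_eof both incrementing.
        return "attached device transmitter or path from the sending device"
    # CRC err incrementing while CRC g_eof stays at zero.
    return "errors come from the connected switch, check that switch"


if __name__ == "__main__":
    print(crc_source(12, 0))
    print(crc_source(12, 12))
```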
* **Frames tx/rx:** Counters representing the number of frames transmitted (tx) and received (rx).
* **Enc_in:** 8bit/10bit encoding errors inside frame. Words inside of frames are encoded, if this encoding is corrupted or an error is detected, enc_in is generated. Minimum compliance with the link bit error rate specification on a link continuously receiving frames would cause approximately one error every 20 minutes. Reinitialisation/reboots of the associated Nx-port can also cause these errors. Everything hitting the wire is encoded using 8/10b encoding. The Bit Error Rate (BER) formula is BER= Nerr/Nbits. The BER is calculated by comparing the transmitted sequence of bits to the received bits and counting the number of errors. The ratio of how many bits received in error over the number of total bits received is the BER. This measured ratio is affected by many factors including: signal to noise, distortion, and jitter.
* **Crc_err:** CRC errors. A mathematical formula generates a checksum at the sending port, and the receiving port uses the same formula to check and compare. Statistically, crc_err and enc_out errors together imply a GBIC/SFP problem. Also see bad_eof below. CRC and ENC_IN point to an SFP and/or ASIC issue. ENC_out may be seen on loops connecting to a fabric (FMC for example) if a disk is changed or the loop initializes for any other reason. This loop initialization may not be noticeable from ONTAP. Therefore, it is important to know what a connection is being made to and what is to be expected of this connection. Generally speaking, CRC_errs indicate an issue with the SFP.
* **Too_long:** FC frames are 2148 bytes maximum (frames that were longer than the FC maximum - SOF+header+2112bytes+CRC+EOF). If an eof is corrupted or data generation is incorrect, a too_long error is reported.
* **Too_short:** The too_short error statistics counter is incremented whenever a frame, bounded by an SOF and EOF, is received and the number of words between the SOF and EOF is less than 7 words (6 words header plus 1 word CRC), i.e. less than 36 bytes including the SOF and EOF (4 + 28 + 4). This could be caused by the transmitter or an unreliable link.
* **Bad_eof:** After a loss of synchronization error, continuous-mode alignment allows the receiver to re-establish word alignment at any point in the incoming bit stream while the receiver is operational. If such a re-alignment occurs, detection of the resulting error condition is dependent upon higher level functions (such as invalid CRC or missing EOF).
* **Enc_out:** 8bit/10bit encoding errors occurred in words (ordered sets) outside of the FC frame. Words outside of frames are encoded. If this encoding is corrupted or an error is detected, enc_out is generated. It indicates a problem if it increments faster than the link-bit error rate allows, approximately once every 20 minutes for 1 Gbit/s. Statistically, enc_out errors on their own imply a cable/connector problem. Enc_out errors and crc_err together imply a GBIC/SFP problem. Such errors are also expected every time a user brings a port down and up (reboot host, power-cycle storage subsystem, unplug/plug cable or portdisable/portenable, and so on). Such errors will also be generated on a link which has a 1Gbit/s port connected to a 2Gbit/s port when autonegotiation is turned off. Crc and enc_in are more likely to be an SFP and/or ASIC issue. Enc_out is more likely sfp/cable. Also, if connecting to a disk loop (FMC), it's more likely to see them rising, which may not necessarily indicate an issue. To spot a possible issue, investigate other counters that are not part of portErrShow.
* **Disc c3:** Discard class 3 errors could be generated by the switch when devices send frames without FLOGIing first or with an invalid destination. This error is just reporting that a discard occurred. A frame can be discarded for a number of reasons; Timeout, destination unreachable, zone discard, or other reasons for discard. Most of the time you will see timeout, which means a frame is longer than E_D_TOV in the buffer. Disc/c3 is not trivial to troubleshoot as it is not always the port discarding the frame that is causing the issue.
* **Link-fail:** If a port remains in the LR Receive State for a period of time greater than a timeout period (R_A_TOV), a link reset protocol timeout will be detected, which results in a link failure condition (enter the NOS transmit state). The link failure also indicates that loss of signal or loss of sync lasting longer than the R_A_TOV value was detected while not in the offline state.
* **Loss sync:** Synchronization failures on either bit or transmission-word boundaries are not separately identifiable and cause loss-of-synchronization errors. Such errors are also expected every time a user brings a port down and up (reboot host, power-cycle storage subsystem, unplug/plug cable or portdisable/portenable, and so on).
* **Loss sig:** Occurs when a signal is transmitted but none is being received on the same port. Such errors are also expected every time a user brings a port down and up (reboot host, power-cycle storage subsystem, unplug/plug cable or portdisable/portenable, and so on).
* **Frjt:** If the fabric cannot process a class 2 frame, an F_RJT is returned.
* **Frbsy:** If the fabric cannot deliver a class 2 frame within E_D_TOV, an F_BSY is returned.
* **c3-timeout tx:** The number of transmit class 3 frames discarded at the transmission port due to timeout (platform- and port-specific). This indicates an issue with the device connected to the switch.
* **c3-timeout rx:** The number of receive class 3 frames received at this port and discarded at the transmission port due to timeout (platform- and port-specific). This indicates an issue with the port on the switch.
//Note:// These errors should always be seen in relation to each other and in relation to the device that is being connected. There is a difference between a Loop with 28 disks being connected and a HBA in fabric mode. Additionally, CRCs by themselves with no other errors likely have a different cause than CRCs that are accompanied by enc_out errors.
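The BER formula quoted under Enc_in (BER = Nerr/Nbits) and the "one error every 20 minutes" remark can be checked with a few lines of Python. This is only a sketch: the 1e-12 BER target and the 1.0625 Gbaud line rate for 1 Gbit/s Fibre Channel are assumptions not stated on this page.

```python
def bit_error_rate(n_err, n_bits):
    """BER = Nerr / Nbits, as quoted in the Enc_in description."""
    if n_bits <= 0:
        raise ValueError("n_bits must be positive")
    return n_err / n_bits


def seconds_per_error(ber, line_rate_bps):
    """Average time between bit errors on a link that is always receiving."""
    return 1.0 / (ber * line_rate_bps)


if __name__ == "__main__":
    ASSUMED_BER = 1e-12          # assumption: BER target of the link
    ASSUMED_RATE = 1.0625e9      # assumption: 1 Gbit/s FC line rate in baud
    interval = seconds_per_error(ASSUMED_BER, ASSUMED_RATE)
    print(f"one error every {interval / 60:.1f} minutes")
```

Under these assumptions the interval comes out at a little under 16 minutes, the same order of magnitude as the rough 20-minute figure in the counter descriptions. Counters that rise much faster than that deserve a closer look.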
===== Buffer credit problem =====
Check the parameters **tim64_txcrd_z** (Time BB_credit zero) and **stat64_inputBuffersFull** (Occasions on which input buffers are full).
besw32:admin> portstats64show 3/5
...
tim64_rdy_pri 7 226 622
tim64_txcrd_z 14 338 091 729
stat64_rateTxFrame 69 017
...
stat64_inputBuffersFull 20
Check the port buffer usage. A port has enough buffer credits if **stat64_inputBuffersFull** and/or **tim64_txcrd_z** is equal to zero; otherwise you have to increase the buffer credits on this port, and if it is an ISL (E-port), also add buffer credits on the paired switch.
SWSAN1:admin> portbuffershow
User Port Lx Max/Resv Buffer Needed Link Remaining
Port Type Mode Buffers Usage Buffers Distance Buffers
--------------------------------------------------------------
0 E - - 16 24 10km
1 - - 0 - -
2 - - 0 - -
3 F - - 16 - - 76
--------------------------------------------------------------
Change the buffer credit value for the port. This is **disruptive**: the connection needs to be redundant to keep your host online.
besw32:admin> portcfgfportbuffers --enable 3/5 24
Do not forget to clear the stats after changing the buffer credit value
besw32:admin> portstatsclear 3/5
Or **statsclear**
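After clearing the stats, a second sample shows whether the two counters are still rising. This is a minimal Python sketch, assuming the counter values have been copied by hand from two portstats64show runs into dictionaries; ''credit_starved'' is a hypothetical helper name.

```python
# Counters named in this section.
WATCHED = ("tim64_txcrd_z", "stat64_inputBuffersFull")


def credit_starved(before, after):
    """Return the watched counters that increased between two samples."""
    return [
        name for name in WATCHED
        if after.get(name, 0) > before.get(name, 0)
    ]


if __name__ == "__main__":
    # First sample taken right after portstatsclear, second one later on.
    cleared = {"tim64_txcrd_z": 0, "stat64_inputBuffersFull": 0}
    later = {"tim64_txcrd_z": 14338091729, "stat64_inputBuffersFull": 20}
    rising = credit_starved(cleared, later)
    if rising:
        print("still short of buffer credits:", ", ".join(rising))
    else:
        print("no buffer credit starvation seen")
```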
===== Brocade SAN unable to connect using ssh with public keys =====
After importing the keys (using sshutil importpubkey), the SAN switch still asks for a password!
On the SAN switch, first connect using the **root** account and list the permissions of the **authorized_keys** files.
[root@labaix] /root> ssh root@labsan01
root@labsan-blue's password:
Disclaimer for Root and Factory Accounts Usage!
LABSAN01:FID128:root> ps -ef | grep ssh
root 18474 1 0 12:36 ? 00:00:00 /usr/sbin/sshd
root 18475 18474 2 12:36 ? 00:00:00 sshd: root@pts/0
root 18567 18482 0 12:37 pts/0 00:00:00 grep ssh
LABSAN01:FID128:root> cd /fabos/users/admin/.ssh
LABSAN01:FID128:root> ls -l
total 28
-rw-r--r-- 1 root admin 10240 Apr 12 20:05 authorizedKeys.tar
-rw-r--r-- 1 root admin 398 Apr 12 20:05 authorized_keys
-rw------- 1 root admin 398 Apr 12 20:05 authorized_keys.admin
-rw------- 1 root admin 796 Apr 12 19:38 authorized_keys.lpardeploy
-rw-r--r-- 1 root admin 134 Jul 14 2016 environment
Now try an SSH connection in debug mode from your lab server, using a user that is defined on the SAN switch and has SSH public keys, for example admin:
[root@labaix] /root> ssh -vv admin@labsan01
OpenSSH_6.0p1, OpenSSL 1.0.1e 11 Feb 2013
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Failed dlopen: /usr/krb5/lib/libkrb5.a(libkrb5.a.so): 0509-022 Cannot load module /usr/krb5/lib/libkrb5.a(libkrb5.a.so).
0509-026 System error: A file or directory in the path name does not exist.
debug1: Error loading Kerberos, disabling Kerberos auth.
...
debug2: key: /root/.ssh/id_rsa (20080bf8)
debug2: key: /root/.ssh/id_dsa (0)
debug2: key: /root/.ssh/id_ecdsa (0)
debug1: Authentications that can continue: publickey,password
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /root/.ssh/id_rsa
debug2: we sent a publickey packet, wait for reply
debug1: Authentications that can continue: publickey,password
debug1: Trying private key: /root/.ssh/id_dsa
debug1: Trying private key: /root/.ssh/id_ecdsa
debug2: we did not send a packet, disable method
debug1: Next authentication method: password
admin@labsan01's password:
A password is still required.
So connect again to the SAN switch as root and change the permissions on the authorized_keys file of the admin user:
[root@labaix] /root> ssh root@labsan01
root@labsan-blue's password:
Disclaimer for Root and Factory Accounts Usage!
LABSAN01:FID128:root> cd /fabos/users/admin/.ssh
LABSAN01:FID128:root> chmod 644 authorized_keys.admin
LABSAN01:FID128:root> ls -l
total 28
-rw-r--r-- 1 root admin 10240 Apr 12 20:05 authorizedKeys.tar
-rw------- 1 root admin 398 Apr 12 20:05 authorized_keys
-rw-r--r-- 1 root admin 398 Apr 12 20:05 authorized_keys.admin
-rw------- 1 root admin 796 Apr 12 19:38 authorized_keys.lpardeploy
-rw-r--r-- 1 root admin 134 Jul 14 2016 environment
And now retry a connection as admin:
[root@labaix] /root> ssh -vv admin@labsan01
...
debug2: key: /home/admin/.ssh/id_rsa (2004f6a8)
debug2: key: /home/admin/.ssh/id_dsa (0)
debug2: key: /home/admin/.ssh/id_ecdsa (0)
debug1: Authentications that can continue: publickey,password
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /home/admin/.ssh/id_rsa
debug2: we sent a publickey packet, wait for reply
debug1: Server accepts key: pkalg ssh-rsa blen 279
debug2: input_userauth_pk_ok: fp 4a:c6:ac:83:9f:26:b7:9e:0e:b2:21:b6:23:c1:94:cd
debug1: read PEM private key done: type RSA
debug1: Authentication succeeded (publickey).
...
LABSAN01:FID128:admin>
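When several switches need the same check, the ''ls -l'' listing can be screened offline. This is a Python sketch, assuming the listing has been pasted as text; it simply flags authorized_keys files whose mode differs from the 644 that fixed the login in this case. Treating 644 as the expected mode is an assumption drawn from this one example, and the function names are hypothetical.

```python
def mode_to_octal(mode):
    """Convert an `ls -l` permission string such as '-rw-r--r--' to '644'."""
    perms = mode[1:10]
    if len(perms) != 9:
        raise ValueError(f"unexpected mode string: {mode!r}")
    digits = []
    for i in range(0, 9, 3):
        r, w, x = perms[i:i + 3]
        value = (4 if r == "r" else 0) + (2 if w == "w" else 0)
        # Lowercase s/t also imply the execute bit; uppercase S/T do not.
        value += 1 if x in ("x", "s", "t") else 0
        digits.append(str(value))
    return "".join(digits)


def keys_with_other_mode(ls_output, expected="644"):
    """List authorized_keys* files whose mode is not `expected`."""
    flagged = []
    for line in ls_output.strip().splitlines():
        fields = line.split()
        # Skip 'total N' and anything that is not an authorized_keys file.
        if len(fields) < 9 or not fields[-1].startswith("authorized_keys"):
            continue
        if mode_to_octal(fields[0]) != expected:
            flagged.append(fields[-1])
    return flagged


if __name__ == "__main__":
    listing = """\
total 28
-rw-r--r-- 1 root admin 10240 Apr 12 20:05 authorizedKeys.tar
-rw-r--r-- 1 root admin 398 Apr 12 20:05 authorized_keys
-rw------- 1 root admin 398 Apr 12 20:05 authorized_keys.admin
-rw------- 1 root admin 796 Apr 12 19:38 authorized_keys.lpardeploy
-rw-r--r-- 1 root admin 134 Jul 14 2016 environment
"""
    print(keys_with_other_mode(listing))
```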
===== Brocade SAN other ssh problems =====
First step:
To configure a user for public key authentication:
switch:admin> sshutil allowuser username
Allowed user has been successfully changed to username.
If that is not enough:
Second step:
If you have other problems, you can **scp** the file /etc/sshd_config from the Brocade switch, modify it on a UNIX machine, and scp it back to the SAN switch. Then kill the sshd process as **root**; it will start again (do not use kill -9). You will be disconnected; reconnect afterwards.
[root@labaix] /root> ssh root@labsan01
root@labsan-blue's password:
Disclaimer for Root and Factory Accounts Usage!
LABSAN01:FID128:root> ps -ef | grep ssh
root 18474 1 0 12:36 ? 00:00:00 /usr/sbin/sshd
root 18475 18474 2 12:36 ? 00:00:00 sshd: root@pts/0
root 18567 18482 0 12:37 pts/0 00:00:00 grep ssh
LABSAN01:FID128:root> kill 18474
As root, check the file permissions:
CURB04:FID128:root> cd /fabos/users/admin/
CURB04:FID128:root> ls -l
total 16
-rw-r--r-- 1 admin admin 507 May 24 2018 .bash_logout
-rw-r--r-- 1 admin admin 27 May 24 2018 .inputrc
-rw-r--r-- 1 admin admin 1347 May 24 2018 .profile
drwxr-xr-x 2 admin admin 4096 Nov 7 2018 .ssh/
CURB04:FID128:root> cd .ssh/
CURB04:FID128:root> ls -l
total 24
-rw-r--r-- 1 root root 10240 Nov 7 2018 authorizedKeys.tar
-rw------- 1 root root 790 Nov 7 2018 authorized_keys
-rw------- 1 admin admin 790 Feb 9 2018 authorized_keys.admin
-rw-r--r-- 1 admin admin 134 May 24 2018 environment
Check the authorized_keys file for admin.