
Brocade problem determination

For information, when support asks for a supportsave: this applies if CRA (Challenge Response Authentication) is answered “NO” or “N” and the issue is still seen, and/or root access is not available.

SAN switch cleanup files

FLEX-A1-BLUE:root> df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/root             377M  247M  110M  69% /
/dev/hda3             193M   23M  161M  12% /core_files
/dev/hda1             377M  277M   80M  78% /mnt
FLEX-A1-BLUE:root>
FLEX-A1-BLUE:root> supportsave -R
Removing all core and FFDC files!
SupportSave completed (Duration : 0 minutes 1 seconds).
FLEX-A1-BLUE:root> cleanup

This utility will remove obsoleted files on the local CP.

Be aware the tool will remove all unauthorized code under
following directories on BOTH partitions:

        /bin
        /sbin
        /lib
        /fabos
        /root
        /usr
        /core_files

Note that all the core files will be removed as well.
In case you want to save any of your private code or core,
please copy them before running this command.

Do you want to continue [Y]: Y
Checking //bin, please wait ...
Checking //sbin, please wait ...
Checking //usr, please wait ...
--Remove //usr/share/zoneinfo/Etc/GMT
--Remove //usr/share/zoneinfo/Etc/GMT+0
--Remove //usr/share/zoneinfo/Etc/UTC
--Remove //usr/share/zoneinfo/Europe/Luxembourg
--Remove //usr/share/zoneinfo/UTC
--Remove //usr/apache/bin/httpd.0
--Remove //usr/local/mib_indexes/0
--Remove //usr/local/snmpd.conf
Checking //fabos, please wait ...
--Remove //fabos/lib/libconfig_pharos.so.1.0
--Remove //fabos/man/cat7/AN-1001.7m.gz
--Remove //fabos/man/cat7/AN-1002.7m.gz
...
--Remove /mnt/fabos/users/admin/.ssh/authorized_keys
--Remove /mnt/fabos/users/admin/.ssh/authorized_keys.admin
--Remove /mnt/fabos/users/admin/.ssh/authorizedKeys.tar
--Remove /mnt/fabos/webtools/bin/web.conf.0
--Remove /mnt/fabos/webtools/bin/httpd.conf.0
--Remove /mnt/fabos/webtools/htdocs/serverstatus.html
--Remove /mnt/fabos/webtools/htdocs/0.weblinker.fcg
Checking /mnt/lib, please wait ...
--Remove /mnt/lib/modules/default/modules.ieee1394map
--Remove /mnt/lib/modules/default/modules.pcimap
--Remove /mnt/lib/modules/default/modules.usbmap
--Remove /mnt/lib/modules/default/modules.ccwmap
--Remove /mnt/lib/modules/default/modules.isapnpmap
--Remove /mnt/lib/modules/default/modules.inputmap
--Remove /mnt/lib/modules/default/modules.ofmap
--Remove /mnt/lib/modules/default/modules.seriomap
--Remove /mnt/lib/modules/default/modules.alias
--Remove /mnt/lib/modules/default/modules.symbols
Checking /mnt/root, please wait ...
--Remove /mnt/root/.ssh/id_rsa
--Remove /mnt/root/.ssh/id_rsa.pub
Checking /mnt/core_files, please wait ...
Checking /mnt/etc/fabos/rbac, please wait ...
--Remove /mnt/etc/fabos/rbac/dynamic.tmp
Finish cleanup of /mnt

FLEX-A1-BLUE:root> df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/root             377M  235M  123M  66% /
/dev/hda3             193M   16M  167M   9% /core_files
/dev/hda1             377M  264M   93M  74% /mnt
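The effect of the cleanup can be measured by comparing the two df -h snapshots. A minimal Python sketch, assuming the standard six-column df -h layout shown above; the function names are hypothetical helpers, not part of Fabric OS:

```python
# Size suffixes printed by `df -h`, converted to MiB.
UNITS_MIB = {"K": 1 / 1024, "M": 1.0, "G": 1024.0, "T": 1024.0 * 1024}


def to_mib(size: str) -> float:
    """Convert a df -h size such as "247M" to MiB (a bare number is bytes)."""
    if size[-1].isdigit():
        return float(size) / (1024 * 1024)
    return float(size[:-1]) * UNITS_MIB[size[-1]]


def used_by_mount(df_output: str) -> dict[str, float]:
    """Map mount point -> used space (MiB) for each data line of df -h output."""
    used = {}
    for line in df_output.splitlines():
        fields = line.split()
        # Data lines have exactly 6 fields; the header ("Mounted on") and prompts do not.
        if len(fields) == 6 and fields[4].endswith("%"):
            used[fields[5]] = to_mib(fields[2])
    return used


def freed_mib(before: str, after: str) -> dict[str, float]:
    """Space released on each mount point between two df -h snapshots."""
    old, new = used_by_mount(before), used_by_mount(after)
    return {mount: old[mount] - new[mount] for mount in old if mount in new}


BEFORE = """\
Filesystem            Size  Used Avail Use% Mounted on
/dev/root             377M  247M  110M  69% /
/dev/hda3             193M   23M  161M  12% /core_files
/dev/hda1             377M  277M   80M  78% /mnt"""

AFTER = """\
Filesystem            Size  Used Avail Use% Mounted on
/dev/root             377M  235M  123M  66% /
/dev/hda3             193M   16M  167M   9% /core_files
/dev/hda1             377M  264M   93M  74% /mnt"""

print(freed_mib(BEFORE, AFTER))
# {'/': 12.0, '/core_files': 7.0, '/mnt': 13.0}
```

On this switch the cleanup released about 12 MiB on /, 7 MiB on /core_files and 13 MiB on /mnt.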

SSH not working, unable to firmwaredownload

sansw01:admin> firmwaredownload -p sftp 10.10.10.10,root,/export/software/firmwares/san/v8.2.3c1_pha,Mypasswd

Server IP: 10.10.10.10, Protocol IPv4
Checking system settings for firmwaredownload...
Failed to access sftp://root:************@10.10.10.10//export/software/firmwares/san/v8.2.3c1_pha/release.plist
The server is inaccessible or firmware path is invalid. Please make sure the server name/IP address and the firmware path are valid, the protocol and authentication are supported. It is also possible that the RSA host key could have been changed and please contact the System Administrator for adding the correct host key.
sansw01:admin> seccryptocfg --default -type SSH -force
Terminating all SSH/SCP sessions running

Then retry the download

Additionally, you can disable IPsec:

ipsecConfig --disable

Defective GBIC module

porterrshow

sfpshow <port_no>: check the RX Power and TX Power values near the end of the output (in uW)

  • If the RX value is low (compare with other GBICs of the same speed), the defective GBIC is on the host side, or the cable is at fault (only rarely the GBIC on the SAN switch).
  • If the TX value is low, the problem comes from the Brocade switch GBIC (or potentially the GBIC slot), or the cable.

Example: port 12/34 has a lower RX power than port 12/33, which has no problems, so the problem is on the host side or in the cable.

sansw01:FID128:admin> sfpshow -all

Or

sansw01:FID128:admin> sfpshow 12/34
Identifier:  3    SFP
Connector:   7    LC
Transceiver: 7004404000000000 4,8,16_Gbps M5 sw Short_dist
RX Power:    -7.1    dBm (193.7uW)   31.6   uW  1258.9 uW  31.6   uW   794.3  uW
TX Power:    -3.1    dBm (486.5 uW)  125.9  uW  1258.9 uW  251.2  uW   794.3  uW


sansw01:FID128:admin> sfpshow 12/33
Identifier:  3    SFP
Connector:   7    LC
Transceiver: 7004404000000000 4,8,16_Gbps M5 sw Short_dist
RX Power:    -3.4    dBm (453.3uW)   31.6   uW  1258.9 uW  31.6   uW   794.3  uW
TX Power:    -3.1    dBm (488.8 uW)  125.9  uW  1258.9 uW  251.2  uW   794.3  uW
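The comparison between 12/34 and 12/33 can be made numeric by converting the readings and taking the difference in dB. A minimal Python sketch; the helper names are hypothetical, and the four uW values printed after each reading are assumed to be the alarm/warning thresholds (their exact order is not shown in the output above):

```python
import math
import re

# Matches e.g. "RX Power:    -7.1    dBm (193.7uW)   31.6   uW  1258.9 uW ..."
POWER_RE = re.compile(r"^(RX|TX) Power:\s+(-?[\d.]+)\s+dBm\s+\(([\d.]+)\s*uW\)(.*)$")


def dbm_to_uw(dbm: float) -> float:
    """Optical power in dBm -> microwatts (0 dBm = 1 mW = 1000 uW)."""
    return 1000.0 * 10 ** (dbm / 10)


def uw_to_dbm(uw: float) -> float:
    """Optical power in microwatts -> dBm."""
    return 10 * math.log10(uw / 1000.0)


def parse_power(sfpshow_output: str) -> dict[str, dict]:
    """Extract the RX/TX power lines from `sfpshow <port>` output."""
    result = {}
    for line in sfpshow_output.splitlines():
        match = POWER_RE.match(line.strip())
        if match:
            direction, dbm, uw, rest = match.groups()
            result[direction] = {
                "dbm": float(dbm),
                "uw": float(uw),
                # Assumed to be the alarm/warning thresholds printed after the reading.
                "thresholds_uw": [float(v) for v in re.findall(r"([\d.]+)\s*uW", rest)],
            }
    return result


def delta_db(uw_port: float, uw_reference: float) -> float:
    """Difference in dB between a port and a healthy reference (negative = weaker)."""
    return 10 * math.log10(uw_port / uw_reference)


PORT_12_34 = """\
RX Power:    -7.1    dBm (193.7uW)   31.6   uW  1258.9 uW  31.6   uW   794.3  uW
TX Power:    -3.1    dBm (486.5 uW)  125.9  uW  1258.9 uW  251.2  uW   794.3  uW"""
PORT_12_33 = """\
RX Power:    -3.4    dBm (453.3uW)   31.6   uW  1258.9 uW  31.6   uW   794.3  uW
TX Power:    -3.1    dBm (488.8 uW)  125.9  uW  1258.9 uW  251.2  uW   794.3  uW"""

suspect, healthy = parse_power(PORT_12_34), parse_power(PORT_12_33)
print(round(delta_db(suspect["RX"]["uw"], healthy["RX"]["uw"]), 1))  # -3.7 (dB, RX)
print(round(delta_db(suspect["TX"]["uw"], healthy["TX"]["uw"]), 1))  # TX roughly equal
```

RX on 12/34 is about 3.7 dB (less than half the power) below 12/33 while TX is practically identical, which is the pattern that points away from the switch GBIC.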

Reset the error statistics:
statsclear (useful to see which errors are new)

POD license not assigned or reserved yet

Use the following commands to change the POD license port assignment:

licenseport --release <portnumber>
licenseport --reserve <portnumber>
licenseport --show

How to find the source of CRCs in a Brocade SAN

http://www-01.ibm.com/support/docview.wss?uid=ssg1S1009263

Use the command porterrshow (or portstatsshow)

If a port shows errors in the crc err column while crc g_eof stays at zero, the frames were already corrupted when they arrived: this port is connected to a switch that produces the errors, so check that switch to find the problem.

If both the crc err and crc g_eof counters are incrementing on a port, the root cause is the attached device's transmitter or the path from the sending device.
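The two rules above compare how the crc err and crc g_eof counters of one port move over time. A minimal Python sketch, assuming you copy the two counters by hand from two porterrshow runs taken a few minutes apart; the function name and tuple layout are hypothetical:

```python
Sample = tuple[int, int]  # (crc err, crc g_eof) as read from one porterrshow run


def crc_source(first: Sample, second: Sample) -> str:
    """Interpret how the crc err / crc g_eof counters of one port moved between two samples."""
    crc_delta = second[0] - first[0]
    g_eof_delta = second[1] - first[1]
    if crc_delta <= 0:
        return "no new CRC errors"
    if g_eof_delta == 0:
        # CRC errors but no good-EOF CRC errors: frames arrived already flagged upstream.
        return "upstream: check the switch connected to this port"
    # Both counters grow: this port is the first to see the corrupted frames.
    return "local: attached device transmitter or path from the sending device"


print(crc_source((100, 0), (160, 0)))
print(crc_source((100, 40), (160, 90)))
```

Using deltas rather than absolute values matters because the counters are cumulative since the last statsclear.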

Frames tx/rx: counters of the number of frames transmitted and received (not error counters). The other columns are:

  • Enc_in: 8b/10b encoding errors inside frames. Words inside frames are encoded; if this encoding is corrupted or an error is detected, enc_in is incremented. Minimum compliance with the link bit error rate specification on a link continuously receiving frames would cause approximately one error every 20 minutes. Reinitialisations or reboots of the attached Nx_Port can also cause these errors. Everything on the wire is encoded using 8b/10b encoding (on links up to 8 Gbit/s; 16 Gbit/s links use 64b/66b). The bit error rate is BER = Nerr / Nbits: it is measured by comparing the transmitted bit sequence with the received bits and counting the errors, i.e. the number of bits received in error divided by the total number of bits received. This ratio is affected by many factors, including signal-to-noise ratio, distortion and jitter.
  • Crc_err: CRC errors. The sending port computes a CRC over each frame; the receiving port recomputes it with the same formula and compares the two. Statistically, crc_err and enc_out errors together imply a GBIC/SFP problem (also see bad_eof below). CRC and enc_in errors point to an SFP and/or ASIC issue. Enc_out errors may be seen on loops connecting to a fabric (FMC for example) if a disk is changed or the loop initializes for any other reason; this loop initialization may not be noticeable from ONTAP. It is therefore important to know what the port is connected to and what to expect from that connection. Generally speaking, crc_err indicates an issue with the SFP.
  • Too_long: frames longer than the FC maximum of 2148 bytes (SOF + header + 2112-byte payload + CRC + EOF). If an EOF is corrupted or data generation is incorrect, a too_long error is reported.
  • Too_short: incremented whenever a frame bounded by an SOF and an EOF is received with fewer than 7 words between them (6 header words plus 1 CRC word), i.e. less than 36 bytes including the SOF and EOF. This can be caused by the transmitter or an unreliable link.
  • Bad_eof: frames received with a bad end-of-frame delimiter. After a loss of synchronization, continuous-mode alignment allows the receiver to re-establish word alignment at any point in the incoming bit stream while it is operational. If such a re-alignment occurs, detecting the resulting error condition depends on higher-level functions (such as an invalid CRC or a missing EOF).
  • Enc_out: 8b/10b encoding errors in words (ordered sets) outside FC frames. Words outside frames are also encoded; if this encoding is corrupted or an error is detected, enc_out is incremented. It indicates a problem if it increments faster than the link bit error rate allows, approximately once every 20 minutes at 1 Gbit/s. Statistically, enc_out errors on their own imply a cable/connector problem, while enc_out and crc_err together imply a GBIC/SFP problem. These errors are also expected every time a port goes down and up (host reboot, storage subsystem power cycle, cable unplug/plug, portdisable/portenable, and so on), and on a link where a 1 Gbit/s port is connected to a 2 Gbit/s port with autonegotiation turned off. In short, crc_err and enc_in point more to an SFP and/or ASIC issue, enc_out more to the SFP/cable. On a disk loop (FMC), rising enc_out counters are more common and do not necessarily indicate an issue; to spot a real problem, investigate other counters that are not part of porterrshow.
  • Disc c3: class 3 frames discarded by the switch, for example when a device sends frames without performing a FLOGI first or with an invalid destination. The counter only reports that a discard occurred; a frame can be discarded for several reasons: timeout, destination unreachable, zone discard, and others. Most of the time the reason is timeout, meaning the frame stayed in the buffer longer than E_D_TOV. Disc c3 is not trivial to troubleshoot, because the port discarding the frame is not always the one causing the issue.
  • Link-fail: if a port remains in the LR Receive State for longer than the timeout period (R_A_TOV), a link reset protocol timeout is detected, which results in a link failure condition (the port enters the NOS Transmit State). A link failure also indicates that a loss of signal or loss of sync lasting longer than R_A_TOV was detected while the port was not in the offline state.
  • Loss sync: synchronization failures on either bit or transmission-word boundaries are not separately identifiable and cause loss-of-synchronization errors. These errors are also expected every time a port goes down and up (host reboot, storage subsystem power cycle, cable unplug/plug, portdisable/portenable, and so on).
  • Loss sig: occurs when a signal is transmitted but none is received on the same port. These errors are also expected every time a port goes down and up (host reboot, storage subsystem power cycle, cable unplug/plug, portdisable/portenable, and so on).
  • Frjt: if the fabric cannot process a class 2 frame, an F_RJT is returned.
  • Frbsy: if the fabric cannot deliver a class 2 frame within E_D_TOV, an F_BSY is returned.
  • c3-timeout tx: the number of class 3 frames discarded at the transmit port due to timeout (platform- and port-specific). This indicates an issue with the device connected to the switch.
  • c3-timeout rx: the number of class 3 frames received at this port and discarded at the transmit port due to timeout (platform- and port-specific). This indicates an issue with the port on the switch.
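The frame-size limits behind too_long and too_short, and the BER relation from the enc_in entry, reduce to simple arithmetic. A minimal Python sketch; the 1e-12 BER target and the 1.0625 Gbaud 1GFC line rate are assumptions used for illustration:

```python
# Fibre Channel frame fields in bytes (one transmission word = 4 bytes).
SOF, HEADER, MAX_PAYLOAD, CRC, EOF = 4, 24, 2112, 4, 4
WORD = 4

MAX_FRAME = SOF + HEADER + MAX_PAYLOAD + CRC + EOF  # anything longer -> too_long
MIN_FRAME = SOF + HEADER + CRC + EOF  # anything shorter -> too_short
MIN_WORDS = (HEADER + CRC) // WORD  # words required between SOF and EOF


def ber(errored_bits: int, total_bits: int) -> float:
    """Bit error rate: BER = Nerr / Nbits."""
    return errored_bits / total_bits


def seconds_per_error(target_ber: float, line_rate_baud: float) -> float:
    """Mean time between bit errors on a link running exactly at target_ber."""
    return 1 / (target_ber * line_rate_baud)


print(MAX_FRAME, MIN_FRAME, MIN_WORDS)  # 2148 36 7
# Assumptions: 1e-12 BER target, 1GFC line rate of 1.0625 Gbaud.
print(round(seconds_per_error(1e-12, 1.0625e9) / 60, 1))  # 15.7 (minutes)
```

Under these assumptions a link exactly at the BER target sees roughly one bit error every 16 minutes, the same order as the "approximately one error every 20 minutes" quoted in the enc_in and enc_out entries.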

Note: These errors should always be seen in relation to each other and to the device that is connected. A loop with 28 disks behaves differently from an HBA in fabric mode. Additionally, CRCs on their own, with no other errors, likely have a different cause than CRCs accompanied by enc_out errors.
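The rules of thumb from the crc_err, enc_in and enc_out entries can be written as a small lookup. A minimal Python sketch: the function and dictionary keys are hypothetical, the inputs are the counter increases read by hand from two porterrshow runs, and, as the note says, the result is only a starting point to be weighed against what the port is connected to:

```python
def triage(increase: dict[str, int]) -> str:
    """Map which porterrshow counters increased on a port to the rule of thumb in the text."""
    crc = increase.get("crc_err", 0) > 0
    enc_in = increase.get("enc_in", 0) > 0
    enc_out = increase.get("enc_out", 0) > 0
    if crc and enc_out:
        return "GBIC/SFP problem"
    if crc or enc_in:
        return "SFP and/or ASIC issue"
    if enc_out:
        # Also expected after a port bounce or a loop (FMC) initialization.
        return "cable/connector problem"
    return "no CRC or encoding errors"


print(triage({"crc_err": 12, "enc_out": 40}))  # GBIC/SFP problem
print(triage({"enc_out": 3}))                  # cable/connector problem
```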

Buffer credit problem

Check the parameters tim64_txcrd_z (Time BB_credit zero) and stat64_inputBuffersFull (Occasions on which input buffers are full).

besw32:admin> portstats64show 3/5
...
tim64_rdy_pri	7 226 622
tim64_txcrd_z	14 338 091 729
stat64_rateTxFrame	69 017
...
stat64_inputBuffersFull  20

Check the port buffer usage. A port has enough buffer credits if stat64_inputBuffersFull (and/or tim64_txcrd_z) is zero; otherwise, increase the buffer credits on this port, and if it is an ISL (E_Port), also increase them on the paired switch.
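The check above can be applied to the pasted portstats64show counters. A minimal Python sketch, assuming the simplified "name value" layout shown above, where digits may be grouped with spaces (real output may be laid out differently); the helper names are hypothetical:

```python
import re

# "name<whitespace>value", where the value may be digit-grouped with spaces.
COUNTER_RE = re.compile(r"^\s*(\w+)\s+(\d[\d ]*?)\s*$")


def parse_counters(text: str) -> dict[str, int]:
    """Collect counter name -> integer value, ignoring prompts and '...' lines."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_RE.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2).replace(" ", ""))
    return counters


def needs_more_credits(counters: dict[str, int]) -> bool:
    """Rule of thumb from the text: credits are short unless both counters are zero."""
    return (counters.get("stat64_inputBuffersFull", 0) > 0
            or counters.get("tim64_txcrd_z", 0) > 0)


SAMPLE = """\
besw32:admin> portstats64show 3/5
...
tim64_rdy_pri\t7 226 622
tim64_txcrd_z\t14 338 091 729
stat64_rateTxFrame\t69 017
...
stat64_inputBuffersFull  20"""

counters = parse_counters(SAMPLE)
print(needs_more_credits(counters))  # True: 20 "input buffers full" events
```

Run it on counters collected after a statsclear, so that the values reflect only the current traffic.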

SWSAN1:admin> portbuffershow
User  Port   Lx   Max/Resv Buffer Needed     Link   Remaining
Port  Type  Mode  Buffers  Usage  Buffers  Distance  Buffers
--------------------------------------------------------------
  0     E     -      -       16       24       10km
  1           -      -        0       -        -
  2           -      -        0       -        -
  3     F     -      -       16       -        -          76
--------------------------------------------------------------

Change the buffer credit value for the port. This is disruptive, so the connection needs to be redundant to keep your host online.

besw32:admin> portcfgfportbuffers --enable 3/5 24

Do not forget to clear the statistics after changing the buffer credit value:

besw32:admin> portstatsclear 3/5

Or statsclear

Brocade SAN unable to connect using ssh with public keys

After importing the keys (using sshutil importpubkey), the SAN switch still asks for a password.

On the SAN switch, first connect with the root account and list the permissions of the authorized_keys files.

[root@labaix] /root> ssh root@labsan01
root@labsan-blue's password:
Disclaimer for Root and Factory Accounts Usage!
LABSAN01:FID128:root>  ps -ef | grep ssh
root     18474     1  0 12:36 ?        00:00:00 /usr/sbin/sshd
root     18475 18474  2 12:36 ?        00:00:00 sshd: root@pts/0
root     18567 18482  0 12:37 pts/0    00:00:00 grep ssh

LABSAN01:FID128:root> cd /fabos/users/admin/.ssh

LABSAN01:FID128:root> ls -l
total 28
-rw-r--r--   1 root     admin       10240 Apr 12 20:05 authorizedKeys.tar
-rw-r--r--   1 root     admin         398 Apr 12 20:05 authorized_keys
-rw-------   1 root     admin         398 Apr 12 20:05 authorized_keys.admin
-rw-------   1 root     admin         796 Apr 12 19:38 authorized_keys.lpardeploy
-rw-r--r--   1 root     admin         134 Jul 14  2016 environment

Now try an SSH connection in debug mode from your lab server, with a user defined on the SAN switch that uses SSH public keys, for example admin:

[root@labaix] /root> ssh -vv admin@labsan01
OpenSSH_6.0p1, OpenSSL 1.0.1e 11 Feb 2013
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Failed dlopen: /usr/krb5/lib/libkrb5.a(libkrb5.a.so):   0509-022 Cannot load module /usr/krb5/lib/libkrb5.a(libkrb5.a.so).
        0509-026 System error: A file or directory in the path name does not exist.

debug1: Error loading Kerberos, disabling Kerberos auth.
...
debug2: key: /root/.ssh/id_rsa (20080bf8)
debug2: key: /root/.ssh/id_dsa (0)
debug2: key: /root/.ssh/id_ecdsa (0)
debug1: Authentications that can continue: publickey,password
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /root/.ssh/id_rsa
debug2: we sent a publickey packet, wait for reply
debug1: Authentications that can continue: publickey,password
debug1: Trying private key: /root/.ssh/id_dsa
debug1: Trying private key: /root/.ssh/id_ecdsa
debug2: we did not send a packet, disable method
debug1: Next authentication method: password
admin@labsan01's password:

A password is still required: the RSA public key is offered, but the switch does not accept it.

So connect again to the SAN switch as root and change the permissions of the authorized_keys.admin file for the admin user:

[root@labaix] /root> ssh root@labsan01
root@labsan-blue's password:
Disclaimer for Root and Factory Accounts Usage!

LABSAN01:FID128:root> cd /fabos/users/admin/.ssh
LABSAN01:FID128:root>  chmod 644 authorized_keys.admin

LABSAN01:FID128:root> ls -l
total 28
-rw-r--r--   1 root     admin       10240 Apr 12 20:05 authorizedKeys.tar
-rw-------   1 root     admin         398 Apr 12 20:05 authorized_keys
-rw-r--r--   1 root     admin         398 Apr 12 20:05 authorized_keys.admin
-rw-------   1 root     admin         796 Apr 12 19:38 authorized_keys.lpardeploy
-rw-r--r--   1 root     admin         134 Jul 14  2016 environment

And now retry a connection as admin:

[root@labaix] /root> ssh -vv admin@labsan01
...
debug2: key: /home/admin/.ssh/id_rsa (2004f6a8)
debug2: key: /home/admin/.ssh/id_dsa (0)
debug2: key: /home/admin/.ssh/id_ecdsa (0)
debug1: Authentications that can continue: publickey,password
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /home/admin/.ssh/id_rsa
debug2: we sent a publickey packet, wait for reply
debug1: Server accepts key: pkalg ssh-rsa blen 279
debug2: input_userauth_pk_ok: fp 4a:c6:ac:83:9f:26:b7:9e:0e:b2:21:b6:23:c1:94:cd
debug1: read PEM private key done: type RSA
debug1: Authentication succeeded (publickey).
...
LABSAN01:FID128:admin>
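The chmod fix suggests that the key file must be readable by the admin user. A minimal Python sketch of the classic Unix read check applied to the ls -l lines above, assuming admin is a member of the admin group and that sshd reads the key file with admin's privileges (inferred from the fix, not confirmed for Fabric OS); the helper names are hypothetical:

```python
def parse_ls_line(line: str) -> tuple[str, str, str, str]:
    """Split an `ls -l` line into (mode, owner, group, name)."""
    fields = line.split()
    return fields[0], fields[2], fields[3], fields[-1]


def can_read(line: str, user: str, user_groups: set[str]) -> bool:
    """Classic Unix read check: owner bits, else group bits, else "other" bits."""
    mode, owner, group, _name = parse_ls_line(line)
    if user == "root":
        return True  # root bypasses file permissions
    if user == owner:
        return mode[1] == "r"  # owner read bit
    if group in user_groups:
        return mode[4] == "r"  # group read bit
    return mode[7] == "r"  # other read bit


KEY_BEFORE = "-rw-------   1 root     admin         398 Apr 12 20:05 authorized_keys.admin"
KEY_AFTER = "-rw-r--r--   1 root     admin         398 Apr 12 20:05 authorized_keys.admin"

print(can_read(KEY_BEFORE, "admin", {"admin"}))  # False
print(can_read(KEY_AFTER, "admin", {"admin"}))   # True
```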

Brocade SAN other ssh problems

First step: configure the user for public key authentication:

switch:admin> sshutil allowuser username
Allowed user has been successfully changed to username.

Second step, if that is not enough:

If you still have problems, copy the file /etc/sshd_config from the Brocade switch with scp, edit it on a UNIX machine, and copy it back to the SAN switch with scp. Then, as root, kill the sshd process (do not use kill -9); it starts again automatically. You will be disconnected; reconnect.

[root@labaix] /root> ssh root@labsan01
root@labsan-blue's password:
Disclaimer for Root and Factory Accounts Usage!
LABSAN01:FID128:root>  ps -ef | grep ssh
root     18474     1  0 12:36 ?        00:00:00 /usr/sbin/sshd
root     18475 18474  2 12:36 ?        00:00:00 sshd: root@pts/0
root     18567 18482  0 12:37 pts/0    00:00:00 grep ssh

LABSAN01:FID128:root> kill 18474
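The PID to kill is the sshd daemon itself (18474, whose parent is PID 1), not the per-session sshd (18475) serving your own login; a plain kill sends SIGTERM. A minimal Python sketch that picks that PID from ps -ef output; the function name is hypothetical:

```python
from typing import Optional


def master_sshd_pid(ps_output: str) -> Optional[int]:
    """Return the PID of the sshd daemon itself (parent PID 1) from `ps -ef` output."""
    for line in ps_output.splitlines():
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) == 8 and fields[2] == "1" and fields[7].split()[0].endswith("sshd"):
            return int(fields[1])
    return None


PS = """\
root     18474     1  0 12:36 ?        00:00:00 /usr/sbin/sshd
root     18475 18474  2 12:36 ?        00:00:00 sshd: root@pts/0
root     18567 18482  0 12:37 pts/0    00:00:00 grep ssh"""

print(master_sshd_pid(PS))  # 18474
```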

As root, check the file permissions:

CURB04:FID128:root> cd /fabos/users/admin/
CURB04:FID128:root> ls -l
total 16
-rw-r--r--   1 admin    admin         507 May 24  2018 .bash_logout
-rw-r--r--   1 admin    admin          27 May 24  2018 .inputrc
-rw-r--r--   1 admin    admin        1347 May 24  2018 .profile
drwxr-xr-x   2 admin    admin        4096 Nov  7  2018 .ssh/
CURB04:FID128:root> cd .ssh/
CURB04:FID128:root> ls -l
total 24
-rw-r--r--   1 root     root        10240 Nov  7  2018 authorizedKeys.tar
-rw-------   1 root     root          790 Nov  7  2018 authorized_keys
-rw-------   1 admin    admin         790 Feb  9  2018 authorized_keys.admin
-rw-r--r--   1 admin    admin         134 May 24  2018 environment

Check the authorized key file for the admin user (authorized_keys.admin).

storage/brocade_pb.txt · Last modified: 2025/08/23 23:38 (external edit)