IBM 7133 Array Management

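SSA adapter battery management
#ssa_fw_status -a ssa1    Show information about the cache battery
#ssa_format -l ssa1 -b    Reset the cache battery life counter to zero
#ssa_format -l ssa0       Clear the contents of the cache
#ssadisk -a ssa1 -L       List the hdisks attached to the given SSA adapter
#ssadisk -a ssa1 -P       List the pdisks attached to the given SSA adapter
#ssa_diag -l ssa0         Show the SSA SRN (Service Request Number); use the SRN to identify the problem
#/usr/lpp/diagnostics/bin/ssa_diag -l ssa0 -a
// A return value of 0 means normal; 1 means the machine's power supplies or fans are no longer redundant; 2 means the adapter has failed.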
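A minimal sketch of turning those return values into a message. The function names are made up for illustration; only the three meanings come from the note above.

```shell
#!/bin/sh
# Map an ssa_diag exit status to the meanings listed in the notes.
ssa_diag_meaning() {
    case "$1" in
        0) echo "normal" ;;
        1) echo "power supply or fan redundancy lost" ;;
        2) echo "adapter failed" ;;
        *) echo "unknown status $1" ;;
    esac
}

# Hypothetical wrapper: run ssa_diag against one adapter and print the meaning.
check_ssa_adapter() {
    rc=0
    /usr/lpp/diagnostics/bin/ssa_diag -l "$1" -a >/dev/null 2>&1 || rc=$?
    ssa_diag_meaning "$rc"
}
```

Usage would be something like `check_ssa_adapter ssa0`. Any status other than 0, 1 or 2 is reported as unknown rather than guessed at.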
# ssaadap -l hdisk3       Determine which SSA adapter hdisk3 belongs to
ssa_progress: shows how much (as a percentage) of a format operation has been completed,
and shows the status of the format operation. The status can be "complete", "formatting", or "failed".
#ssa_progress -l pdisk0
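A rough sketch of waiting for a format to finish by polling ssa_progress. The exact output layout of ssa_progress is not shown in these notes, so this assumes only that one of the three status words appears somewhere in the output. The function names and the polling interval are hypothetical.

```shell
#!/bin/sh
# Classify ssa_progress output read from stdin into one status word.
# Assumption: the status word appears literally somewhere in the output.
format_state() {
    out=$(cat)
    case "$out" in
        *[Ff]ailed*)     echo failed ;;
        *[Ff]ormatting*) echo formatting ;;
        *[Cc]omplete*)   echo complete ;;
        *)               echo unknown ;;
    esac
}

# Poll one pdisk until it stops reporting "formatting".
# The second argument (seconds between polls, default 60) is hypothetical.
wait_for_format() {
    pdisk=$1
    interval=${2:-60}
    while :; do
        state=$(ssa_progress -l "$pdisk" 2>&1 | format_state)
        [ "$state" = formatting ] || break
        sleep "$interval"
    done
    echo "$state"
    [ "$state" = complete ]
}
```

For example, `wait_for_format pdisk0 120` would print the final state and return non-zero unless that state is "complete".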
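#ssaraid -l ssa0 -Iz      List the status of the hdisks and pdisks
Configuration verification: find the mapping between pdisks and hdisks
#diag -> Task Selection -> SSA Service Aids -> Configuration Verification
Format a disk. This destroys the data on the disk. It is one remedy when a disk cannot be used.
#diag -> Task Selection -> SSA Service Aids -> Format Disk
Certify a disk: check it for bad blocks and similar problems
#diag -> Task Selection -> SSA Service Aids -> Certify Disk
Display or update disk microcode
#diag -> Task Selection -> SSA Service Aids -> Display/Download Disk Drive Microcode -> Display the Microcode levels of all SSA Physical Disk Drives     Display the microcode
#diag -> Task Selection -> SSA Service Aids -> Display/Download Disk Drive Microcode -> Download Microcode     Select one disk or all disks. The microcode is installed from diskette.
For RAID operations on disks, use smitty ssaraid.
Identify (locate) a disk
#smitty dev -> SSA Disks -> SSA Physical Disks -> Identify an SSA Physical Disk -> select the disk.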

ssaencl -l enclosure0 -b        Show the status of the bypass cards in enclosure0

ssaencl -l enclosure0 -c -v     Show the VPD status

ssaencl -l enclosure0 -I R202   Change the ID of enclosure0 to R202

ssaencl -l enclosure0 -d 8      Show the contents of disk bay 8

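Is this the way to find a disk that has already failed? While it runs, that disk's identification light flashes.
Once the disk to be replaced has been confirmed, is it necessary to go back up one menu level and select:

Cancel all SSA Disk Identifications

After the disk confirmed as faulty has been swapped and the replacement plugged in, what has to be done to that disk before it can be added to the degraded RAID array?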

The status of a 7133 RAID array can be viewed with:
smitty ssaraid
List Status of all Defined SSA RAID Arrays
When a disk in the RAID array has a problem, the array's status is "degraded".

  1. Use the following to determine whether the disk has been rejected by the array:
    smit ssaraid
    List/Identify SSA Physical Disks
    List Rejected Array Disks

If the disk has not been rejected by the array:
smitty ssaraid
Change Member Disks in an SSA RAID Array
Remove a disk from an SSA RAID Array
Select the corresponding array and the disk (pdisk#) to be replaced.

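  2. Physically replace the disk.
  3. Run the following commands.
    rmdev -dl pdisk#     Remove the definition of the disk being replaced from the system.
    cfgmgr -vl ssar      Reconfigure to pick up the newly added disk.
  4. smitty ssaraid
    Change/Show use of an ssa physical disk
    Change the status of the new disk to Array Candidate.
  5. smitty ssaraid
    Change Member Disks in an SSA RAID Array
    Add a disk to an SSA RAID Array
    Add the new disk to the array.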

At this point, use smitty ssaraid
List Status of all Defined SSA RAID Arrays
to check that the array is in the "Rebuilding" state. Once the rebuild completes, the array
returns to the "Good" state.
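The two command-line steps in the procedure above (rmdev, then cfgmgr) could be wrapped roughly like this. It is a sketch only: the `run` helper, the DRY_RUN variable and the function name are hypothetical, and the smitty steps still have to be done by hand afterwards.

```shell
#!/bin/sh
# Print the command (DRY_RUN=1, the default) or actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Drop the old pdisk definition, then let cfgmgr rediscover the new
# drive through the ssar router, as in the notes.
reconfigure_pdisk() {
    case "$1" in
        pdisk[0-9]*) ;;
        *) echo "usage: reconfigure_pdisk pdiskN" >&2; return 2 ;;
    esac
    run rmdev -dl "$1" && run cfgmgr -vl ssar
}
```

Calling `reconfigure_pdisk pdisk8` only prints the two commands; `DRY_RUN=0 reconfigure_pdisk pdisk8` would run them. Refusing anything that is not a pdisk name guards against deleting an hdisk by typo.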

ssaencl -l name
[-s]
[-v]
[-i]
[-r]
[-b[card …]]
[-t[threshold …]]
[-a]
[-f[fan …]]
[-d[drive_bay …]]
[-p[PSU …]]
[-o]
[-c]
To modify enclosure component settings:
ssaencl -l name
[-I ID [-U]]
[-B mode | card=mode …]
[-S {d[drive_bay …] | b[card …] | p[PSU …]|r|c|o}]
[-T threshold=value …]
For help, type: ssaencl -? or ssaencl -h

SSA maintenance tips

Connect new SSA Drives
cfgmgr

Change new SSA drives from "AIX System Drives" to "Array Candidate Disks"
     smitty
     Devices
     SSA RAID Arrays
     Change Use of Multiple SSA Physical Disks
       ssa0
       select all the disks
     Make "New Use" =>  "Array Candidate Disks"

Then smitty
     Devices
     SSA RAID Arrays
     Add an SSA RAID Array
       ssa0
       raid 5
       select all the disks
etc ...
    
It will take an hour maybe to rebuild.  You can see status over on
     Devices
     SSA RAID Arrays
     List Status Of All Defined SSA RAID Arrays

Once built, you'll have a new hdisk.  lscfg will show something like
* hdisk4           P2-I1/Q1-W8D5D430242874CK  SSA Logical Disk Drive

Put this in a volume group
     smitty
     System Storage Management (Physical & Logical Storage)
     Logical Volume Manager
     Volume Groups
     Add a Volume Group
call it maybe ssavg
etc ...

Define a logical volume
     smitty
     System Storage Management (Physical & Logical Storage)
     Logical Volume Manager
     Logical Volumes
     Add a Logical Volume
call it maybe ssalv
Use the total number of PPs less one (do a lsvg ssavg to see the total)
etc ...

Define a File System
     smitty
     System Storage Management (Physical & Logical Storage)
     File Systems
     Add / Change / Show / Delete File Systems
     Journaled File Systems
!!!  Add a Journaled File System on a Previously Defined Logical Volume
!!!  Add a Large File Enabled Journaled File System
etc ...

Mount it
    cp -prh /ssa /ssa_new


==================================================================
Some SSA RAID tidbits ...

  ssaraid -l ssa0 -I             Gives info on pdisks & hdisks.
  ssaraid -l ssa0 -I -t raid_5   Gives info on hdisks only, including
                                 fastwrite.  So
  ssaraid -l ssa0 -I |grep fastwrite         shows you all settings.
  ssaraid -l ssa0 -I -t disk     Gives info on pdisks only.
  ssa_format -l ssa0 -b          To reset the battery replacement timer.


==================================================================
There are AIX SSA device drivers (of course), as well as
microcode on the SSA drives,
          on the SSA adapter,
      and on the SSA enclosure (for 7133-d40's only).

It used to be that you could reference the SSAFLASH PACKAGE
on VMTOOLS for latest levels and directions on how to update
each microcode, but now, you gotta go to Hursley's SSA web site at
  http://www.hursley.ibm.com/ssa
especially the http://www.hursley.ibm.com/ssa/rs6k/index.html link.

See also Steve Garrett at 6-7794.

==================================================================
  To replace a failed SSA drive, presuming
- the drive was a member of an array,
- and now it has failed, so its status is "rejected",
- you've already physically replaced the bad drive with a new one,
- and you want the new drive to have the same pdisk number,

1) Remove the pdisk definition for the pdisk you're replacing.
   - rmdev -dl pdisk8
2) Run cfgmgr to configure the new pdisk.  The new drive will get
   the old pdisk number and there will also be a new hdisk number,
   which gets removed in the next step.
3) Change use of the disk to "Array Candidate Disk"
   - smitty     (Fastpath = smitty chgssadisk)
   - Devices
   - SSA RAID Arrays
   - Change/Show Use of an SSA Physical Disk
     - Select the SSA adapter you're working with/on.
     - Select the pdisk.  It will be at the bottom.
     - Change "Current Use" to "Array Candidate Disk".
4) Add disk to "Degraded" array.
   Either hit PF3 twice after doing step 3) above, or
     - smitty     (Fastpath = smitty addssaraid)
     - Devices
     - SSA RAID Arrays
   Then
   - Add a Disk to an SSA RAID Array
     - The only choices should be the degraded array
     - and if you hit PF4, your only choice will be the pdisk
       you just configured.
5) diag -a to clean things up.
6) To clear out the error log,
   - errclear -N pdisk8 0 
   - errclear -N ssa0   0 
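Step 6 can be looped over several devices. A small sketch follows, with a hypothetical function name; it lists each device's entries with errpt first so you can see what is about to go.

```shell
#!/bin/sh
# Show, then clear, error log entries for each named resource.
clear_device_errors() {
    for dev in "$@"; do
        errpt -N "$dev"          # show what is about to be cleared
        errclear -N "$dev" 0     # 0 days = clear entries of any age
    done
}
```

For the example above that would be `clear_device_errors pdisk8 ssa0`.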

==================================================================
       To Replace a Failed SSA Drive on as0301e0/1

  Bruce has the pdisks defined as simply "AIX System Disks", so
there's a 1-to-1 relationship between pdisks & hdisks, and Bruce
has chosen to have 2 hdisks in each db*vg volume group, with
each LV mirrored with "EACH LP COPY ON A SEPARATE PV" set to "yes".

Here's what I did to replace pdisk13, which failed on 9/16/1999.
For background,

   pdisk13 = hdisk14 =\
                       db12vg = db12lv = /home/inst1/db_mount/db12fs
   pdisk8  = hdisk9  =/      \= loglv11

1) Break the mirror & remove the Physical Volume from the Volume Group.
   rmlvcopy db12lv  1 hdisk14
   rmlvcopy loglv11 1 hdisk14
   reducevg db12vg    hdisk14

2) Replace the drive, tell AIX to forget about them, and reconfigure.
   rmdev -dl hdisk14
   rmdev -dl pdisk13
   cfgmgr

3) Add the PV back into the VG, then redefine & resynch the 2 mirrors.
   extendvg db12vg hdisk14
   mklvcopy db12lv  2 hdisk14
   mklvcopy loglv11 2 hdisk14
   syncvg -v db12vg
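The same three steps for some other mirrored VG can be generated rather than retyped. This sketch only prints the commands for review; the function name and argument order are made up here. It assumes, as in the case above, that the new drive comes back with the same hdisk name after cfgmgr.

```shell
#!/bin/sh
# Print the command sequence for swapping one mirrored PV.
# usage: mirror_swap_plan VG HDISK PDISK LV [LV ...]
mirror_swap_plan() {
    vg=$1
    hdisk=$2
    pdisk=$3
    shift 3
    # 1) break the mirrors and drop the PV from the VG
    for lv in "$@"; do echo "rmlvcopy $lv 1 $hdisk"; done
    echo "reducevg $vg $hdisk"
    # 2) swap the hardware, forget the old devices, reconfigure
    echo "# physically replace the drive here"
    echo "rmdev -dl $hdisk"
    echo "rmdev -dl $pdisk"
    echo "cfgmgr"
    # 3) put the PV back, re-mirror, resync
    echo "extendvg $vg $hdisk"
    for lv in "$@"; do echo "mklvcopy $lv 2 $hdisk"; done
    echo "syncvg -v $vg"
}
```

`mirror_swap_plan db12vg hdisk14 pdisk13 db12lv loglv11` reproduces the command list above. Check the hdisk number after cfgmgr before running the last four lines.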

==================================================================
Notes on when I upgraded all the SSA stuff on as0209 on 1/7/2000.

At the time, as0209 was running AIX 4.3.2 and there was a single
"IBM SSA Enhanced RAID Adapter (14104500)" adapter, connected to
the 7133-d40 chassis in the S70 frame (third chassis from the
bottom), which had 16 36GB drives.

To update the AIX software,

1) Download the latest fixes from the Hursley SSA site.  Best is
   to start at http://www.hursley.ibm.com/ssa/rs6k and work your
   way through, but on 1/7/2000, the final page was
   http://www.hursley.ibm.com/ssa/rs6k/AIX_Levels/aix_download.html

   What you get is a file called upgrade432.tar.
 
   On 1-18-2001 when I checked, I had devices.ssa.disk.rte 4.3.3.10
   for example, and the web page said the latest was 4.3.3.27, with the
   latest AIX 4.3.3 code named ssacode433.tar.
   This update also required bos.rte.lvm 4.3.3.25 and I only had
   4.3.3.18, so I had to ftp aix.boulder.ibm.com
                         login as anonymous
                         cd /aix/fixes/v4/os
                         bin
                         get bos.rte.lvm.4.3.3.26.bff
   rebuild the .toc, and then the upgrade worked.

2) Do the normal 
      tar -xvf
   which untars into the /usr/sys/inst.images directory, so
      cd /usr/sys/inst.images
      inutoc .
   (I also untar'd this into the CWS's $PROD/SSA_Upgrades directory,
    did the inutoc, touch'd .mklinks & put it in the $lpp directory.)

3) Install the AIX fixes and get the microcode you need for the next
   steps into the proper directory (/etc/microcode, which is a link to
   /usr/lib/microcode), by
      smitty installp
   pointing it to /usr/sys/inst.images.  Ensure you select
      Install and Update from ALL Available Software
   at the bottom of the screen, else you won't get the SSA microcode
   filesets (ssamcode.* and ssadiskmcode.*).
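The prerequisite problem in step 1 (having bos.rte.lvm 4.3.3.18 when 4.3.3.25 was needed) can be caught before running installp by comparing dotted levels. A sketch follows. `level_ge` is a made-up helper, and the lslpp line in the comment assumes the level is the third colon-separated field of `lslpp -Lqc` output, which is worth checking on your system.

```shell
#!/bin/sh
# Succeed if dotted level $1 is greater than or equal to dotted level $2.
# Fields are compared numerically one at a time, so 4.3.3.9 < 4.3.3.18.
level_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        fa=${a%%.*}
        fb=${b%%.*}
        # strip the leading field; empty the string when it was the last one
        if [ "$a" = "$fa" ]; then a=; else a=${a#*.}; fi
        if [ "$b" = "$fb" ]; then b=; else b=${b#*.}; fi
        fa=${fa:-0}
        fb=${fb:-0}
        if [ "$fa" -gt "$fb" ]; then return 0; fi
        if [ "$fa" -lt "$fb" ]; then return 1; fi
    done
    return 0
}

# Possible use (assumption: field 3 of lslpp -Lqc is the level):
#   have=$(lslpp -Lqc bos.rte.lvm | cut -d: -f3)
#   level_ge "$have" 4.3.3.25 || echo "bos.rte.lvm too old: $have"
```

A plain string comparison would get 4.3.3.9 versus 4.3.3.18 wrong, which is why each field is compared as a number.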

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
To update the RAID adapter microcode,

0) You can check the adapter microcode level by lscfg -vl ssa0.  Look at
   the "ROS Level and ID" line.  as0209 started out with 6301.
   On 1-18-2001, reindeer started with 7201.

1) Ensure the microcode you want is in the /etc/microcode directory on
   the machine you want to update.  E.G. I needed
   /etc/microcode/microcode/14104500.04.72, which got updated/created when
   I installed the latest ssamcode.pcinetworkraid.obj fix in step 3 above.
    
2) Ensure your devices aren't being used.  On as0209, this meant shutting
   down DB/2.

3) Run cfgmgr.  cfgmgr knows to update the adapter microcode if it sees
   newer code in /etc/microcode.  At this point, lscfg -vl ssa0 shows the
   microcode level to be 7201 (on 1-18-2001, reindeer ended up with 7301),
   but even so, the directions say to do the next step:

4) Reboot the system.
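To compare the level before and after, the "ROS Level and ID" line can be pulled out of lscfg with sed. A sketch follows, assuming lscfg pads the label with dots the way it usually does; the function names are hypothetical.

```shell
#!/bin/sh
# Extract the value from a "ROS Level and ID" line read on stdin.
# Assumption: the label is padded with dots, e.g.
#   ROS Level and ID............7201
ros_level() {
    sed -n 's/.*ROS Level and ID[. ]*\([^ ]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: report whether the adapter level differs from
# the level noted before the update.
# usage: microcode_changed ssa0 6301
microcode_changed() {
    now=$(lscfg -vl "$1" | ros_level)
    echo "$1: was $2, now ${now:-unknown}"
    [ -n "$now" ] && [ "$now" != "$2" ]
}
```

Note the old level first (step 0), then run `microcode_changed ssa0 6301` after cfgmgr and again after the reboot.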

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
If this is one of those new 7133-T40 or -D40 enclosures, they too have
downloadable microcode.  To update the enclosure microcode,

0) You can check the enclosure microcode level by doing a
      lscfg -vp
   When I started, as0209's enclosure0 showed ROS Level and ID = 0011.

1) Ensure you have the two "coral" filesets downloaded.  They should have
   been installed when you updated AIX above.  The two filesets are
       ssadiskmcode.coraldld.obj     SSA ENCLOSURE Download Tool
   and ssadiskmcode.coralmcode.obj   SSA ENCLOSURE microcode
   The "Download Tool" gives you the /etc/microcode/ssa_sesdld command.
   The microcode fileset gets you /etc/microcode/coral014.hex.

2) Again, ensure your devices aren't being used.

3) To update the enclosure microcode,
      cd /etc/microcode
      /etc/microcode/ssa_sesdld -d enclosure0 -f coral014.hex

4) After the last step above, the enclosure microcode is really updated,
   but the lscfg -vp will still show 0011.  To fix this,
      mkdev -l enclosure0
   Now the lscfg -vp command shows the right thing, 0014 in my case.

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
To update the microcode on all your drives, 

0) You can check the microcode level on all your drives by doing a
      lscfg -vp
   When I started, as0209's drives showed ROS Level and ID = 0004,
   again, this is for 36GB drives.

1) Again, ensure your devices aren't being used.

2) ssadload -u   This takes about 30 seconds per drive.  Afterwards, the
   lscfg -vp command showed 0009.
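For planning the outage window, the 30-seconds-per-drive figure turns into a quick estimate. This is only arithmetic on the number observed above; other drive models may well differ, and the function name is made up.

```shell
#!/bin/sh
# Estimate ssadload -u run time from a drive count, at about 30 s per
# drive (the figure observed in these notes).
ssadload_estimate() {
    drives=$(($1))
    secs=$((drives * 30))
    echo "$drives drives: about $secs seconds ($((secs / 60)) min $((secs % 60)) s)"
}

# Possible use, counting the configured pdisks:
#   ssadload_estimate "$(lsdev -Cc pdisk | wc -l)"
```

For the 16 drives in as0209's enclosure, `ssadload_estimate 16` works out to 480 seconds, or 8 minutes.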