IBM 520 Server: Replacing the RAID Adapter Battery

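1. Problem discovery and analysis
The amber warning LED on the database server was lit. The logs showed that the alert was caused by a RAID adapter battery warning.
2. Problem analysis and diagnosis
Run the command: # sisraidmgr -M o0 -l 'sisioa0'
Or run: # smitty pxdam and select the operation from the menu
Or navigate as follows: # smit -> Devices -> Disk Array -> IBM PCI-X SCSI Disk Array -> PCI-X SCSI Disk Array Manager -> Diagnostics and Recovery Options -> Controller Rechargeable Battery Maintenance -> Display Controller Rechargeable Battery Information
The battery information is displayed as follows: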

The battery information fields have the following meanings:
Battery state: the current state of the battery. Possible values:
No battery warning/error: no warning or error condition currently exists. This is the normal state.
Warning condition: a warning condition currently exists and an error has been logged.
Error condition: an error condition currently exists and an error has been logged. The battery has failed.
Unknown: information is not available to determine whether a warning or error condition currently exists.
Power-on time (days): the number of days the battery has been in continuous use.
Adjusted power-on time (days):
Indicates the adjusted (prorated) power-on time, in units of days, of the rechargeable Cache Battery Pack.
Note: Some rechargeable Cache Battery Packs are negatively affected by higher temperatures and thus are prorated based on the amount of time that they spend at various ambient temperatures.
Estimated time to warning (days):
Estimated time, in units of days, until a message is issued indicating that the replacement of the rechargeable Cache Battery Pack should be scheduled.
Estimated time to error (days): when this value reaches 0, the cache is disabled automatically and the battery must be replaced.
Estimated time, in units of days, until an error is reported indicating that the rechargeable Cache Battery Pack must be replaced.
Concurrently maintainable battery pack: whether the battery can be replaced online.
Indicates that the rechargeable Cache Battery Pack can be replaced while the adapter continues to operate.
Battery pack can be safely replaced: when this value is YES, the battery can be replaced safely without losing cached data.
Indicates that the adapter's write cache has been disabled and the rechargeable Cache Battery Pack can be safely replaced.
According to the displayed information, 26 days remain until the error condition, and "Battery pack can be safely replaced" is NO, so a battery error must be forced before the battery can be replaced. If the value is YES, the battery can be replaced directly.
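The decision described above can be sketched as a small shell helper. This is only an illustration: the function name is hypothetical, and it assumes the captured battery display contains a line of the form "Battery pack can be safely replaced . . : YES"; the exact spacing and dot leaders in real output may differ.

```shell
#!/bin/sh
# Hypothetical helper: reads battery information text on stdin and prints
# the next action. The field label is taken from the display described above;
# the exact layout of real output is an assumption.
battery_next_action() {
    # Keep the text after the last colon on the matching line, normalised
    # to upper case with all whitespace removed.
    answer=$(grep -i 'Battery pack can be safely replaced' | head -n 1 |
             sed 's/.*:[[:space:]]*//' | tr -d '[:space:]' |
             tr '[:lower:]' '[:upper:]')
    case "$answer" in
        YES) echo "replace" ;;
        NO)  echo "force-error-first" ;;
        *)   echo "unknown" ;;
    esac
}

# Example with a captured line (no adapter access needed):
printf 'Battery pack can be safely replaced . . : NO\n' | battery_next_action
```

With the sample line above the helper prints force-error-first, which matches the situation in this case: the battery error has to be forced before the swap.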
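3. Procedure:

1) Before starting, check the status of the P52A system and look for any other error messages;
2) Agree with the customer on a time for replacing the controller battery, and stop business services at the agreed time;
3) Check the battery information to see whether the battery is depleted;
4) Ask the user to stop the business services but not to shut down the machine;
5) Check whether the battery can be replaced safely;
Run the command: # sisraidmgr -M o0 -l 'sisioa0'
Or run: # smitty pxdam and select the operation from the menu
Or navigate as follows: # smit -> Devices -> Disk Array -> IBM PCI-X SCSI Disk Array -> PCI-X SCSI Disk Array Manager -> Diagnostics and Recovery Options -> Controller Rechargeable Battery Maintenance -> Display Controller Rechargeable Battery Information

If "Battery pack can be safely replaced . . : YES" is displayed, shut down and replace the RAID adapter battery directly; if it shows NO, force a battery error first, then shut down and replace the RAID adapter battery.
6) To force a battery error, navigate as follows:
smitty -> Devices -> Disk Array -> IBM PCI-X SCSI Disk Array -> PCI-X SCSI Disk Array Manager -> Diagnostics and Recovery Options -> Controller Rechargeable Battery Maintenance -> Force Controller Rechargeable Battery Error;
Or run: # smitty pxdam and select the operation from the menu
To be safe, query the RAID adapter battery status again;
7) Shut down the system;
8) Replace the controller RAID adapter battery;
Note: leave the old battery removed for at least 15 seconds before installing the new one; otherwise the PCI-X SCSI RAID adapter will not recognize that the battery has been replaced.
9) After the replacement, check that everything is in order and power on;
10) After the system boots, check whether the error has been resolved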

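The battery shows 1087 days; the battery replacement succeeded.
11) Re-enable the write cache
Run: # smitty pxdam

Clearing the IBM server alert
Steps: # diag -> Task Selection (Diagnostics, Advanced Diagnostics, Service Aids, etc.) -> Log Repair Action -> sys0 System Object. Press Enter to confirm; a "+" then appears in front of sys0, indicating that the item is selected.
Press F7 (or ESC+7) to commit
Press ESC+0 to exit
The alert is cleared!
After confirming there are no problems, restart the business services;
Repair complete
For reference: http://blog.51cto.com/eric1026/1883319
4. Risks and contingency:
Replacing the controller battery requires stopping business services and shutting down and restarting the system. Before the repair, confirm that there are no other errors that would prevent the system from restarting.
5. Maintenance recommendations:
Check the system regularly. When a fault occurs, contact maintenance staff promptly and resolve it quickly to keep the business running.
When a disk fails, do not replace parts blindly; doing so can easily damage the system and bring the whole system down, so proceed with caution.
When a disk or backplane fails, do not replace it blindly; check each component in turn to prevent data loss.
6. Common problems
1. After the battery is replaced, the system cannot find the boot image at power-on
Analysis: after shutting down and pulling out the RAID card, the system could find the boot image at power-on, which indicates poor contact on the RAID card

Solution: clean the dust off the RAID card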

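© Copyright belongs to the author. This is an original work by 51CTO blog author cuijb0221; contact the author for permission before reprinting, otherwise legal liability will be pursued.
https://blog.51cto.com/cuijb/2322573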

AIX ODM Commands


Basic ODM Commands:

ODM stands for Object Data Manager.

NOTE: VERY IMPORTANT!

Use these commands with EXTREME CAUTION!!! You should make backup copies of the individual ODM class files (CuAt, CuDv, CuDvDr, CuDep, and CuVPD) before you attempt to use these commands.

First, take a backup of the ODM files by issuing:
cd /etc/objrepos
for i in CuAt CuDv CuDvDr CuDep CuVPD
do
odmget $i > /tmp/$i.orig
done

1. How to find disk drive in ODM customized database:

odmget -q name=hdisk# CuAt           ==> CuAt = Customized Attribute
odmget -q value=hdisk# CuAt
odmget -q name=hdisk# CuDv           ==> CuDv = Customized Device
odmget -q value3=hdisk# CuDvDr      ==> CuDvDr = Customized Device Driver
odmget -q name=hdisk# CuDep        ==> CuDep = Customized Dependency
odmget -q name=hdisk# CuVPD       ==> CuVPD = Customized Vital Product Data

2. How to remove disk drive entries from ODM customized database:

odmdelete -q name=hdisk# -o CuAt
odmdelete -q value=hdisk# -o CuAt
odmdelete -q name=hdisk# -o CuDv
odmdelete -q value3=hdisk# -o CuDvDr
odmdelete -q name=hdisk# -o CuDep
odmdelete -q name=hdisk# -o CuVPD

3. How to find VG (rootvg) in ODM database:

odmget -q name=rootvg CuAt
odmget -q name=rootvg CuDv
odmget -q parent=rootvg CuDv
odmget -q value1=rootvg CuDvDr
odmget -q value3=rootvg CuDvDr
odmget -q name=rootvg CuDep

4. How to find LV in ODM database:

odmget -q name=LV CuAt
odmget -q name=LV CuDv
odmget -q value3=LV CuDvDr
odmget -q dependency=LV CuDep

5. How to find an object in CuDvDr by major, minor number:

Example: if major num=10 & minor=1
odmget -q "value1=10 and value2=1" CuDvDr

6. How to find value (may be pvid of an old disk left in CuAt):

odmget -q value=00001165d6faf66b0000000000000000 CuAt
(Add 16 zeros after the PVID number. This value
should be 32 characters in length.)
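
The padding rule above is easy to get wrong by hand. A minimal sketch, assuming the PVID is the 16-character value shown by lspv (the function name is made up for illustration):

```shell
#!/bin/sh
# Pad a 16-character PVID to the 32-character form described above
# by appending 16 zeros.
pvid_to_odm_value() {
    pvid=$1
    if [ "${#pvid}" -ne 16 ]; then
        echo "expected a 16-character PVID, got ${#pvid} characters" >&2
        return 1
    fi
    printf '%s0000000000000000\n' "$pvid"
}

pvid_to_odm_value 00001165d6faf66b
```

The result can then be passed to the query shown above, for example odmget -q value=$(pvid_to_odm_value 00001165d6faf66b) CuAt.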

7. To search the ODM for a specific Item.

odmget CuAt | grep (Specific Item) -> record the number of items
odmget CuDv | grep (Specific Item) -> record the number of items
odmget CuDvDr | grep (Specific Item) -> record the number of items
odmget CuDep | grep (Specific Item) -> record the number of items
odmget CuVPD | grep (Specific Item) -> record the number of items
Now you can use the odmdelete command above to remove the specific item that you searched for.

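Fixing: No utmpx entry. You must exec “login” from the lowest level “shell”

This usually happens when the disk is full and the system is then rebooted. The system boots, but there is not enough space to create temporary files, so programs cannot run, and login fails for the same reason.


Fix:
Enter single-user mode, or boot from CD into single-user maintenance mode

cd /var/adm
mv utmpx utmpxbak    (or: > utmpx)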
touch utmpx

cat /dev/null > /var/adm/wtmpx
cat /dev/null > /var/adm/utmpx

> /etc/utmp
> /etc/wtmp
> /etc/utmpx

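IBM 7133 array management

SSA adapter battery management
#ssa_fw_status -a ssa1    Show cache battery information
#ssa_format -l ssa1 -b    Reset the cache battery lifetime counter to zero
#ssa_format -l ssa0    Clear the contents of the cache
#ssadisk -a ssa1 -L    List the hdisks attached to the SSA adapter
#ssadisk -a ssa1 -P    List the pdisks attached to the SSA adapter
#ssa_diag -l ssa0    Show the SSA SRN; use the SRN to identify the problem
#/usr/lpp/diagnostics/bin/ssa_diag -l ssa0 -a
//A return value of 0 means normal; 1 means the machine's power supply or fans have lost redundancy; 2 means the adapter has failed.
# ssaadap -l hdisk3    Determine which SSA adapter hdisk3 belongs to
ssa_progress: shows how much (as a percentage) of a format operation has been completed,
and shows the status of the format operation. The status can be "complete", "formatting", or "failed"
#ssa_progress -l pdisk0
#ssaraid -l ssa0 -Iz    List the status of hdisks and pdisks
Configuration verification: find the mapping between pdisks and hdisks
#diag -> Task Selection -> SSA Service Aids -> Configuration Verification
Format a disk. This destroys the data on the disk. It is one remedy when a disk cannot be used.
#diag -> Task Selection -> SSA Service Aids -> Format Disk
Check a disk for bad blocks and similar problems
#diag -> Task Selection -> SSA Service Aids -> Certify Disk
Display or update disk microcode
#diag -> Task Selection -> SSA Service Aids -> Display/Download Disk Drive Microcode -> Display the Microcode levels of all SSA Physical Disk Drives    Display the microcode
#diag -> Task Selection -> SSA Service Aids -> Display/Download Disk Drive Microcode -> Download Microcode    You can select one disk or all disks. The microcode is installed from diskette.
For RAID operations on disks, use smitty ssaraid.
Locate a disk
#smitty dev -> SSA Disks -> SSA Physical Disks -> Identify an SSA Physical Disk -> select the disk.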
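
The ssa_diag return values listed above can be turned into readable messages with a small wrapper. This is a sketch only; the function name is hypothetical and the meanings are simply those given in the note above.

```shell
#!/bin/sh
# Map the ssa_diag return value to the meanings listed above.
ssa_diag_meaning() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "power supply or fan redundancy lost" ;;
        2) echo "adapter failed" ;;
        *) echo "unrecognized return value: $1" ;;
    esac
}

# On an AIX host one might run (not executed here):
#   /usr/lpp/diagnostics/bin/ssa_diag -l ssa0 -a; ssa_diag_meaning $?
ssa_diag_meaning 1
```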

ssaencl -l enclosure0 -b    Show the status of the bypass cards in enclosure0

ssaencl -l enclosure0 -c -v    Show the VPD status

ssaencl -l enclosure0 -I R202    Change the ID of enclosure0 to R202

ssaencl -l enclosure0 -d 8    Show the contents of disk bay 8

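Is this the way to locate a failed disk? When it runs, the disk identification lights on that disk flash.
After confirming which disk to replace, do you need to go back to the previous menu and select:

Cancel all SSA Disk Identifications

After the disk confirmed as faulty has been replaced and plugged in, what must be done to add the disk to the degraded RAID array?

You can check the status of the 7133 RAID arrays with:
smitty ssaraid
List Status of all Defined SSA RAID Arrays
When a disk in a RAID array has a problem, the array status is "degraded"

  1. Use the following to determine whether the disk has been rejected by the array:
    smit ssaraid
    List/Identify SSA Physical Disks
    List Rejected Array Disks

If the disk has not been rejected by the array:
smitty ssaraid
Change Member Disks in an SSA RAID Array
Remove a disk from an SSA RAID Array
Select the relevant array and the disk (pdisk#) to be replaced.

  1. Physically replace the disk.
  2. Run the following commands.
    rmdev -dl pdisk#    Remove the definition of the disk being replaced from the system.
    cfgmgr -vl ssar    Configure the newly added disk.
  3. smitty ssaraid
    Change/Show use of an ssa physical disk
    Change the use of the new disk to Array Candidate.
  4. smitty ssaraid
    Change Member Disks in an SSA RAID Array
    Add a disk to an SSA RAID Array
    Add the new disk to the array.

Then use smitty ssaraid
List Status of all Defined SSA RAID Arrays
to check the array: it should be in the "Rebuilding" state, and it returns to
the "Good" state once the rebuild completes.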

ssaencl -l name
[-s]
[-v]
[-i]
[-r]
[-b[card …]]
[-t[threshold …]]
[-a]
[-f[fan …]]
[-d[drive_bay …]]
[-p[PSU …]]
[-o]
[-c]
To modify enclosure component settings:
ssaencl -l name
[-I ID [-U]]
[-B mode | card=mode …]
[-S {d[drive_bay …] | b[card …] | p[PSU …]|r|c|o}]
[-T threshold=value …]
For help, type: ssaencl -? or ssaencl -h

SSA maintenance tips

Connect new SSA Drives
cfgmgr

Change new SSA drives from "AIX System Drives" to "Array Candidate Disks"
     smitty
     Devices
     SSA RAID Arrays
     Change Use of Multiple SSA Physical Disks
       ssa0
       select all the disks
     Make "New Use" =>  "Array Candidate Disks"

Then smitty
     Devices
     SSA RAID Arrays
     Add an SSA RAID Array
       ssa0
       raid 5
       select all the disks
etc ...
    
It will take an hour maybe to rebuild.  You can see status over on
     Devices
     SSA RAID Arrays
     List Status Of All Defined SSA RAID Arrays

Once built, you'll have a new hdisk.  lscfg will show something like
* hdisk4           P2-I1/Q1-W8D5D430242874CK  SSA Logical Disk Drive

Put this in a volume group
     smitty
     System Storage Management (Physical & Logical Storage)
     Logical Volume Manager
     Volume Groups
     Add a Volume Group
call it maybe ssavg
etc ...

Define a logical volume
     smitty
     System Storage Management (Physical & Logical Storage)
     Logical Volume Manager
     Logical Volumes
     Add a Logical Volume
call it maybe ssalv
Use the total number of PV's less one (do a lsvg ssavg to see the total)
etc ...

Define a File System
     smitty
     System Storage Management (Physical & Logical Storage)
     File Systems
     Add / Change / Show / Delete File Systems
     Journaled File Systems
!!!  Add a Journaled File System on a Previously Defined Logical Volume
!!!  Add a Large File Enabled Journaled File System
etc ...

Mount it
    cp -prh /ssa /ssa_new


==================================================================
Some SSA RAID tidbits ...

  ssaraid -l ssa0 -I             Gives info on pdisks & hdisks.
  ssaraid -l ssa0 -I -t raid_5   Gives info on hdisks only, including
                                 fastwrite.  So
  ssaraid -l ssa0 -I |grep fastwrite         shows you all settings.
  ssaraid -l ssa0 -I -t disk     Gives info on pdisks only.
  ssa_format -l ssa0 -b          To reset the battery replacement timer.


==================================================================
There are AIX SSA device drivers (of course), as well as
microcode on the SSA drives,
          on the SSA adapter,
      and on the SSA enclosure (for 7133-d40's only).

It used to be that you could reference the SSAFLASH PACKAGE
on VMTOOLS for latest levels and directions on how to update
each microcode, but now, you gotta go to Hursley's SSA web site at
  http://www.hursley.ibm.com/ssa
especially the http://www.hursley.ibm.com/ssa/rs6k/index.html link.

See also Steve Garrett at 6-7794.

==================================================================
  To replace a failed SSA drive, presuming
- the drive was a member of an array,
- and now it has failed, so its status is "rejected",
- you've already physically replaced the bad drive with a new one,
- and you want the new drive to have the same pdisk number,

1) Remove the pdisk definition for the pdisk you're replacing.
   - rmdev -dl pdisk8
2) Run cfgmgr to configure the new pdisk.  The new drive will get
   the old pdisk number and there will also be a new hdisk number,
   which gets removed in the next step.
3) Change use of the disk to "Array Candidate Disk"
   - smitty     (Fastpath = smitty chgssadisk)
   - Devices
   - SSA RAID Arrays
   - Change/Show Use of an SSA Physical Disk
     - Select the SSA adapter you're working with/on.
     - Select the pdisk.  It will be at the bottom.
     - Change "Current Use" to "Array Candidate Disk".
4) Add disk to "Degraded" array.
   Either hit PF3 twice after doing step 3) above, or
     - smitty     (Fastpath = smitty addssaraid)
     - Devices
     - SSA RAID Arrays
   Then
   - Add a Disk to an SSA RAID Array
     - The only choices should be the degraded array
     - and if you hit PF4, your only choice will be the pdisk
       you just configured.
5) diag -a to clean things up.
6) To clear out the error log,
   - errclear -N pdisk8 0 
   - errclear -N ssa0   0 

==================================================================
       To Replace a Failed SSA Drive on as0301e0/1

  Bruce has the pdisks defined as simply "AIX System Disks", so
there's a 1-to-1 relationship between pdisks & hdisks, and Bruce
has chosen to have 2 hdisks in each db*vg volume group, with
each LV mirrored with "EACH LP COPY ON A SEPARATE PV" set to "yes".

Here's what I did to replace pdisk13, which failed on 9/16/1999.
For background,

   pdisk13 = hdisk14 =\
                       db12vg = db12lv = /home/inst1/db_mount/db12fs
   pdisk8  = hdisk9  =/      \= loglv11

1) Break the mirror & remove the Physical Volume from the Volume Group.
   rmlvcopy db12lv  1 hdisk14
   rmlvcopy loglv11 1 hdisk14
   reducevg db12vg    hdisk14

2) Replace the drive, tell AIX to forget about them, and reconfigure.
   rmdev -dl hdisk14
   rmdev -dl pdisk13
   cfgmgr

3) Add the PV back into the VG, then redefine & resynch the 2 mirrors.
   extendvg db12vg hdisk14
   mklvcopy db12lv  2 hdisk14
   mklvcopy loglv11 2 hdisk14
   syncvg -v db12vg
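
The three steps above follow a fixed pattern, so they can be generated for review before anything is run. The sketch below only prints the commands; the function and argument names are hypothetical, and it assumes, as in the walk-through above, that the new drive comes back under the same hdisk name after cfgmgr.

```shell
#!/bin/sh
# Print (do not run) the command sequence from the steps above for a given
# volume group, failed hdisk/pdisk pair, and list of mirrored LVs.
# Assumes the replacement drive is configured under the same hdisk name.
print_mirror_replace_plan() {
    vg=$1; hdisk=$2; pdisk=$3; shift 3
    for lv in "$@"; do echo "rmlvcopy $lv 1 $hdisk"; done
    echo "reducevg $vg $hdisk"
    echo "# physically replace the drive here"
    echo "rmdev -dl $hdisk"
    echo "rmdev -dl $pdisk"
    echo "cfgmgr"
    echo "extendvg $vg $hdisk"
    for lv in "$@"; do echo "mklvcopy $lv 2 $hdisk"; done
    echo "syncvg -v $vg"
}

print_mirror_replace_plan db12vg hdisk14 pdisk13 db12lv loglv11
```

Printing rather than executing keeps the operator in control: the plan can be checked against lspv and lsvg output before any of it is typed in.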

==================================================================
Notes on when I upgraded all the SSA stuff on as0209 on 1/7/2000.

At the time, as0209 was running AIX 4.3.2 and there was a single
"IBM SSA Enhanced RAID Adapter (14104500)" adapter, connected to
the 7133-d40 chassis in the S70 frame (third chassis from the
bottom), which had 16 36GB drives.

To update the AIX software,

1) Download the latest fixes from the Hursley SSA site.  Best is
   to start at http://www.hursley.ibm.com/ssa/rs6k and work your
   way through, but on 1/7/2000, the final page was
   http://www.hursley.ibm.com/ssa/rs6k/AIX_Levels/aix_download.html

   What you get is a file called upgrade432.tar.
 
   On 1-18-2001 when I checked, I had devices.ssa.disk.rte 4.3.3.10
   for example, and the web page said the latest was 4.3.3.27 and the
   latest AIX 4.3.3 code named ssacode433.tar.
   This update also required bos.rte.lvm 4.3.3.25 and I only had
   4.3.3.18, so I had to ftp aix.boulder.ibm.com
                         login as anonymous
                         cd /aix/fixes/v4/os
                         bin
                         get bos.rte.lvm.4.3.3.26.bff
   rebuild the .toc, and then the upgrade worked.

2) Do the normal 
      tar -xvf
   which untars into the /usr/sys/inst.images directory, so
      cd /usr/sys/inst.images
      inutoc .
   (I also untar'd this into the CWS's $PROD/SSA_Upgrades directory,
    did the inutoc, touch'd .mklinks & put it in the $lpp directory.)

3) Install the AIX fixes and get the microcode you need for the next
   steps into the proper directory (/etc/microcode, which is a link to
   /usr/lib/microcode), by
      smitty installp
   pointing it to /usr/sys/inst.images.  Insure you select
      Install and Update from ALL Available Software
   at the bottom of the screen, else you won't get the SSA microcode
   filesets (ssamcode.* and ssadiskmcode.*).

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
To update the RAID adapter microcode,

0) You can check the adapter microcode level by lscfg -vl ssa0.  Look at
   the "ROS Level and ID" line.  as0209 started out with 6301.
   On 1-18-2001, reindeer started with 7201.

1) Insure the microcode you want is in the /etc/microcode directory on
   the machine you want to update.  E.G. I needed
   /etc/microcode/microcode/14104500.04.72, which got updated/created when
   I installed the latest ssamcode.pcinetworkraid.obj fix in step 3 above.
    
2) Insure your devices aren't being used.  On as0209, this meant shutting
   down DB/2.

3) Run cfgmgr.  cfgmgr knows to update the adapter microcode if it sees
   newer code in /etc/microcode.  At this point, lscfg -vl ssa0 shows the
   microcode level to be 7201, but even so, the directions say to next
   On 1-18-2001, reindeer ended up with 7301.

4) Reboot the system.

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
If this is one of those new 7133-T40 or -D40 enclosures, they too have
downloadable microcode.  To update the enclosure microcode,

0) You can check the enclosure microcode level by doing a
      lscfg -vp
   When I started, as0209's enclosure0 showed ROS Level and ID = 0011.

1) Insure you have the two "coral" filesets downloaded.  They should have
   been installed when you updated AIX above.  The two filesets are
       ssadiskmcode.coraldld.obj     SSA ENCLOSURE Download Tool
   and ssadiskmcode.coralmcode.obj   SSA ENCLOSURE microcode
   The "Download Tool" gives you the /etc/microcode/ssa_sesdld command.
   The microcode fileset gets you /etc/microcode/coral014.hex.

2) Again, insure your devices aren't being used.

3) To update the enclosure microcode,
      cd /etc/microcode
      /etc/microcode/ssa_sesdld -d enclosure0 -f coral014.hex

4) After the last step above, the enclosure microcode is really updated,
   but the lscfg -vp will still show 0011.  To fix this,
      mkdev -l enclosure0
   Now the lscfg -vp command shows the right thing, 0014 in my case.

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
To update the microcode on all your drives, 

0) You can check the microcode level on all your drives by doing a
      lscfg -vp
   When I started, as0209's drives showed ROS Level and ID = 0004,
   again, this is for 36GB drives.

1) Again, insure your devices aren't being used.

2) ssadload -u   This takes about 30 seconds per drive.  Afterwards, the
   lscfg -vp command showed 0009.

7025-f50 disk add or replace

Adding a New Drive to a Live System
When you install a new drive in the hot swap subsystem, the amber light in
the carrier will flash and then go out. This indicates that the drive has been
identified and is not spinning. At this point, AIX does not know that the drive
is present in the system. To tell AIX that you have added a new drive, you
can use SMIT or type:
cfgmgr
from the command line. Running this command will cause AIX to find the
drive and spin it up. You will see the amber light come on and hear the
sound of the drive spinning up. Using the lspv command, you will be able
to see your new drive added. If the drive contains a volume group, you
access it from your system by running:
importvg -y VGname hdiskx
Otherwise, you can add the disk to an existing volume group by using smit
extendvg or create a new volume group by using smit mkvg.
Removing a Drive from a Live System
To be able to remove a hot swap drive from the system without causing
problems, you will have to tell AIX that you are removing the drive.
Removing a Disk from an Existing Volume Group: If you want to physically
remove the disk and it belongs to an existing volume group, you would
either remove the logical volumes which are present on the disk (you can
determine which logical volumes are present on a disk by using the lspv -l
hdiskx command), or migrate the physical partitions from the disk to other
disks in the same volume group. To remove a disk from an existing volume
group, you can use the following procedure:
1. Unmount all the file systems on the disk if you are removing the logical
volumes on the disk.
2. Remove all data from the drive by either removing the logical volumes
or by migrating the partitions on the disk to another disk in the same
volume group ( smit migratepv). If you are removing the logical volumes,
you may wish to back up the data prior to removal.
3. Remove the drive from the volume group:
reducevg VGname hdiskx
4. Remove the device from the ODM
rmdev -d -l hdiskx
When you run the rmdev command, the amber light on the drive will
switch off. If you run lspv, you will see that the disk is no longer defined
on the system.
5. Physically remove the drive from the system
Removing a Drive with Its Own Volume Group: To remove a drive which
has its own volume group, you can use the following procedure:
1. Back up any data that you require from the volume group.
2. Unmount all the file systems on the disk.
3. Varyoff the volume group by issuing:
varyoffvg VGname
4. Export the volume group by issuing:
exportvg VGname
5. Remove the device from the ODM by issuing:
rmdev -d -l hdiskx
6. Physically remove the device from the system.
Replacing a Previously Defined Drive
If you add a disk drive which was already configured to the system and was
removed using the procedures described above, then you can simply add
the new drive as described in “Adding a New Drive to a Live System”.
Replacing a Previously Defined Drive into the Same Bay: If a drive was
physically removed without first being logically removed from the operating
system, then AIX may have problems. If there were no writes to the disk
after the removal of the disk, then further action is not required. If a write
occurred after or during the removal of the drive and the drive has been
re-added, then you should perform the following:
1. Unmount all file systems on the disk.
2. fsck -y file systems on the disk.
3. Remount the file systems on the disk.
Replacing a Previously Defined Drive into a Different Bay: If you place a
drive into a bay different than the one from which it was removed (the one
configured to the system by running the cfgmgr command) and you did not
remove the device from the ODM before physically removing the device, you
will have to clear up the ODM because there will be a duplicate entry for the
drive.
The following shows a scenario where there were three disks in the system
and two volume groups on two separate disks. The disk belonging to testvg
was physically removed without telling AIX.
Running the lspv command shows the three disks:
lspv
hdisk0 00000000a641877c rootvg
hdisk1 00000000b0a645b0 testvg
hdisk2 00000000a6274604 None
hdisk1 was removed from the system before telling AIX. The disk was
replaced into a different bay than it was removed from. The cfgmgr
command was run to configure the disk back into the system, and now lspv
shows:
lspv
hdisk0 00000000a641877c rootvg
hdisk1 00000000b0a645b0 testvg
hdisk2 00000000a6274604 None
hdisk3 00000000b0a645b0 testvg
There are now two entries for the testvg volume group, both with the same
physical volume identifier. This is incorrect and can be cleared up by
performing the following:
1. Unmount all file systems in the testvg volume group.
2. varyoffvg testvg
3. exportvg testvg
At this point, you may get the following error which you can ignore:
0516-024 /usr/sbin/lqueryvg: Unable to open physical volume.
Either PV was not configured or could not be opened. Run
diagnostics.
Running lspv now shows:
lspv
hdisk0 00000000a641877c rootvg
hdisk1 00000000b0a645b0 None
hdisk2 00000000a6274604 None
hdisk3 00000000b0a645b0 None
4. rmdev -d -l hdisk1
5. rmdev -d -l hdisk3
6. cfgmgr
Running lspv now shows the correct disks:
lspv
hdisk0 00000000a641877c rootvg
hdisk1 00000000b0a645b0 None
hdisk2 00000000a6274604 None
7. importvg -y testvg hdisk1
8. Mount all the file systems. You may have to run fsck -y on the file
systems first if they were written to while the disk was removed.
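The duplicate entry in the scenario above can be spotted mechanically. A minimal sketch, assuming lspv-style input with the name in column 1 and the PVID in column 2 (the function name is made up for illustration):

```shell
#!/bin/sh
# Report PVIDs that appear on more than one hdisk in lspv-style output
# (columns: name, PVID, volume group) read from stdin.
find_duplicate_pvids() {
    awk '$2 != "none" && $2 != "None" {
             count[$2]++
             names[$2] = (names[$2] == "" ? $1 : names[$2] " " $1)
         }
         END {
             for (p in count) if (count[p] > 1) print p ": " names[p]
         }'
}

# Sample taken from the lspv listing in the scenario above.
find_duplicate_pvids <<'EOF'
hdisk0 00000000a641877c rootvg
hdisk1 00000000b0a645b0 testvg
hdisk2 00000000a6274604 None
hdisk3 00000000b0a645b0 testvg
EOF
```

For the sample it prints 00000000b0a645b0: hdisk1 hdisk3, the pair that the procedure above removes with rmdev before running cfgmgr again. On a live system the input would come from lspv itself.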
Replacing Mirrored Disks
In this section an outline of mirroring is given. Mirroring takes maximum
advantage of the hot swap subsystem.
The hot swap subsystem means that AIX has to be explicitly told about the
removal and addition of disks. In a normal AIX environment the system
would have been shut down and powered-off. The system would then
recognize the removal or addition of disks.
Removing and Adding a Mirrored Disk: If you want to remove a disk which
is a mirror of other disks in its volume group, you can either remove the
logical volume's copies which are on the disk and follow the procedure
outlined in “Removing a Disk from an Existing Volume Group”,
or you can remove the disk without telling AIX. If you choose to remove the
disk without telling AIX, the volume group will stay on-line providing that
quorum has been maintained (more than 50% of the disks in the volume
group are still accessible after removing the disk) or if quorum checking has
been turned off. When you re-add the disk, perform the following procedure:
1. Unmount all the file systems which are mirrored on the disk.
2. Change the state of the disk in the volume group to active:
chpv -v -r hdiskx
chpv -v -a hdiskx
3. Synchronize all the partitions on the disk from their mirrors:
syncvg -v VGname or syncvg -p hdiskx
4. Remount all the file systems which are mirrored on the disk.
This will ensure that AIX correctly knows about the disk being re-added and
that all the partitions are correctly synchronized.
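The quorum condition mentioned above (more than 50% of the disks still accessible) can be checked with simple arithmetic. This sketch follows the simplified per-disk rule as stated in the text; AIX actually counts VGDA copies rather than disks, so treat the result as a rough guide. The function name is hypothetical.

```shell
#!/bin/sh
# Succeeds when strictly more than half of the disks remain accessible.
# Usage: quorum_held TOTAL_DISKS ACCESSIBLE_DISKS
quorum_held() {
    [ $(( $2 * 2 )) -gt "$1" ]
}

# Three disks in the volume group, one pulled:
if quorum_held 3 2; then echo "quorum kept"; else echo "quorum lost"; fi
```

Note that exactly half is not enough: with two disks and one removed, the check fails.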
If the disk is part of the root volume group and the file systems which are on
the disk cannot be unmounted, then you can either wait for a reboot of the

How To Become AIX Certified

Congratulations! In the 35th Anniversary Year of the AIX OS, IBM has renewed certification for this remarkable operating system!

You can now (again) take the exam and be proud to be AIX Certified. At the time of writing (June 2021) there is only one exam available: IBM AIX v7 Administrator Specialty

Stay tuned: if and when there are new exams, I will write about them also.

Attention! All information is taken from official sources such as the IBM and Pearson VUE websites. The author is in no way going to reveal details about specific questions on the exam and warns others against such actions.

Exam S1000-007: IBM AIX v7 Administrator Specialty

  • IBM website: https://www.ibm.com/certify/specialty?id=S1000-007
  • General description: You are an AIX administrator or technical support engineer, performing the full administration cycle of one or more OS instances. You are familiar with networking technologies, have a general understanding of performance tuning and virtualization technologies.
  • Main topics: Planning; AIX and software packages installation and upgrade; System start and shutdown; Basic configuration files; Storage subsystem and LVM; Backup and recovery; Monitoring and configuring the system; Troubleshooting; Basic security and users configuration.
  • Related training courses:
    Power Systems for AIX I: LPAR Configuration and Planning (AN11G),
    Power Systems for AIX II: AIX Implementation and Administration (AN12G),
    AIX Network Installation Management Concepts and Configuration (AN22G).
  • Additional knowledge: TCP / IP networks, performance tuning, troubleshooting (basic knowledge and skills).
  • Level: Beginner.
  • Exam length: 75 minutes.
  • Number of questions: 46.
  • Passing score: 31.

I can also recommend IBM Redbooks:
IBM eServer Certification Study Guide — pSeries AIX System Administration (sg246191),
IBM Certification Study Guide eServer p5 and pSeries Administration and Support for AIX 5L Version 5.3 (sg247199).
But I warn you that some of the information in them is outdated — pay attention to the year of publication and the OS version.

What do you need to pass the exam?
1. Your desire to become an AIX Certified Professional.
2. Several hours of your time.
3. A certain amount of colored paper or plastic with images of cities, or famous personalities, or flora and fauna — depends on which country you live in.
4. Knowledge of English.
5. Of course, knowledge and experience with the AIX OS.
6. Test center.

Let’s describe it step by step.

1. Desire.
If you have read this far, then you have a desire to pass the exam. Excellent! The most important part is done.

2. Time.
The technical duration of the exam is indicated in the exam description. Plus, add a few days to contact the test center, sign up for an exam, and then get a certificate.

3. Finance.
The author is not able to help you with this question. There are many known legitimate ways to obtain them. How much exactly is needed — check on the website of the test center.

4. English.
Basic knowledge of English. There are no complex constructions in the text of questions and answers; when developing the exam, it is taken into account that not only those who have native English will take it. Experts from different countries, as a rule, participate in the development of exams and, trust me, they also get confused with English tenses, conjugations and articles. By the way, I think, if you can read this text then your English level is more than enough.

5. Knowledge and experience.
Each exam has an official guide, which lists all the topics covered in the test. Use it in advance to estimate how much you know the material.
Read the list of topics: if they do not raise any questions from you, then this is very bad. Usually there are no questions for someone who does not know the subject at all. Joke. Probably.
In general, I hope you understand what I mean: do you think you understand the topic? Excellent! If you feel that you are not so strong, brush up on your knowledge.
Sometimes practice tests exist. Of course, passing such a test does not guarantee that you will pass the real exam, but you can roughly estimate your chances.
What about experience? In my view, real experience is the most valuable, but you also need to know some theory.

6. Test center.
Starting point: Pearson VUE website. It is this network of test centers that currently (June 2021) hosts exams on IBM technologies. Select the section For test-takers, then Schedule an exam. Choose a vendor: IBM. Here is a direct link to that section of the site: https://home.pearsonvue.com/ibm (but everything in the web interface can change).
Some useful buttons on the page: Sign In (if you already have an account); Create account (if not). You can create an account yourself, it’s free. The Candidate Testing ID will be assigned, which you will use when ordering exams. Carefully indicate your first and last name in Latin letters — exactly as you write it, it will be on the certificate.
Find a test center: Find the nearest test center. In my opinion, the search works strangely — after specifying the country, I found only a few test centers. I had to indicate a specific city. Choose a test center convenient for you, then contact them directly. Prices for exams should be the same everywhere, but whether a specific center is open on a specific date, the method of payment, how to get into the building — check it out yourself.

What should be done before the exam?
Pay for the test. Check the payment procedure in the test center in advance.
Get some sleep.
Calm down. I said Don’t Panic!
Plan your time so that you arrive at the test center early, as a rule at least 15 minutes before the exam starts. If you are late, you may be marked as having missed the exam, and the prepayment will not be refunded. I advise you to check the conditions with the test center in advance.
Take two identification documents with you. One of them should have your photo. Driver’s license, passport, bank card, etc.: check the local requirements in advance.

What should be done at the very beginning of the exam?
Once again: Don’t panic!!!
Carefully check that your surname and first name are shown correctly and that this is indeed the exam you are going to take. No kidding. Everyone makes mistakes.

How does the exam go:
You are seated at a computer on which the program with your test is already loaded. The navigation is simple; if something is not clear, ask the test center staff. By the way, if the mouse does not work, the screen glares, the chair is broken, or someone is singing loudly at the next computer, tell the test center employees right away; after the exam, such complaints will not be accepted. Once you have checked that everything is correct and pressed the Start Exam button, the countdown begins. You cannot pause it. You can’t step away “for a couple of minutes”. The exam ends automatically when the allotted time runs out. But if you have answered all the questions ahead of schedule and are confident in your answers, you can click the End Exam button yourself.

What not to do during the exam:
As I’ve said, you cannot pause it and you cannot leave the computer on which you are taking the test.
You must not communicate or interfere with other examinees.
You must not peek at other examinees’ questions and answers.
You must not use any documentation or personal notes.
Do not use any electronic devices, including a mobile phone. To avoid misunderstandings (you answer a phone call and get removed from the exam), I advise you to turn off your mobile phone or even hand it over to the test center staff during the exam, if they provide lockers.
Don’t write down the text of the questions in order to take them out of the exam. At your request, you will be given paper for notes and a pen, but they will be taken away at the end of the exam.
Don’t ask the test center staff for help with the questions or with translating the text.
The test center usually has the right to record the exam session on video. Some restrictions may apply according to local laws and regulations.

What you can do:
To take the exam.
To take the exam.
To take the exam.
Contact a test center employee if a technical problem arises (the monitor turned off, the program hung, etc.), if someone or something is interfering, or if you have a question about working with the interface.
Yes, that is practically all you can do.
I almost forgot: you can (and should!) take the exam.

What are the questions on the exam?
If you expect to see examples of real questions here, please close this article, because they are not here and never will be.
I will tell you what types of questions can typically be found on such exams. This is based on the free practice questions available in the Redbooks.
1. Single Choice — Choose one answer from several options.
Example:
What subject are you taking the exam in now?
[x] Information Technology
[ ] WARP engine researching
[ ] The dark side of the Force
[ ] Magic wand spells

2. Multi Choice — Select multiple answers.
Note that the question clearly states how many options to choose.
Example:
Which two letters of the proposed options are vowels?
[x] A
[x] I
[ ] X
[ ] P
[ ] Z

3. Ordered — You need to arrange all the options in the correct order.
Please note that in this type of question there are no unused answer options, and that some steps (elements) of the sequence may be missing from the options.
For example, as far as I know, the Earth is the third planet from the Sun, but the example of the question below is quite correct.
Example:
Arrange the planets in order of distance from the Sun, starting with the closest one.
[2] Earth
[3] Mars
[1] Mercury
[4] Jupiter

Are there unclearly formulated questions or questions that don’t have a correct answer?
Sure!
Even though each question is evaluated several times, discussed, and checked by experts, it happens. No one is immune to mistakes.
What if you come across such a question? Read it again first. Perhaps everything is correct, and you have missed some detail. But if, nevertheless, you are sure that you are right, and the developers of the exam are not, then you can leave your comment on this question. If this has a critical impact on the final exam result, then there is the possibility of an appeal.

Some advice for passing the exam.
If you divide the total exam time by the number of questions, you get about 1.5-2 minutes per question. That should be enough to read the question and, if you know the topic, to come up with the correct answer yourself. All that remains is to pick it from the proposed list. Don’t know the answer? Read the question and answer options again, more carefully.
Still no ideas?
Choose the most plausible answer option and postpone the question!

The interface lets you check a box to mark a question so that you can return to it later.
Don’t waste 10 minutes on one question. During this time, you could answer several others.
The passing score is not 100 or 90 percent; keep that in mind and watch the time.
Have you answered all the questions, but there is still time?
You can check them again until the exam time is over.
Nobody is rushing you.

What happens after the exam ends?
The result will appear on the screen almost immediately after the allotted time ends or you press the End Exam button.
You will be given a printout showing how many points you scored, what the passing score is, and the percentage of correct answers you gave on the exam topics. You will not be shown which specific questions you answered correctly.
Keep this document: it may come in handy later.
If the final result is PASSED, then in a few days an email will arrive with the long-awaited and well-deserved certificate in PDF format.

Congratulations! Now you can proudly include “IBM Certified” on your CV.

I hope this article helped you. I would be glad to receive your comments or clarifications.
Thank you for your interest in AIX and everything related to the IBM POWER Systems!

#AIX #PowerSystems #Certification #Education #IBM #IBMChampion

With best regards,
Dmitry Mironov
IBM Champion
IBM Certified

Install TS3100 Driver for AIX

Files in this download contain Licensed Materials, property of IBM,
(C) Copyright IBM Corp. 1998~ All Rights Reserved. See the
Licensing agreement presented when the driver is downloaded.

The readme files contain information that may not be included in the IBM
Tape Device Drivers Installation and User’s Guide (IUG). The readme
files take precedence over the IUG. Therefore, any information in this
file that conflicts with information in the IUG will supersede the IUG.

The IBM Tape Device Drivers for the AIX platforms are responsible for
assisting in AIX host application to tape device communication. The tape
driver for AIX (Atape) works with IBM System Storage Tape Products in
providing basic and advanced tape functionality for backup/restore and
archive environments.

ATTENTION: Tapeutil provides only a subset of device and command
support that the IBM Tape Diagnostic Tool (ITDT) provides. The
functions and capabilities of tapeutil are now performed by ITDT.
Please use ITDT in place of tapeutil, as tapeutil is deprecated.

IBM Tape Diagnostic Tool (ITDT) provides the customer with functionality
to perform maintenance tasks and run diagnostic tasks to determine tape
device issues. This is available to download with the device driver.

“IBM Tape Device Drivers Installation and User’s Guide” describes the
IBM Tape and Medium Changer Device Drivers for AIX, HP-UX, Linux,
Solaris, and Windows operating systems. This is available to download
at IBM Fix Central website: http://www-933.ibm.com/support/fixcentral/

Supported Operating Systems
The tape driver for AIX (Atape) is developed to support various versions
of AIX. For details on supported tape attachment please refer to the
System Storage Interoperation Center website.
http://www.ibm.com/systems/support/storage/config/ssic/

Tape Drivers for AIX
The tape driver for AIX (Atape) is named Atape.n.n.n.n.bin,
where n.n.n.n is the version number of the driver.

Atape levels 12.x.x.x support AIX versions 5.3 and above
Atape levels 11.x.x.x support AIX versions 5.2 and above
Atape levels 10.x.x.x support AIX versions 5.x with Data Encryption
Atape levels  9.x.x.x support AIX versions 5.1 and above
Atape levels  8.x.x.x support AIX versions 4.3, 5.1 and 5.2
Atape levels  6.x.x.x and 7.x.x.x support AIX versions 4.3 and 5.1
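
As a rough illustration, the level table above could be turned into a small shell lookup. This is only a sketch: the function name atape_supported_aix is hypothetical, and it knows only the levels listed in this readme.

```shell
#!/bin/sh
# atape_supported_aix FILE
# Hypothetical helper: print the AIX versions that the table above lists
# for the Atape level encoded in a package file name (Atape.n.n.n.n.bin).
atape_supported_aix() {
    file=${1##*/}            # drop any directory part
    version=${file#Atape.}   # drop the "Atape." prefix
    version=${version%.bin}  # drop the ".bin" suffix
    major=${version%%.*}     # keep only the first number
    case "$major" in
        12)  echo "AIX 5.3 and above" ;;
        11)  echo "AIX 5.2 and above" ;;
        10)  echo "AIX 5.x with Data Encryption" ;;
        9)   echo "AIX 5.1 and above" ;;
        8)   echo "AIX 4.3, 5.1 and 5.2" ;;
        6|7) echo "AIX 4.3 and 5.1" ;;
        *)   echo "Atape level not listed in the table: $major" >&2
             return 1 ;;
    esac
}
```

For example, atape_supported_aix /tmp/Atape.11.2.0.0.bin prints "AIX 5.2 and above"; a level that is not in the table makes the function return a non-zero status.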

Archive drivers are available for older AIX Operating Systems.

========================================================================

Atape NOTICES:

Use the following instructions to install and configure it:

a. Download the driver in binary to your workstation. The following
instructions will assume that you named the file
/tmp/Atape.x.x.x.x.bin.
b. Issue the command: installp -acXd /tmp/Atape.x.x.x.x.bin Atape.driver
This installs and commits the Atape driver on your system.
c. Configure the tape device by entering the following command:
cfgmgr -v (-v is not required but will show where it hangs if
it does)
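
The three steps above could be collected into one script, sketched below. The function name install_atape and the DRY_RUN switch are assumptions made for illustration. The lslpp and lsdev calls at the end are standard AIX commands added here as a verification step; they are not part of the readme's instructions. By default the sketch only prints the commands instead of running them.

```shell
#!/bin/sh
# install_atape PACKAGE
# Hypothetical wrapper around steps b and c above.
# With DRY_RUN=1 (the default) the commands are printed, not executed.
install_atape() {
    pkg=$1
    if [ -z "$pkg" ]; then
        echo "usage: install_atape /tmp/Atape.x.x.x.x.bin" >&2
        return 2
    fi
    if [ "${DRY_RUN:-1}" = 1 ]; then
        run=echo
    else
        run=
    fi
    # Step b: install and commit the driver.
    $run installp -acXd "$pkg" Atape.driver || return 1
    # Step c: configure the tape devices (-v shows where it hangs, if it does).
    $run cfgmgr -v || return 1
    # Added check (not in the readme): confirm the fileset, list tape devices.
    $run lslpp -l Atape.driver || return 1
    $run lsdev -Cc tape
}
```

Running install_atape /tmp/Atape.x.x.x.x.bin with the real file name shows the command sequence; setting DRY_RUN=0 as root on the AIX host would execute it.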

Notes:

  1. Using Atape 11.x.x.x levels with 3592 and alternate pathing
    requires 3592-J1A D3I0_A0D or 3592-E05 D3I1_A21 drive code or
    higher.
  2. Serial Attached SCSI (SAS) specification requires the block
    descriptor to be set to a multiple of 4 (0,4,8…). On the update
    of SAS tape drive code for LTO3 Half Height (85V2), LTO4 Full
    Height and LTO4 Half Height to levels 82F0 or beyond, an update
    to latest device driver will be needed. A change to enforce the
    specification in the tape drive causes an adverse reaction with
    the current device driver that will cause the host to not be able
    to communicate with the SAS tape drives.
  3. Booting from a SAS attached tape drive is not supported on
    Power servers.
  4. Booting from a Fibre attached tape drive is not supported on
    Power servers.
  5. DPF and CPF alternate pathing is not supported for SAS tape
    drives and libraries with the FC 5900 and 5912 SAS HBAs.
  6. NPIV attachment support via the FC 5735 Fibre Channel HBA
    (requires VIOS 2.1.3 or later)
  7. No System FW support for SAS boot on Power servers.
  8. User may experience communication interruption with the tape device
    if there are link errors in the SAS interface (No cable pull support
    on SAS adapters)

Maintain your system online!

Your system has been running UNIX for a long time. In an environment where the operating system is extremely stable and the host hardware operates properly, UNIX systems can run smoothly and continuously.
------------
I'm glad you have come to my website. The purpose of my website is simple and clear: I hope that every host/server running UNIX can continue to operate smoothly. But it's not an easy goal.

Being hard to achieve does not mean that UNIX is not stable enough.
It comes down to the entire operating environment...
Inside the system:
UNIX runs on hardware, and while the system is running, that hardware is subject to wear and a limited service life.
Potential hazards such as failing hard disk storage media usually result in online service interruptions and system downtime.
You may think this is not a problem, since restarting the system doesn't take much time, but in reality the hard disk is damaged and the system cannot boot.
All standard system operations generate many system logs that take up space on your system.
When the /, /var, or /tmp file systems that keep the system functioning run out of space,
the system starts to behave abnormally because there is not enough room to store the temporary files generated during operation.
This can also cause disruption to online services.
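
A simple way to watch for this is to check how full those file systems are. The sketch below assumes POSIX-format output from df -P (capacity in the fifth column, mount point in the sixth). The function name check_fs_usage and the default threshold of 90 percent are illustrative choices, not fixed recommendations.

```shell
#!/bin/sh
# check_fs_usage [LIMIT]
# Reads `df -P` output on stdin and prints every mount point whose
# capacity is at or above LIMIT percent (default 90, an arbitrary value).
check_fs_usage() {
    awk -v limit="${1:-90}" '
        NR > 1 {                      # skip the header line
            use = $5
            sub(/%/, "", use)         # "95%" -> "95"
            if (use + 0 >= limit + 0)
                print $6, use "%"
        }'
}

# Example call (commented out so that sourcing this file has no side effects):
# df -P / /var /tmp | check_fs_usage 90
```

Each line of output names a mount point and its current usage, so an empty result means that none of the checked file systems has reached the threshold.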
A well-run system is usually not rebooted or shut down once online service has started.
But the core hardware components, such as cards (adapters), disks, and RAM, also have a service life.
They don't need to fail outright; an anomaly that makes the system compute wrong data is enough.
This makes the problem spread.
This is the era of big data.
Many systems on manufacturing production lines are also responsible for the important task of preserving data.
Once the system is out of service, the production line that depends on it also has to stop and wait for the system to recover.
-------------------------------------------------------

Transactions are built on trust
What can I do for you with my knowledge of UNIX?
Something that benefits both you and me.
I can remotely access your system host through a secure connection.
What does that do for you?
Usually I need to check your environment first.
That means: what is your hardware architecture, and what subsystems exist outside the main system (such as external storage devices)?
What is the make and model number of your hardware? (If it is a very old device, I can warn you if spare parts would be hard to find in the event of a disaster.)
Then there is the application software on the system. Old website application software not only raises serious security concerns for the host,
it also makes adopting new web technologies infeasible.
The system information above is what a system maintainer needs to know, much like a family doctor, so you need to trust me as you would a doctor.
Let me gather information about how the system operates:
software name, software version, hardware information, network information.
If you can provide me with architecture information, I can get a handle on your system environment more efficiently, which is good for you.
But I will still conduct an overall system check. If the information you provide is wrong, you can also take this opportunity to correct it.