Wednesday, August 25, 2010

How to migrate Oracle 10gR2 Clusterware and ASM storage devices on AIX

I haven't done a technical post in a while, so I thought I'd share some stuff I've been working on recently.


It may sometimes be necessary to migrate the storage devices used by a database and Oracle Clusterware, whether because of a SAN upgrade or simply a move between storage frames.  AIX provides a native command called migratepv which can handle moving data between physical volumes in a volume group, and with it logical volumes and filesystems.  However, for Clusterware (the OCR and voting disks) and ASM, which reside on raw devices, a different method is required.  The ocrconfig and crsctl commands are the preferred tools for working with the OCR and voting disk files.  Migrating AIX volume groups using migratepv is beyond the scope of this article, but you can check out this article by Chris Gibson for additional details: http://www.ibm.com/developerworks/aix/library/au-aixstorage/.

This document describes the steps used to migrate the following types of devices attached to a two-node Oracle Database Enterprise Edition 10.2.0.4.0 (64-bit) RAC cluster with ASM on IBM AIX 5.3.  All are raw devices residing on shared SAN storage.  It is assumed that the storage to which the database will be migrated has already been allocated and presented to the servers with the correct permissions.

  • ASM disks
  • OCR disks
  • Voting disks
  • ASM spfile

Note:  It is highly recommended to take backups of your OCR, voting disks and databases using ASM before making any changes.

Automatic Storage Management (ASM)
The benefits of using ASM for database storage become tangible in a scenario like this: ASM devices can be migrated online without downtime[1].  Once the new devices have been presented to the cluster nodes, all that remains is to add them to the existing disk groups and drop the old devices.
It is good practice to verify that the new devices are visible on all nodes in the cluster before adding them to a disk group.  If device aliases are used, you can easily list the devices.

[oracle@oradba10t oracle]$ ls -lt /dev/asm*
crw-rw----    1 oracle   dba          40, 16 Aug 16 11:16 /dev/asm16
crw-rw----    1 oracle   dba          40, 18 Aug 16 11:16 /dev/asm18
crw-rw----    1 oracle   dba          40, 20 Aug 16 11:16 /dev/asm20
crw-rw----    1 oracle   dba          40, 19 Aug 15 13:00 /dev/asm19
crw-rw----    1 oracle   dba          40, 17 Aug 15 13:00 /dev/asm17
crw-rw----    1 oracle   dba          40, 22 Aug 15 13:00 /dev/asm22
crw-rw----    1 oracle   dba          40, 21 Aug 13 22:00 /dev/asm21
crw-rw----    1 oracle   dba          40, 11 Jul 27 13:11 /dev/asm11
crw-rw----    1 oracle   dba          40,  6 Jul 27 10:22 /dev/asm6
crw-rw----    1 oracle   dba          40,  7 Jul 27 10:19 /dev/asm7
crw-rw----    1 oracle   dba          40, 26 Jul 26 17:25 /dev/asmspf_disk1
crw-rw----    1 oracle   dba          40, 10 Jul 19 17:43 /dev/asm10
crw-rw----    1 oracle   dba          40, 14 Jul 12 16:50 /dev/asm14
crw-rw----    1 oracle   dba          40, 13 Jul 12 16:35 /dev/asm13
crw-rw----    1 oracle   dba          40, 12 Jul 12 15:59 /dev/asm12
crw-rw----    1 oracle   dba          40,  8 Jul 12 15:26 /dev/asm8
crw-rw----    1 oracle   dba          40,  9 Jul 08 23:02 /dev/asm9
crw-rw----    1 oracle   dba          40, 23 Jul 08 09:47 /dev/asm23
crw-rw----    1 oracle   dba          40, 24 Jul 08 09:47 /dev/asm24
crw-rw----    1 oracle   dba          40, 25 Jul 08 09:47 /dev/asm25
crw-rw----    1 oracle   dba          40,  5 Jun 17 14:18 /dev/asmspf_disk
crw-rw----    1 oracle   dba          40, 15 Jul 02 2009  /dev/asm15
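Before adding disks, I like to confirm that every node presents the same set of aliases.  A minimal sketch (node names are the ones from this cluster; the helper just reduces the ls output to a sorted path list you can diff between nodes, since the raw ls output will differ by timestamp):

```shell
#!/bin/sh
# Sketch: compare device-alias sets across nodes. Run `ls -l /dev/asm*`
# on each node, filter the output through list_aliases, and diff the
# results. Node names below are this cluster's.
list_aliases() {
    awk '{ print $NF }' | sort    # keep only the device path column
}

# Example with two ls lines from the listing above:
printf '%s\n%s\n' \
    'crw-rw----    1 oracle   dba          40, 16 Aug 16 11:16 /dev/asm16' \
    'crw-rw----    1 oracle   dba          40, 15 Jul 02 2009  /dev/asm15' |
    list_aliases

for node in oradba10t oradba11t; do
    echo "ssh $node 'ls -l /dev/asm*'"   # commands to collect the listings
done
```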

You can call me paranoid, but I also like to re-check the reserve lock/policy settings on the devices.  It should be set to no or no_reserve, depending on the storage and multipath driver being used.  I have had bad experiences with ASM devices that have their reserve locks set.  The commands below show the device name, LUN ID, reserve lock setting and PVID for hdiskpower devices 16-25.

[oracle@oradba10t oracle]$ for i in 16 17 18 19 20 21 22 23 24 25
> do
> lsattr -El hdiskpower$i | grep reserve_lock | awk '{print $1,$2 }'| read rp1 rp2
> lsattr -El hdiskpower$i | grep pvid | awk '{print $1,$2 }'| read pv1 pv2
> lsattr -El hdiskpower$i | grep lun_id | awk '{print $1,$2 }'| read li1 li2
> if [ "$li1" != "" ]
> then
> echo hdiskpower$i' -> '$li1' = '$li2' / '$rp1' = '$rp2' / '$pv1' = '$pv2
> fi
> done
hdiskpower16 -> lun_id = 0x1000000000000 / reserve_lock = no / pvid = none
hdiskpower17 -> lun_id = 0x3000000000000 / reserve_lock = no / pvid = none
hdiskpower18 -> lun_id = 0x4000000000000 / reserve_lock = no / pvid = none
hdiskpower19 -> lun_id = 0x5000000000000 / reserve_lock = no / pvid = none
hdiskpower20 -> lun_id = 0x6000000000000 / reserve_lock = no / pvid = none
hdiskpower21 -> lun_id = 0x7000000000000 / reserve_lock = no / pvid = none
hdiskpower22 -> lun_id = 0x8000000000000 / reserve_lock = no / pvid = none
hdiskpower23 -> lun_id = 0x9000000000000 / reserve_lock = no / pvid = none
hdiskpower24 -> lun_id = 0xa000000000000 / reserve_lock = no / pvid = none
hdiskpower25 -> lun_id = 0xb000000000000 / reserve_lock = no / pvid = none
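If any device does come back with its reserve set, the fix is a chdev per disk.  Here's a sketch that only prints the commands rather than running them (reserve_lock applies to the PowerPath devices shown above; native MPIO disks use reserve_policy=no_reserve instead, and the disk must be closed when chdev is run without -P):

```shell
#!/bin/sh
# Sketch: emit (do not run) the chdev commands that clear the reserve
# setting on a range of disks. The attribute name varies by driver:
# PowerPath uses reserve_lock, native AIX MPIO uses reserve_policy.
gen_clear_reserve() {
    prefix=$1; first=$2; last=$3
    i=$first
    while [ "$i" -le "$last" ]; do
        echo "chdev -l ${prefix}${i} -a reserve_lock=no"
        i=$((i + 1))
    done
}

gen_clear_reserve hdiskpower 16 18
```

Review the generated commands before running them as root on each node.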

In the case of a migration there are typically numerous devices to move, and depending on your LUN sizes and the number of devices this could take a long time due to the amount of rebalancing involved.  To reduce the number of rebalance operations, add all the new disks with a rebalance power of 0, which effectively disables rebalancing.

SQL> alter diskgroup dg_data add disk '/dev/asm20','/dev/asm21','/dev/asm22' rebalance power 0;

Diskgroup altered.

SQL> select * from v$asm_operation;

no rows selected

Once the new disks have been added, drop the old disks, all but the last one, with a rebalance power of 0.  When that final disk is dropped with a rebalance power greater than 0, rebalancing begins immediately and processes all of the deferred changes in a single pass.
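The whole add/drop sequence can be scripted.  A sketch that just generates the SQL (the disk group, device paths, and old disk names are illustrative; note that DROP DISK takes the ASM disk name from v$asm_disk, not the device path, and the final power of 4 is an arbitrary example):

```shell
#!/bin/sh
# Sketch: generate the ASM statements for a bulk migration using the
# power-0 pattern described above. Arguments: disk group, quoted list of
# new device paths, then the old ASM disk names (from v$asm_disk).
gen_migration_sql() {
    dg=$1; new_disks=$2; shift 2
    echo "alter diskgroup $dg add disk $new_disks rebalance power 0;"
    # Drop all but the last old disk with power 0 ...
    while [ $# -gt 1 ]; do
        echo "alter diskgroup $dg drop disk $1 rebalance power 0;"
        shift
    done
    # ... then drop the final disk with power > 0 to trigger one rebalance.
    echo "alter diskgroup $dg drop disk $1 rebalance power 4;"
}

gen_migration_sql dg_data "'/dev/asm20','/dev/asm21','/dev/asm22'" \
    DG_DATA_0000 DG_DATA_0001 DG_DATA_0002
```

Once you've sanity-checked the output, it can be fed to SQL*Plus connected to an ASM instance.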

Oracle Cluster Registry (OCR)
The OCR stores the Clusterware configuration.  In a typical configuration the OCR is two-way mirrored.  Clusterware automatically creates backups of the OCR at periodic intervals; see the Oracle documentation for more details: http://download.oracle.com/docs/cd/B19306_01/rac.102/b14197/votocr.htm#CIHFBJEF

The ocrconfig command is used to administer the OCR configuration, including adding, removing, and replacing OCR disks and mirrors.  It should be run as root.

Check if there are recent backups of the OCR files.

[root@oradba10t /]$ /u01/app/oracle/product/10/crs/bin/ocrconfig -showbackup

oradba11t     2010/07/29 09:09:52     /u01/app/oracle/product/10/crs/cdata/crs10g

oradba11t     2010/07/29 05:09:51     /u01/app/oracle/product/10/crs/cdata/crs10g

oradba11t     2010/07/29 01:09:50     /u01/app/oracle/product/10/crs/cdata/crs10g

oradba11t     2010/07/27 17:09:45     /u01/app/oracle/product/10/crs/cdata/crs10g

oradba11t     2010/07/22 09:09:24     /u01/app/oracle/product/10/crs/cdata/crs10g

The showbackup option lists the hostname, path, and backup date and time for each backup.
Note: the backups may not all be on the same server.

[oracle@oradba11t dbs]$ ls -ltr /u01/app/oracle/product/10/crs/cdata/crs10g
total 84792
-rw-r--r--    1 root     system      6201344 Jul 22 09:09 week.ocr
-rw-r--r--    1 root     system      6201344 Jul 27 17:09 day.ocr
-rw-r--r--    1 root     system      6201344 Jul 28 17:09 day_.ocr
-rw-r--r--    1 root     system      6201344 Jul 29 05:09 backup02.ocr
-rw-r--r--    1 root     system      6201344 Jul 29 09:09 week_.ocr
-rw-r--r--    1 root     system      6201344 Jul 29 09:09 backup01.ocr
-rw-r--r--    1 root     system      6201344 Jul 29 13:09 backup00.ocr

If a recent OCR backup does not exist you should export the online OCR file.

[root@oradba10t /]$ /u01/app/oracle/product/10/crs/bin/ocrconfig -export /u02/backup/ocr_backup_07292010 -s online

[root@oradba10t /]$ ls -ltr /u02/backup/ocr*
-rw-r--r--    1 root     system       143260 Jul 29 13:29 /u02/backup/ocr_backup_07292010
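If you export regularly, a dated file name avoids overwriting earlier exports.  A sketch that builds (but does not run) the command, assuming the same date-suffix convention as the file name above:

```shell
#!/bin/sh
# Sketch: build a dated ocrconfig export command. The CRS home and backup
# directory are this post's; the MMDDYYYY date format is an assumption.
crs_home=/u01/app/oracle/product/10/crs
gen_ocr_export() {
    echo "$crs_home/bin/ocrconfig -export /u02/backup/ocr_backup_$1 -s online"
}

gen_ocr_export "$(date +%m%d%Y)"
```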

Verify the integrity and status of the current OCR files using ocrcheck.

[root@oradba10t /]$ /u01/app/oracle/product/10/crs/bin/ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :    1055724
         Used space (kbytes)      :       5656
         Available space (kbytes) :    1050068
         ID                       : 2023180143
         Device/File Name         : /dev/ocr_disk1
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/ocr_disk2
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded

Check that the new OCR devices are visible from all nodes.  Using aliases makes this much simpler; if your devices don’t have aliases, check with your system administrator for the correct device names.

[root@oradba10t /]$ ls -ltr /dev/ocr*
crw-r-----    1 oracle   dba          40,  0 Jun 29 2009  /dev/ocr_disk1
crw-r-----    1 oracle   dba          40,  1 Jun 29 2009  /dev/ocr_disk2
crw-r-----    1 root     dba          40, 27 Jul 29 10:12 /dev/ocr_disk3
crw-r-----    1 root     dba          40, 28 Jul 29 10:12 /dev/ocr_disk4

Check that the permissions on the new devices match the existing devices: the OCR devices should be owned by root and the dba group with 640 permissions.  This should be the same on both nodes.
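That permission check is easy to script as a plain string test, so the same check can be run against ls output collected from each node.  A sketch (the expected crw-r----- mode corresponds to 640 on a character device):

```shell
#!/bin/sh
# Sketch: verify that one `ls -l` line for an OCR device shows the
# expected root:dba ownership and crw-r----- (640) mode.
check_ocr_perms() {
    set -- $1                      # split the ls line into fields
    mode=$1; owner=$3; group=$4
    [ "$mode" = "crw-r-----" ] && [ "$owner" = "root" ] && [ "$group" = "dba" ]
}

if check_ocr_perms "crw-r-----    1 root     dba          40, 27 Jul 29 10:12 /dev/ocr_disk3"; then
    echo OK
else
    echo BAD
fi
```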
Go to the $CRS_HOME/bin directory and run ocrconfig from one node as root.

[root@oradba10t bin]$ ./ocrconfig -replace ocr /dev/ocr_disk3
[root@oradba10t bin]$ ./ocrconfig -replace ocrmirror /dev/ocr_disk4

Run ocrcheck on all nodes to verify the cluster integrity.
[root@oradba10t bin]$ ./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :    1055724
         Used space (kbytes)      :       5656
         Available space (kbytes) :    1050068
         ID                       : 2023180143
         Device/File Name         : /dev/ocr_disk3
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/ocr_disk4
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded

Check the logs for errors under $CRS_HOME/log/<hostname>/client: ocrconfig generates log files named like ocrconfig_xxxxxx.log, and ocrcheck generates log files like ocrcheck_xxxxxx.log.

Voting disks
Shut down the database and stop CRS on both nodes before making modifications to the voting disks.  It is also recommended to back up your voting disks first.
[root@oradba10t /]$ /u01/app/oracle/product/10/crs/bin/crsctl stop crs
Stopping resources. This could take several minutes.
Successfully stopped CRS resources.
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.

[root@oradba11t /]$ /u01/app/oracle/product/10/crs/bin/crsctl stop crs
Stopping resources. This could take several minutes.
Successfully stopped CRS resources.
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.

Back up the voting disks (this could take a long time).

[root@oradba10t client]$ dd if=/dev/voting_disk1 of=/u02/backup/voting_disk1_backup_07292010
2112000+0 records in.
2112000+0 records out.
[root@oradba10t /]$ dd if=/dev/voting_disk2 of=/u02/backup/voting_disk2_backup_07292010
2112000+0 records in.
2112000+0 records out.
[root@oradba10t /]$ dd if=/dev/voting_disk3 of=/u02/backup/voting_disk3_backup_07292010
2112000+0 records in.
2112000+0 records out.
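With several voting disks, the dd commands can be generated from the crsctl query output instead of typed by hand.  A sketch (the backup directory and date tag mirror the examples above):

```shell
#!/bin/sh
# Sketch: turn `crsctl query css votedisk` output into dd backup commands.
# stdin: crsctl output lines like " 0.     0    /dev/voting_disk1";
# args: destination directory and a date tag for the file names.
gen_votedisk_backups() {
    awk -v dest="$1" -v tag="$2" '
        NF >= 3 && $3 ~ /^\// {                 # skip the "located N" line
            name = $3; sub(".*/", "", name)     # basename of the device
            printf "dd if=%s of=%s/%s_backup_%s\n", $3, dest, name, tag
        }'
}

printf ' 0.     0    /dev/voting_disk1\n 1.     0    /dev/voting_disk2\n' |
    gen_votedisk_backups /u02/backup 07292010
```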

Check the permissions on the new devices: voting disks should be owned by the oracle user and dba group with 644 permissions.
As root, issue the crsctl commands from one node.

[root@oradba10t /]$ cd /u01/app/oracle/product/10/crs/bin
[root@oradba10t bin]$ ./crsctl delete css votedisk /dev/voting_disk1 -force
successful deletion of votedisk /dev/voting_disk1.
[root@oradba10t bin]$ ./crsctl add css votedisk /dev/voting_disk4 -force
Now formatting voting disk: /dev/voting_disk4
successful addition of votedisk /dev/voting_disk4.
[root@oradba10t bin]$ ./crsctl delete css votedisk /dev/voting_disk2 -force
successful deletion of votedisk /dev/voting_disk2.
[root@oradba10t bin]$ ./crsctl delete css votedisk /dev/voting_disk3 -force
successful deletion of votedisk /dev/voting_disk3.
[root@oradba10t bin]$ ./crsctl add css votedisk /dev/voting_disk5 -force
Now formatting voting disk: /dev/voting_disk5
successful addition of votedisk /dev/voting_disk5.
[root@oradba10t bin]$ ./crsctl add css votedisk /dev/voting_disk6 -force
Now formatting voting disk: /dev/voting_disk6
successful addition of votedisk /dev/voting_disk6.

Check the status of the voting disks on both nodes using crsctl.

[root@oradba10t bin]$ ./crsctl query css votedisk
 0.     0    /dev/voting_disk4
 1.     0    /dev/voting_disk5
 2.     0    /dev/voting_disk6
located 3 votedisk(s).

[root@oradba11t /]$ /u01/app/oracle/product/10/crs/bin/crsctl query css votedisk
 0.     0    /dev/voting_disk4
 1.     0    /dev/voting_disk5
 2.     0    /dev/voting_disk6

located 3 votedisk(s).

Start CRS on both nodes as root.

[root@oradba10t bin]$ ./crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly

Wait about a minute before checking the status; sometimes it takes longer.  While the stack is still coming up, crsctl check crs may report that it cannot communicate with CRS:
[root@oradba10t bin]$ ./crsctl check crs
CSS appears healthy
Cannot communicate with CRS
EVM appears healthy

If you encounter any issues, check the cluster alert log under
$CRS_HOME/log/<nodename>/alert<nodename>.log to make sure the voting disks are online:
[cssd(1278060)]CRS-1605:CSSD voting file is online: /dev/voting_disk4. Details in /u01/app/oracle/product/10/crs/log/oradba10t/cssd/ocssd.log.
2010-07-29 16:38:26.173
[cssd(1278060)]CRS-1605:CSSD voting file is online: /dev/voting_disk5. Details in /u01/app/oracle/product/10/crs/log/oradba10t/cssd/ocssd.log.
2010-07-29 16:38:26.176
[cssd(1278060)]CRS-1605:CSSD voting file is online: /dev/voting_disk6. Details in /u01/app/oracle/product/10/crs/log/oradba10t/cssd/ocssd.log.

Repeat the procedure on other nodes and verify the status of the clusterware resources.

[root@oradba10t oradba10t]$ /u01/app/oracle/product/10/crs/bin/crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....vice.cs application    ONLINE    ONLINE    oradba10t
ora....ac1.srv application    ONLINE    ONLINE    oradba10t
ora.labrac.db  application    ONLINE    ONLINE    oradba10t
ora....c1.inst application    ONLINE    ONLINE    oradba10t
ora....c2.inst application    ONLINE    ONLINE    oradba11t
ora....asic.cs application    ONLINE    ONLINE    oradba10t
ora....ac1.srv application    ONLINE    ONLINE    oradba10t
ora....ac2.srv application    ONLINE    ONLINE    oradba11t
ora...._taf.cs application    ONLINE    ONLINE    oradba10t
ora....ac1.srv application    ONLINE    ONLINE    oradba10t
ora....SM1.asm application    ONLINE    ONLINE    oradba10t
ora....0T.lsnr application    ONLINE    ONLINE    oradba10t
ora....10t.gsd application    ONLINE    ONLINE    oradba10t
ora....10t.ons application    ONLINE    ONLINE    oradba10t
ora....10t.vip application    ONLINE    ONLINE    oradba10t
ora....SM2.asm application    ONLINE    ONLINE    oradba11t
ora....1T.lsnr application    ONLINE    ONLINE    oradba11t
ora....11t.gsd application    ONLINE    ONLINE    oradba11t
ora....11t.ons application    ONLINE    ONLINE    oradba11t
ora....11t.vip application    ONLINE    ONLINE    oradba11t


ASM Spfile

The ASM spfile can reside on either a local or a shared device.  If it is on a local device, migratepv can be used to move it to new devices.  If it is on a shared raw device, you can move it using the steps described below.

Each ASM instance in a cluster has its own initialization parameter file located under $ORA_ASM_HOME/dbs.  This file contains a single entry pointing to the spfile on the shared device.

oracle@tibora10d[+ASM1]-/u01/oracle/asm/dbs >cat init+ASM1.ora
SPFILE='/dev/asm_spfile1'

Export the ASM instance SID and ASM Oracle Home on each server.

export ORACLE_SID=+ASM1
export ORACLE_HOME=/u01/app/oracle/product/10.2.0/asm

Connect to your ASM instance and display the current spfile location.
[oracle@oradba10t oradba10t]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Mon Jul 26 15:31:39 2010

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> show parameter spfile

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      /dev/asmspf_disk

Check your new and existing ASM spfile devices to verify the correct permissions have been set.
[oracle@oradba10t dbs]$ ls -l /dev/asmspf*
crw-rw----    1 oracle   dba          40,  5 Jun 17 14:18 /dev/asmspf_disk
crw-rw----    1 oracle   dba          40, 26 Jul 08 09:53 /dev/asmspf_disk1

First, create a pfile for your ASM instance on each node.
SQL> create pfile='initASM1pfile.ora' from spfile;

Next, shut down your ASM instances and start up in nomount mode with the pfile.
[oracle@oradba10t dbs]$ srvctl stop asm -n oradba10t
[oracle@oradba10t dbs]$ srvctl stop asm -n oradba11t

SQL> startup nomount pfile='initASM1pfile.ora';
ASM instance started

Total System Global Area  130023424 bytes
Fixed Size                  2082312 bytes
Variable Size             102775288 bytes
ASM Cache                  25165824 bytes

From one node only, create an spfile on the shared device.

SQL> create spfile='/dev/asmspf_disk1' from pfile='initASM1pfile.ora';

File created.

Shut down the ASM instance.
SQL> shutdown immediate;
ORA-15100: invalid or missing diskgroup name


ASM instance shutdown

You can ignore the ORA-15100 error message; it appears because the instance is in nomount mode, so no disk groups are mounted.

Go to your $ORA_ASM_HOME/dbs directory and edit the init+ASM1.ora file to point to the new device.
[oracle@oradba10t dbs]$ vi init+ASM1.ora
"init+ASM1.ora" 1 line, 26 characters
SPFILE='/dev/asmspf_disk1'
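Editing the init file by hand on every node invites typos; a small sed wrapper does the same edit repeatably.  A sketch, demonstrated against a scratch copy rather than the real file:

```shell
#!/bin/sh
# Sketch: update the SPFILE pointer in an init+ASM<n>.ora file in place.
# Paths and device names follow this post; run as oracle on each node.
update_init_ora() {
    file=$1; new_dev=$2
    # Write to a temp file first so a failed sed cannot truncate the original.
    sed "s|^SPFILE=.*|SPFILE='${new_dev}'|" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Demonstration against a scratch copy, not the real init file:
echo "SPFILE='/dev/asmspf_disk'" > /tmp/init+ASM1.ora
update_init_ora /tmp/init+ASM1.ora /dev/asmspf_disk1
cat /tmp/init+ASM1.ora
```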

Connect to your ASM instance using SQL*Plus and start it up.
[oracle@oradba10t dbs]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Mon Jul 26 17:27:43 2010

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

Connected to an idle instance.

SQL> startup
ASM instance started

Total System Global Area  130023424 bytes
Fixed Size                  2082312 bytes
Variable Size             102775288 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted
Verify that the spfile parameter now points to the new device.
SQL> show parameter spfile

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      /dev/asmspf_disk1
SQL>

Once the ASM instances start up using the new spfile, you can update the OCR ASM registry with the new location for use with srvctl.

Shut down the ASM instances on both nodes and modify the OCR ASM registry with the new spfile location.


srvctl modify asm -n oradba10t -i +ASM1 -o /u01/app/oracle/product/10/asm -p /dev/asmspf_disk1
srvctl modify asm -n oradba11t -i +ASM2 -o /u01/app/oracle/product/10/asm -p /dev/asmspf_disk1
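Since the modify command differs only in node and instance name, it can be generated per pair.  A sketch using this cluster's names (the ASM home matches the commands above):

```shell
#!/bin/sh
# Sketch: emit the srvctl modify commands for each node/instance pair.
# Node names, instance names, and ASM home are this post's examples.
asm_home=/u01/app/oracle/product/10/asm
gen_srvctl_modify() {
    spfile=$1
    for pair in oradba10t:+ASM1 oradba11t:+ASM2; do
        node=${pair%%:*}; inst=${pair##*:}
        echo "srvctl modify asm -n $node -i $inst -o $asm_home -p $spfile"
    done
}

gen_srvctl_modify /dev/asmspf_disk1
```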

Start up each ASM instance and check its status using srvctl.
[oracle@oradba10t dbs]$ srvctl start asm -n oradba10t
[oracle@oradba10t dbs]$ srvctl status asm -n oradba10t
ASM instance +ASM1 is running on node oradba10t.
[oracle@oradba10t dbs]$ srvctl start asm -n oradba11t
[oracle@oradba10t dbs]$ srvctl status asm -n oradba11t
ASM instance +ASM2 is running on node oradba11t.

References

Moving ASM spfile to a shared device in RAC [ID 973031.1]

OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE), including moving from RAW Devices to Block Devices. [ID 428681.1]

Oracle® Database Oracle Clusterware and Oracle Real Application Clusters Administration and Deployment Guide
10g Release 2 (10.2)


[1] Downtime is still required for reclaiming devices due to bugs in 10.2.0.4 that cause the ASM file descriptors to remain open even after the devices are dropped from the disk groups. See bugs 4693355 and 7225720.
