Wednesday, June 22, 2011

Troubleshooting RAC Public Network Failure

Here are some steps I used to troubleshoot the failure of a public network used for SCAN in a 2-node RAC cluster.
Note: I used an aliased crsstat  for the command: crsctl stat res –t
Check the status of the Clusterware resources. You can see that there are several resources that are offline below. The ones highlighted in red are the ones we are interested in. The resource for the local listener is now offline while the VIP has been failed over. You will also notice that all of the SCAN Listeners have been failed over to the surviving node.
grid@tibora30[+ASM1]-/u01/11.2.0/grid/log/tibora30 >crsstat
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DG_DATA.dg
               ONLINE  ONLINE       tibora30
               ONLINE  ONLINE       tibora31
ora.DG_FLASH.dg
               ONLINE  ONLINE       tibora30
               ONLINE  ONLINE       tibora31
ora.LISTENER.lsnr
               ONLINE  OFFLINE      tibora30
               ONLINE  ONLINE       tibora31
ora.asm
               ONLINE  ONLINE       tibora30                 Started
               ONLINE  ONLINE       tibora31                 Started
ora.gsd
               OFFLINE OFFLINE      tibora30
               OFFLINE OFFLINE      tibora31
ora.net1.network
               ONLINE  ONLINE       tibora30
               ONLINE  ONLINE       tibora31
ora.ons
               ONLINE  ONLINE       tibora30
               ONLINE  OFFLINE      tibora31
ora.registry.acfs
               ONLINE  ONLINE       tibora30
               ONLINE  ONLINE       tibora31
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       tibora31
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       tibora31
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       tibora31
ora.cvu
      1        ONLINE  OFFLINE
ora.oc4j
      1        ONLINE  ONLINE       tibora31
ora.scan1.vip
      1        ONLINE  ONLINE       tibora30
ora.scan2.vip
      1        ONLINE  ONLINE       tibora31
ora.scan3.vip
      1        ONLINE  ONLINE       tibora31
ora.tibora30.vip
      1        ONLINE  INTERMEDIATE tibora31                 FAILED OVER
ora.tibora31.vip
      1        ONLINE  ONLINE       tibora31
ora.tibprd.db
      1        ONLINE  ONLINE       tibora30                 Open
      2        ONLINE  ONLINE       tibora31                 Open
ora.tibprd.tibprd_applog.svc
      1        ONLINE  ONLINE       tibora31
ora.tibprd.tibprd_basic.svc
      1        ONLINE  ONLINE       tibora31
ora.tibprd.tibprd_smap.svc
      1        ONLINE  ONLINE       tibora31
Then look at the CRS logs under $GI_HOME/log/hostname/alerthostname.log  for entries similar to the ones below:
2011-06-21 09:43:57.844
[/u01/11.2.0/grid/bin/orarootagent.bin(21168162)]CRS-5818:Aborted command 'check for resource: ora.net1.network tibora30 1' for resource 'ora.net1.network'. Details at (:CRSAGF00113:) {0:9:2} in /u01/11.2.0/grid/log/tibora30/agent/crsd/orarootagent_root/orarootagent_root.log.
2011-06-21 09:44:00.459
[/u01/11.2.0/grid/bin/oraagent.bin(22413372)]CRS-5016:Process "/u01/11.2.0/grid/opmn/bin/onsctli" spawned by agent "/u01/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/11.2.0/grid/log/tibora30/agent/crsd/oraagent_grid/oraagent_grid.log"
2011-06-21 09:44:01.112
[/u01/11.2.0/grid/bin/oraagent.bin(22413372)]CRS-5016:Process "/u01/11.2.0/grid/bin/lsnrctl" spawned by agent "/u01/11.2.0/grid/bin/oraagent.bin" for action "
check" failed: details at "(:CLSN00010:)" in "/u01/11.2.0/grid/log/tibora30/agent/crsd/oraagent_grid/oraagent_grid.log"
2011-06-21 09:44:01.180
[/u01/11.2.0/grid/bin/oraagent.bin(22413372)]CRS-5016:Process "/u01/11.2.0/grid/bin/lsnrctl" spawned by agent "/u01/11.2.0/grid/bin/oraagent.bin" for action "
check" failed: details at "(:CLSN00010:)" in "/u01/11.2.0/grid/log/tibora30/agent/crsd/oraagent_grid/oraagent_grid.log"
Check the status of the VIP on the node
grid@tibora30[+ASM1]-/u01/11.2.0/grid/log/tibora30 >srvctl status vip -n tibora30
VIP tibora30-vip is enabled
VIP tibora30-vip is not running
Also check the status of the SCAN resources.
grid@tibora30[+ASM1]-/u01/11.2.0/grid/log/tibora30 >srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is running on node tibora30
SCAN VIP scan2 is enabled
SCAN VIP scan2 is running on node tibora31
SCAN VIP scan3 is enabled
SCAN VIP scan3 is running on node tibora31
In this particular case the SCAN VIPs were running on both nodes. It turns out that on another cluster which also experienced a network failure the SCAN VIPs were all running on node.
First we need to start the local listener:
grid@tibora30[+ASM1]-/u01/11.2.0/grid/log/tibora30 >srvctl start listener
Now check the status of the resources.
grid@tibora30[+ASM1]-/u01/11.2.0/grid/log/tibora30 >crsstat
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DG_DATA.dg
               ONLINE  ONLINE       tibora30
               ONLINE  ONLINE       tibora31
ora.DG_FLASH.dg
               ONLINE  ONLINE       tibora30
               ONLINE  ONLINE       tibora31
ora.LISTENER.lsnr
               ONLINE  ONLINE       tibora30
               ONLINE  ONLINE       tibora31
ora.asm
               ONLINE  ONLINE       tibora30                 Started
               ONLINE  ONLINE       tibora31                 Started
ora.gsd
               OFFLINE OFFLINE      tibora30
               OFFLINE OFFLINE      tibora31
ora.net1.network
               ONLINE  ONLINE       tibora30
               ONLINE  ONLINE       tibora31
ora.ons
               ONLINE  ONLINE       tibora30
               ONLINE  OFFLINE      tibora31
ora.registry.acfs
               ONLINE  ONLINE       tibora30
               ONLINE  ONLINE       tibora31
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       tibora31
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       tibora31
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       tibora31
ora.cvu
      1        ONLINE  OFFLINE
ora.oc4j
      1        ONLINE  ONLINE       tibora31
ora.scan1.vip
      1        ONLINE  ONLINE       tibora30
ora.scan2.vip
      1        ONLINE  ONLINE       tibora31
ora.scan3.vip
      1        ONLINE  ONLINE       tibora31
ora.tibora30.vip
      1        ONLINE  ONLINE       tibora30
ora.tibora31.vip
      1        ONLINE  ONLINE       tibora31
ora.tibprd.db
      1        ONLINE  ONLINE       tibora30                 Open
      2        ONLINE  ONLINE       tibora31                 Open
ora.tibprd.tibprd_applog.svc
      1        ONLINE  ONLINE       tibora31
ora.tibprd.tibprd_basic.svc
      1        ONLINE  ONLINE       tibora31
ora.tibprd.tibprd_smap.svc
      1        ONLINE  ONLINE       tibora31

Starting the local listener also caused the VIP to relocate to the previous node.
In another situation I had to manually relocate the VIP to the original node.
Next we need to check our nodeapps including ONS.


grid@tibora30[+ASM1]-/u01/11.2.0/grid/log/tibora30 >srvctl status nodeapps

VIP tibora30-vip is enabled

VIP tibora30-vip is running on node: tibora30

VIP tibora31-vip is enabled

VIP tibora31-vip is running on node: tibora31

Network is enabled

Network is running on node: tibora30

Network is running on node: tibora31

GSD is disabled

GSD is not running on node: tibora30

GSD is not running on node: tibora31

ONS is enabled

ONS daemon is running on node: tibora30

ONS daemon is not running on node: tibora31


Here you can see that the ONS daemon is not running on the tibora31.
To start the ONS daemon issue the following command:
grid@tibora30[+ASM1]-/u01/11.2.0/grid/log/tibora30 >srvctl start nodeapps -n tibora31

PRKO-2421 : Network resource is already started on node(s): tibora31

PRKO-2420 : VIP is already started on node(s): tibora31
 Check the status of the nodeapps again
grid@tibora30[+ASM1]-/u01/11.2.0/grid/log/tibora30 >srvctl status nodeapps
VIP tibora30-vip is enabled
VIP tibora30-vip is running on node: tibora30
VIP tibora31-vip is enabled
VIP tibora31-vip is running on node: tibora31
Network is enabled
Network is running on node: tibora30
Network is running on node: tibora31
GSD is disabled
GSD is not running on node: tibora30
GSD is not running on node: tibora31
ONS is enabled
ONS daemon is running on node: tibora30
ONS daemon is running on node: tibora31
The CVU resource can be started as follows:
grid@tibora30[+ASM1]-/u01/11.2.0/grid/log/tibora30 >srvctl start cvu -n tibora30
You can now verify connectivity to the database/services. I prefer to use SQL Developer to test connectivity to my databases with one connection for each service name.

All the SCAN listeners were running on a single node. At least one needed to be relocated to service the requests coming from the SCAN on that node.
grid@tibora31[+ASM2]-/home/grid >srvctl relocate scan_listener -i 1 -n tibora30
grid@tibora31[+ASM2]-/home/grid >srvctl status scan_listener
SCAN Listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is running on node tibora30
SCAN Listener LISTENER_SCAN2 is enabled
SCAN listener LISTENER_SCAN2 is running on node tibora31
SCAN Listener LISTENER_SCAN3 is enabled
SCAN listener LISTENER_SCAN3 is running on node tibora31
Once the SCAN was relocated the application connected successfully.

Wednesday, June 15, 2011

Cloudy With a Chance of Database: Part I

There has been a lot of talk in the ubiquitous cloud space recently and more specifically the PaaS service delivery model. Naturally this piqued my curiosity. How does the database fit in the cloud?
Just about every major enterprise vendor is jumping on the cloud bandwagon. And rightfully so. Private clouds are now being pushed by Oracle with Exalogic Elastic Cloud and Exadata, IBM with SmartCloud and VMware with Cloud Foundry just to mention a few. Amazon AWS and Rackspace are the major players in the public cloud stratosphere. Add to this community Clouds like OpenStack

The easiest to do an evaluation is by using Amazon AWS public cloud services. Amazon AWS provides Relational Database Service (RDS) for MySQL and Oracle Database 11g. Strictly speaking EC2 is a Infrastructure as a Service (IaaS) delivery model. However, the use of AMIs enable PaaS characteristics where complete environments can be easily packaged and deployed in EC2.
There are several publicly available Oracle Database AMIs on Amazon AWS. These are fully-supported and licensable environments. Setting up an EC2 instance involves several steps. First you need to sign up for an AWS EC2 and RDS account.

AWS EC2 instance (large) create 11.2.0.1 on EL5.5
Once you log into EC2 you are prompted with Terms of Use and license Agreements for OTN license and EL license. 
Here is a viewlet of creating an EC2 instance from AWS Console.
Unable to display content. Adobe Flash is required.


Key Features of EC2
  • Database configuration can be customized
  • Full control over options and parameters
  • Remote login access
  • Monitoring using Enterprise Manager database control/grid control
  • APIs and CLI tools to manage instances
Caveats
  • Scaling requires rebuilding or adding EC2 instances
  • No RAC support
  • No ASM support
AWS RDS Oracle DB Instance
The AWS RDS services allows you to either Bring your own license (BYOL) or pay-per use for Oracle Standard Edition One. Oracle Standard Edition and Enterprise Editions supports the BYOL licensing model. Similar to the EC2 instances they come in different levels of resource combinations: small, medium, large etc.
Launch an Oracle DB instance in RDS is similar to a MySQL DB instance.
Here is the list of available AWS RDS instances.


The AWS RDS instance creates a user with limited privileges. A lot of security considerations has been taken into account. They are not totally secure however as one fellow blogger/tweeter points out: Sydney Oracle Lab: Locked down in Oracle Amazon RDS. Here are few of the roles and privileges granted to the admin user.


iMac:~ leighton$ sqlplus orardsuser/*******@orards.c2rlslv1ais6.us-east-1.rds.amazonaws.com:1521/orards
SQL*Plus: Release 10.2.0.4.0 - Production on Wed May 25 16:13:27 2011
Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

Connected to:
Oracle Database 11g Release 11.2.0.2.0 - 64bit Production
SQL>

SQL> select dbms_metadata.get_granted_ddl('ROLE_GRANT', user)
from dual;  2
   GRANT "CONNECT" TO "ORARDSUSER" WITH ADMIN OPTION
   GRANT "DBA" TO "ORARDSUSER" WITH ADMIN OPTION

SQL> select dbms_metadata.get_granted_ddl('SYSTEM_GRANT', user)
from dual  2  ;
  GRANT UNLIMITED TABLESPACE TO "ORARDSUSER" WITH ADMIN OPTION

SQL> select dbms_metadata.get_granted_ddl('OBJECT_GRANT', user)
from dual;  2
  GRANT EXECUTE ON "SYS"."DBMS_LOCK" TO "ORARDSUSER"
  GRANT EXECUTE ON "SYS"."DBMS_PIPE" TO "ORARDSUSER"
  GRANT EXECUTE ON "SYS"."DBMS_SHARED_POOL" TO "ORARDSUSER"
A lot of the DBA related tasks can be accomplished by the use of wrapper scripts. This limits the potential for unwary users to do something dangerous to the database.
The Amazon AWS RDS User Guide has a lot of information about this and other features.


Monitoring an RDS instance requires the CloudWatch monitoring service. The downside of this is that it does not have the level of detail and functionality provided with Enterprise Manager.
Tools like SQL Developer can be used to connect to and manage your database just like any other Oracle instance.

View Sessions through SQL Developer
Manage AWS RDS for Oracle with SQL Developer

Default Oracle RDS users for Standard Edition One

Key features
  • Provision a fully functional database in a few clicks
  • Backups are automatically setup
  • Automatic minor version upgrade
  • No System Admin required
  • No up-front costs
  • Can easily scale without rebuilding instances
  • APIs and CLIs provided to manage instances
Caveats
  • No remote login access
  • No Enterprise Manager database control monitoring capability
  • No replication (to be supported in the future)
  • No RAC Support
  • No ASM Support
  • APEX not installed/supported
  • On-demand licensing for Standard Edition One only

As you can see an RDS instance will provide simple and rapid deployment for databases for developers and users without requiring significant database administration experience. Databases launched on EC2 however provide a lot more flexibility and hence requires DBAs to perform a lot of the tasks in setting up an instance. In both cases deployment is faster than a standard deployment lifecycle in a local data center.

In part II of the series I will go into more details of setup and administration of AWS EC2 and RDS databases using the CLI tools. I hope to also briefly cover the licensing costs which is a major factor in determining whether or not you should run your Oracle database in the cloud.

    Tuesday, June 7, 2011

    ORA-00020: maximum number of processes (%s) exceeded

    Ever received this error message in your alert log? I have on a couple of occasions. Perhaps an application server started spawning more database connections than it normally does in turn creating more processes on the database. The Oracle description of this error is:
    Error: ORA 20 Text: maximum number of processes exceeded  
    Cause: An operation requested a resource that was unavailable. The maximum number of processes is specified by the initialization parameter PROCESSES. When this maximum is reached, no more requests are processed. Action: Try the operation again in a few minutes. If this message occurs often, shut down Oracle, increase the PROCESSES parameter in the initialization parameter file, and restart Oracle.

    But what if you cannot connect to your database to shut it down and increase the parameter?

    oradba01t[labdb01]-/home/oracle/>sqlplus  /as sysdba
    SQL*Plus: Release 11.1.0.7.0 - Production on Mon Jun 6 11:25:05 2011
    Copyright (c) 1982, 2008, Oracle.  All rights reserved.
    ERROR:
    ORA-00020: maximum number of processes (%s) exceeded

    I always like to look at the alert log to check for any additional details.

    Using adrci I tailed the alert log and sure enough we had our ORA-00020 error all over.

    2011-06-06 11:17:36.042000 -05:00
    ORA-00020: No more process state objects available
    ORA-20 errors will not be written to the alert log for
     the next minute. Please look at trace files to see all
     the ORA-20 errors.
    Process m000 submission failed with error = 20
    2011-06-06 11:24:35.878000 -05:00
    ORA-00020: maximum number of processes 0 exceeded
    ORA-20 errors will not be written to the alert log for
     the next minute. Please look at trace files to see all
     the ORA-20 errors.
    adrci> exit

    How do I connect to the database to increase my processes parameter? Well I remember Tanel Poder blogged about "How to log on even when SYSDBA can't do so?"

    oradba01t[labdb01]-/home/oracle/>sqlplus -prelim "/as sysdba"

    SQL*Plus: Release 11.1.0.7.0 - Production on Mon Jun 6 11:29:54 2011

    Copyright (c) 1982, 2008, Oracle.  All rights reserved.

    SQL>

    I was able to log on to the instance without getting the error message. Now to see if I can shut it down and start it up to increase the parameter.

    SQL> shutdown abort
    ORACLE instance shut down.
    SQL> startup mount
    ORACLE instance started.
    ORA-01012: not logged on
    Process ID: 0
    Session ID: 0 Serial number: 0

    It appears that the shutdown abort was successful but the mount command did not complete. Disconnect from the previously connected session and logon normally to start the instance.

    SQL>exit
    oradba01t[labdb01]-/home/oracle>sqlplus / as sysdba

    SQL*Plus: Release 11.1.0.7.0 - Production on Mon Jun 6 11:33:07 2011

    Copyright (c) 1982, 2008, Oracle.  All rights reserved.

    Connected to an idle instance.

    SQL>startup mount
    ORACLE instance started.

    Total System Global Area 4175568896 bytes
    Fixed Size                  2160352 bytes
    Variable Size            3489663264 bytes
    Database Buffers          671088640 bytes
    Redo Buffers               12656640 bytes
    Database mounted.
    SQL>

    Now we can go ahead and increase our processes parameter. Actually I opted not to increase the parameter since I knew the root cause of the processes being exceeded. This was a Enterprise Manager Grid Control repository and due to some OMS processes had caused the database processes to be exceeded.