Archive for the ‘highAvailability’ Category.

Transportable Tablespace from rman backup

Most of oracle’s MAA documentation for transportable tablespaces, seems to recommend that one should put the tablespaces one is transporting, in read-only mode in the source database, before copying the datafiles to the target. This in most cases means application downtime.

In order to minimize the downtime the recommendations seem to be

  • Create a dataguard physical standby database and use this standby database as the source for transport
  • Create a duplicate (aka clone) of your source and use this new duplicate as the source for transport

If you are only transporting a subset of your tablespaces and you want to minimize your downtime, a 3rd good option is to create and use transportable tablespace sets from your already existing rman backups. This process is documented in the Backup and Recovery users guide, Chapter 26 (11gR2 manual).

Since this process uses an existing rman backup you incur no downtime (ie no need to place tablespaces in read-only mode) on your production systems.

Below are the steps to accomplish this (In my example rk01 is the source database and rk02 is the target database).

  • First off, your source database should be running in archivelog mode
  • Take a full database backup from your source database

export ORACLE_SID=rk01
rman target /

run {
allocate channel oem_backup_disk1 type disk format ‘/u01/orarch/rk01/backup/%U’;
backup as BACKUPSET tag ‘%TAG’ database;
backup as BACKUPSET tag ‘%TAG’ archivelog all not backed up;
release channel oem_backup_disk1;
}
run {
allocate channel oem_backup_disk1 type disk format ‘/u01/orarch/rk01/backup/%U’ maxpiecesize 1000 G;
backup as BACKUPSET tag ‘%TAG’ current controlfile;
release channel oem_backup_disk1;
}

exit;

  • Create a transportable tablespace set for the tablespaces you need to transport

export ORACLE_SID=rk01

rman target /

RMAN>  transport tablespace example

2> tablespace destination ‘/u01/orarch/rk01/datafile’

3> auxiliary destination ‘/u01/orarch/rk01/tmp’;

Once the process is complete rman leaves a copy of the datafile (An operating system file, not a backup set file) for the tablespace example , in the directory /u01/orarch/rk01/datafile. It also leaves a export dump file that has the metadata needed for the transport in the same directory /u01/orarch/rk01/datafile.

  • Do endianness conversions on the files, if you need to go cross platform (Use rman convert)
  • Attach the tablespace to your target database

export ORACLE_SID=rk02

sqlplus / as sysdba

create directory tts_dir as ‘/u01/orarch/rk01/datafile’

/

grant all on directory tts_dir to public

/

Exit;

Before you run the next import, make sure that you have created the schema’s (with appropriate privileges) that are in the tablespace you are transporting in the target database rk02, also make sure any roles that are required are created in the target. Eg: The schema HR has objects in the example tablespace. use the create user command to create the HR user with appropriate privileges in the database rk02.

impdp system/manager dumpfile=dmpfile.dmp directory=tts_dir transport_datafiles=/u01/oradata/rk02/example01.dbf logfile=tts_import.log

So As you can see the whole process is executed without shutting down the source database rk01.

Cloud computing definition.

The National Institute of Standards and technology has a good, concise  definition of cloud computing. Sushil kumar of Oracle, was using the same language to define cloud computing in an article the current release of the oracle magazine.

Essential Charachteristics

  • On Demand Self-Service
  • Broad Network Access
  • Resource Pooling
  • Rapid Elasticity
  • Measured Service

Service Models

  • Cloud Software as a Service (SaaS)
  • Cloud Platform as a Service (PaaS)
  • Cloud Infrastructure as a Service (IaaS)

Deployment Models

  • Private Cloud
  • Community Cloud
  • Public Cloud
  • Hybrid Cloud

Grid computing sessions at Oracle Openworld 2009

If you are attending Oracle Openworld 2009, and are interested in learning a lot about oracle Rac and Grid computing, you can find a full list of Oracle Rac and Grid computing events Here (Starting at page 2 of the pdf doc).

In preparation for the event, you could read the following new 11gR2 white papers from Oracle, to understand the latest developments and arm yourself with questions.

Oracle Real Application Clusters 11g Release 2 Technical Overview

Oracle Real Application Clusters 11g Release 2 Overview of SCAN

Oracle Real Application Clusters One Node 11g Release 2 Technical Overview

11gR2 rac installation on 64 bit Linux step by step

Yesterday i completed a 11g Release 2 real application clusters installation on 64 bit Oracle Enterprise Linux 4. The installation process is very similar to the 10g and 11gr1 installations, but much simpler. This was a two node cluster. There are some new concepts that are introduced in 11gR2 real application clusters. Below are some of my notes on 11gr2 new features for Rac and detailed steps that i followed to complete the installation.

Some new concepts in 11gR2 Rac


Oracle clusterware and ASM now are installed into the Same Oracle Home, and is now called the grid infrastructure install.

Raw devices are no longer supported for use for anything (Read oracle cluster registry, voting disk, asm disks), for new installs.

OCR and Voting disk can now be stored in ASM, or a certified cluster file system.

The redundancy level of your ASM diskgroup (That you choose to place voting disk on) determines the number of voting disks you can have.
You can place

  • Only One voting disk on an ASM diskgroup configured as external redundancy
  • Only Three voting disks on an ASM diskgroup configured as normal redundancy
  • Only Five voting disks on an ASM diskgroup configured as high redundancy


The contents of the voting disks are automatically backed up into the OCR

ACFS (Asm cluster file system) is only supported on Oracle Enterprise Linux 5 (And RHEL5), not on OEL4.

There is a new service called cluster time synchronization service that can keep the clocks on all the servers in the cluster synchronized (In case you dont have network time protocol (ntp) configured)

Single Client Access Name (SCAN), is a hostname in the DNS server that will resolve to 3 (or at least one) ip addresses in your public network. This hostname is to be used by client applications to connect to the database (As opposed to the vip hostnames you were using in 10g and 11gr1). SCAN provides location independence to the client connections connecting to the database. SCAN makes node additions and removals transparent to the client application (meaning you dont have to edit your tnsnames.ora entries every time you add or remove a node from the cluster).

Oracle Grid Naming Service (GNS), provides a mechanism to make the allocation and removal of VIP addresses a dynamic process (Using dynamic Ip addresses).

Intelligent Platform Management Interface (IPMI) integration, provides a new mechanism to fence server’s in the cluster, when the server is not responding.

The installer can now check the O/S requirements, report on the requirements that are not met, and give you fixup scripts to fix some of them (like setting kernel parameters).

The installer can also help you setup SSH between the cluster nodes.

There is a new deinstall utility that cleans up a existing or failed install.

And the list goes on an on.

I have broken up the installation process into 3 distinct documents, which can be found below

Installing 11gr2 grid infrastructure

Installing 11gr2 Real Application Clusters

Creating the 11gr2 Clustered database

Rac Starter Kit

Oracle has for a while had a rac assurance team (A team within oracle support, in the HA/Rac support team) that engages proactively with new rac customers. The Rac Assurance team used to provide the new customers with a “starter kit” of documents that  include

  1. A Rac best practices document
  2. A Step by Step installation guide
  3. Recommended patches
  4. A test plan.

If the customer follows these best practices, it sets them up with a solid foundation to be successful with their new  Rac implementation.

Now these test kits are public, you can access them by accessing metlink note 810394.1

You can also log a tar in metalink and ask support for the “Rac Starter Kit”, and they will give you the platform specific starter kit that includes the list of recommended patches.

Rac how to determine interconnect speed

During a Recent  Oracle 11g Rac installation on Solaris, i ran into  the following issue. After installing and configuring oracle clusterware, when we were trying to create the ASM instance, the ASM instance would only stay alive on one node of the cluster. The customer had configured the private interconnect to be a 100 base T connection (As opposed to GiGE). Once the customer re-configured the interconnect to be a GiGE, the ASM instance came up properly. Oracle recommends that you have a GiGE connection for your private interconnect.

Before starting your installation you can check if the interface you are using for the private interconnect, is configured to be a GIGE connection.

On Redhat or Oracle Enterprise Linux

Install the rpm ethtool

ethtool <interfacename> | grep Speed ,will give you the speed of the interface

On Solaris



kstat <interfacename> | grep link_speed  ,will give you the speed of the interface

ORA-00845 Memory_Target Not supported on this system

I was working on testing some 11g streams configurations today. I needed to startup 3 databases instances on the same server. I was using AMM (Automatic memory management). When i was trying to startup the 3rd database, i kept getting the error message “ORA-00845: MEMORY_TARGET not supported on this system”. I also had error messages in the alert log.

This is because, the space allocated for /dev/shm is not sufficient to allocate the SGA+PGA’s for all the 3 database instances (When using the initialization parameter memory_target). The space allocated needs to be >= the total SGA+PGA size of all the 3 instances together.

You can increase the space allocated using the command  “mount -t tmpfs shmfs -o size=2000m /dev/shm” ( I had 3 instances 600mb each SGA+PGA). You can persist this allocation across reboots by adding it to the /etc/fstab.

Determining the network bandwidth required for a dataguard physical standby implementation

References

Network bandwidth Implications of Oracle Dataguard

Dataguard Redo Transport & network configuration

Determine the redo generation rate

- You can query dba_hist_sysstat to find out your redo generation rate in bytes.
- You can choose to use either your average redo generation per hour or peak redo generation per hour (I would recommend that you size for the peak redo generation).
- Let us say that you determined that the redo generation rate, in Bytes, PER DAY, happened to be RedoBytes.
- You have to add a 30% overhead for tcp. So RedoBytesPlusOverhead=RedoBytes*1.3

Convert redo generation rate to Mbps (Megabitspersecond)

- (RedoBytesPlusOverhead*8)/((24*60*60)*(1024*1024))
- This is the theoretical minimum bandwidth that you are going to require.

Other important considerations

- Network latency is a huge factor in the amount of redo you will be able to transport from your primary site to the standby site. This value is unique to your network, so if you have a high latency network you might not be able to sustain the required rate of redo shipping.
- Usually the wide area network between the primary site and standby site is used by more than just dataguard (eg: e-mail etc). So those bandwidth requirements have to be factored in.
- The above two points is why customers should not rely too much on theoretical calculations and to actually deploy dataguard and test the actual redo generation and redo transport performance statistics.
- If you do not deploy a network that can ship redo at a rate of 45 mbps, all that means is that, at times your redo shipping will fall behind (ie your standby site will be behind the primary site) but dataguard still works. In a lot of cases this is acceptable (Based on the customers recovery point objective and recovery time objective).
- There are network tuning best practices outlined in “Dataguard Redo Transport & network configuration” , that you are optimizing the redo transport mechanism and the network. These have to be followed to achieve the best possible network performance.
- There are other techniques like network compression (Hardware compression using wan compression devices, or actual software compression in dataguard 11g) which enable you to reduce the network bandwidth requirements.

Scripts

- You can run the following script to extract redo generation information from dba_hist_sysstat.

set pages 0
set head off
set lines 132
set colsep ~
col curval format 9999999999999999999999
col prevval format 9999999999999999999999
Select
sn.snap_id
,cur_stat.snap_id
,prev_stat.snap_id
,to_char(sn.begin_interval_time,’DD-MON-YY HH24′)
,to_char(sn.end_interval_time,’DD-MON-YY HH24′)
,cur_stat.value curval
,prev_stat.value prevval
,(cur_stat.value-prev_stat.value) RedoGen
from dba_hist_snapshot sn,
(select snap_id,value from dba_hist_sysstat
where stat_name = ‘redo size’) cur_stat
,(select snap_id,value from dba_hist_sysstat where
stat_name = ‘redo size’) prev_stat
Where sn.snap_id = cur_stat.snap_id
and cur_stat.snap_id = prev_stat.snap_id + 1 order by 1;

- Spool the contents into a file RedoInfo.dat

- Create a table in the oracle database named RedoInfo

create table redoinfo (
inid    number,
bdate    date,
edate    date,
totredo    number
);

- Use sql*loader to load the contents of the spool file into redoinfo (At this point some would ask, “why dont i just do a create table as in the same database”, my assumption is that you probably dont want to be creating these temp tables in a production env.).

load data
infile ‘RedoInfo.dat’
append into table RedoInfo
fields terminated by “~” optionally enclosed by ‘”‘
(
field1 filler,
field2 filler,
inid,
bdate  Date “DD-MON-YY HH24″,
edate  Date “DD-MON-YY HH24″,
field3 filler,
field4 filler,
totredo
)

- Then you can run all kinds of queries on this to learn the different charachteristics of your redo generation
- The query below gives you the total redo generation per day and  the Mbps

select to_char(edate,’dd-mon-yy’) Day,sum(totredo)/(1024*1024) TotRedoMb
,(sum(totredo)*1.3)/(1024*1024) RedoPlusOvrHd,((sum(totredo)*1.25)*8)/(1024*1024) Mbits
,round(((sum(totredo)*1.25)*8)/(24*60*60*1024*1024)) Mbps FROM RedoInfo
group by to_char(edate,’dd-mon-yy’)
order by 1;

11g Snapshot Standby

Oracle 11g provides customers the “Snapshot Standby” database feature, so that customers can leverage their investments in standby database systems. When leveraging the “Snapshot Standby” feature, customers can temporarily open up the standby database and run their tests on this system. For example, customer might want to test an application upgrade that alters tables or insert/update/deletes data.

When you convert a physical standby database to a snapshot standby in 11g, the database automatically creates a guaranteed restore point (Obviously “Flashback Database” has to be enabled in order to do this). Once you are done with your testing, the database gets automatically flashed back to this guaranteed restore point and then the log apply starts again.

Point to note with the snapshot standby database is that, in 11g the log transport continues to work. So all the logs are being received on the standby database and gap resolution continues to work (So in case of a primary failure, the logs just need to get applied on the standby). In 10g you could open up the standby database and test on it, but the log transport used to stop.

Let us look at some of the steps involved in setting up and using a “Snapshot standby” database in 11g. For similar functionality with 10g refer to the following documents.

Using 10g Physical standby database for Read/Write Testing and Reporting

10g Snapshot Standby – Dell Case Study

First follow all the normal steps that you would in setting up a physical standby database.
Make sure that you enable “Flashback Database” on both the primary and the standby.
Make sure that log shipping and apply are working properly.
In our example RK01 is the primary database and RK01DG is the standby database.

Check the salary for empno 7934

Update the emp table on the primary database
Switch the logfile to ensure that the change gets propagated and applied on the standby database.

SQL> Update scott.emp set sal=1310 where empno=7934;
SQL> Commit;
SQL> connect / as sysdba
SQL> alter system switch logfile;

Login using sqlplus to the standby database RK01DG

Check the role of this standby database

Use the command “alter database convert to snapshot standby” to try and switch the database to a “Snapshot Standby”

Since the database is in managed recovery mode you get an error.
Cancel the managed recovery
Then convert the database using the “alter database convert to snapshot standby”
This command leaves the standby database in nomount mode.
Shutdown the database and start it back up (In my screen shot i am starting up in mount mode, you can start it all the way up)

Check the role of the standby database, it should say “Snapshot Standby”
Query v$restore_point to see the details of the guaranteed restore point that it created

List the columns in the employee table on the primary database (RK01). This is just for reference, because as part of our testing we are going to add columns to this table.

Login as the scott user to the standby database and run some transactions

Now that testing is over, we can convert this back to the physical standby database.
Issue the “alter database to convert to physical standby” command to convert this from a “Snapshot Standby” to a “Physical standby” database.

Since the database is in open mode, it complains.
Restart the database in mount mode and then issue the same convert command again

Once the command succeeds the database has to be restarted.
Startup the database in mount mode and put the database in managed recovery mode.
Check the role of the standby database. It should now say “physical standby”.

Run transactions on primary to make sure that log transport and apply are working

Put the standby database in read only mode and check if the transactions got applied on the standby

Also make sure that the changes made during testing have been rolled back.

Those are the basic steps to setup and test a “Snapshot Standby” database.

11g Active Dataguard

Oracle introduced the Active Dataguard option in 11g, to allow customers to have read-only access to the physical standby databases. This option gives customers the flexibility to offload resource intensive queries to the physical standby database, while the log shipping and apply continues its work.

In oracle 10g in order to run queries on database tables, in a physical standby database, one had to stop the log apply and open the database in read-only mode. This means that the log files are being queued up during that period and will have to be applied once the database is reverted to the managed recovery mode.

The steps to follow in order to setup Active dataguard are exactly the same as setting up a physical standby database (Creating a Physical Standby). Once the physical standby database is configured and log transport and log apply are working correctly, you can just do an “alter database open” to enable active dataguard.

Let us look at an example.

I have a primary database RK01 and a regualar physical standby database RK01DG.

If you try to query the table emp while the database is in managed recovery mode you will get an error

Make sure that you can update the table emp from the primary database.

In order to enable Active Dataguard, ie enable the ability to query the database while log apply is in progress. Issue the following commands

The main concept to note here is that you are doing a “alter database open” (As opposed to an “alter database open read only”).

Update the table emp again to check if the values you are updating to on the primary are query able on the standby (ie log apply is continuing)

Check the updated value on the standby

Try updating the emp table on the standby database (It should only allow you to query , not update)

The procedure’s to switchover and failover remain exactly the same as before (Irrespective of whether you are in active dataguard mode or not).