What makes the Oracle Database Machine V2 incredibly awesome?

Lots of hardware power for the database servers

– 8 Real Application Clusters database server nodes, aka compute nodes (Sun Fire X4170s)
– 2 quad-core Intel Xeon E5540 processors (2.53GHz) in each server (a total of 64 cores across the 8 nodes)
– 72GB of RAM on each node (a total of 576GB of RAM on the database servers)
– 3 36-port QDR InfiniBand switches (40Gbit/s InfiniBand)

Lots of hardware power for the storage servers

– 14 Exadata cells (Sun Fire X4275s)
– Each cell has
– 2 quad-core Intel Xeon E5540 processors (a total of 112 CPU cores across all 14 cells)
– 24GB of RAM on each cell (a total of 336GB of RAM across all 14 cells)
– 384GB of flash cache (PCIe flash cards) on each cell (about 5TB across all 14 cells)
– 12 x 600GB SAS disks (7.2TB) or 12 x 1TB SATA disks (12TB) per cell (a total of 100TB with SAS disks, or 168TB with SATA disks, across all 14 cells)
The above hardware gives the database machine the ability to read data at a rate of 21GB per second, or 1,000,000 I/Os per second.

Balanced Configuration

Each SAS disk drive in the Sun Fire X4275 servers is a 3.5-inch, 15K RPM, SAS 2.0, 600GB drive. Each drive can sustain an average read rate of at least 125MB per second on sequential scans, so the 168 disk drives together can scan (and return) data at a rate of 21,000MB per second.
The InfiniBand connections between the storage cells and the compute nodes have enough network bandwidth to transport data at a rate of 21GB per second.
The 64 CPU cores can issue I/O requests at a rate of approximately 300MB per second per core, hence requesting about 21GB of data per second.
So the system is architected for optimal reads: enough CPU cores to request the I/O, enough network bandwidth to ship the data, and enough disk capacity to service those I/O requests. This is why the Oracle Database Machine is a well-balanced system.
At a read rate of 21GB per second, 1TB of data can be read by the database servers in less than a minute (1,000GB divided by 21GB per second is roughly 48 seconds).

Infiniband

Each database node and Exadata cell has dual-port Quad Data Rate (QDR) InfiniBand connectivity.
InfiniBand is used for the database node to Exadata cell connectivity, and also for the RAC high-speed interconnect (the Cache Fusion network).
InfiniBand has the flexibility of a LAN with the speed of a SAN.
Oracle's interconnect protocol uses DMA to move data from the wire directly into memory without making any additional copies.

Exadata Cells

The Exadata cells provide highly redundant, high-performance hardware with very intelligent software to efficiently process database I/O requests.
The hardware capability was discussed in the section "Lots of hardware power for the storage servers".
The intelligence in the Exadata cells includes Smart Flash Cache, Smart Scans, Storage Indexes and Exadata Hybrid Columnar Compression.
Oracle Automatic Storage Management (ASM) ensures that all Oracle database files are evenly spread across all 168 disks available in the database machine.
Oracle Database uses the iDB protocol (built by Oracle, and aptly called the Intelligent Database protocol) to communicate with the Exadata cells. iDB is built on ZDP, a zero-data-loss, zero-copy implementation of the industry-standard RDSv3 protocol (Reliable Datagram Sockets).

Flash cache

Random read operations are cached on the 5TB of flash cache available in the database machine, significantly improving OLTP performance.
The Exadata Smart Flash Cache, working with the database server, keeps track of data access patterns and intelligently manages the caching of blocks from the Oracle datafiles.
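You can also influence this caching per object. A hedged sketch (the table name is made up) using the CELL_FLASH_CACHE storage attribute, which asks the cells to cache an object's blocks in flash more aggressively:

alter table hot_orders storage (cell_flash_cache keep);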

Smart Scans

The Oracle database server uses the iDB protocol to push query predicates (the filters and join conditions that limit the data retrieved by the query) down to the Exadata cells.
This enables the cell to do three things
– Identify the rows needed by the query and ship only those rows back to the database server (not entire blocks)
– Identify the columns needed by the query and ship only the required columns of those rows back to the database server
– Use Bloom filters to process join conditions and, in queries with joins, ship only the matching rows back to the database server
This drastically reduces the amount of data sent back to the database server (reducing the network usage).
Transferring file blocks from the disks inside a cell to the cell's physical memory is relatively fast. If the data not needed by the database server can be eliminated at the cell, the amount of data that has to travel over the network to the database server is significantly reduced. So smart scans reduce the network I/O usage between the database servers and the Exadata cells.
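As a hedged illustration (the table and column names are made up), a query like the one below is a smart scan candidate: the cells can apply the date filter and ship back only the two referenced columns of the matching rows.

select cust_id, sales_amount
from sales
where sale_date > to_date('01-SEP-2009', 'DD-MON-YYYY');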

Storage Indexes

Smart scans, however, do not reduce the disk I/O within the cell (i.e., the transfer from disk to the cell's physical memory).
Oracle creates an in-memory array of structures that keep track of the min and max values of columns (columns used in where clauses that benefit from storage indexes). These structures let Oracle identify whether specific 1MB regions are needed, based on the filtering conditions applied to the tables.
So a storage index is a filter Oracle applies to prune away 1MB chunks that do not have to be read.
This reduces the I/O within the Exadata cell.
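You can get a feel for how much disk I/O the storage indexes saved you from the cell statistics. A hedged sketch (the statistic name is the one I recall from the Exadata documentation):

select name, value
from v$sysstat
where name = 'cell physical IO bytes saved by storage index';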

Exadata Hybrid Columnar Compression

Traditional relational databases store data in database blocks in a row format. This type of storage limits the amount of compression that can be achieved.
Column-store databases organize and store data by column. Storing column data together, with the same datatype and similar characteristics, allows significant compression to be achieved. However, if a query refers to more than a couple of columns in the table, or does more than modest updates and inserts, those queries and that DML tend to perform more slowly.
Exadata Hybrid Columnar Compression takes a blended approach. Oracle takes a set of rows that fit into multiple blocks (called a compression unit), converts the rows into columns, and stores the data in a columnar format within the compression unit.
Oracle uses three different compression formats and different transformations, depending on the compression level you have chosen.
There are 4 levels of compression (each level is a tradeoff between compression ratio and compression speed)
– Query Low
– Query High (Default when you say compress for query)
– Archive Low
– Archive High
With Exadata Hybrid Columnar Compression
1) The amount of storage required to store massive amounts of data can potentially be decreased by a factor of 10.
2) The amount of I/O to be issued (for queries that scan very large tables) is significantly reduced.
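As a hedged sketch of the DDL (the table names are made up), the compression level is chosen when the table is created:

create table sales_query compress for query high as select * from sales;

create table sales_archive compress for archive high as select * from sales;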
Together, the features above make the Oracle Database Machine rock at database performance.

Tracing Oracle parallel query sessions and creating a tkprof output

When you run queries with Oracle parallel query, Oracle spins up multiple parallel query processes to process the query. Each parallel query process gets its own database session, so when we turn tracing on, Oracle creates multiple trace files in the udump directory. Here are the steps I went through to gather a tkprof output from all those files for a query (or for anything you run in the same session).
Log in to sqlplus from where you are going to run your parallel query.
Set up a client identifier for the session
exec dbms_session.set_identifier('px_test');
alter session set events='10046 trace name context forever, level 1';
Run your SQL query (that uses parallel query)
Quit from sqlplus
Find all your trace files and move them to a different directory
Identify your user_dump_dest directory
sqlplus / as sysdba
SQL> show parameter user_dump_dest
Locate all your trace files (there will be one for the main session, and then one each for the parallel query processes used)
cd /u01/udump
mkdir tmp2
find . -name '*.trc' -mmin -5
The find command above finds and lists all the trace files that have been updated in the last 5 minutes (change the -mmin value to whatever time period you want to list trace files for).
Move the files created by your session to the subdirectory named tmp2.
Now cd into tmp2.
Remove all the files in tmp2 that are not your session's trace or one of its parallel query slaves. This simplifies the trcsess command you need to run. Otherwise you will have to list all of your trace files by name in the trcsess command.
trcsess output=prog9.trc clientid=px_test *.trc
tkprof prog9.trc prog9.out sort=exeela sys=no
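If you want to double-check which trace files carry your client identifier before running trcsess, a hedged sketch (the identifier set with dbms_session.set_identifier shows up inside the trace text):

grep -l px_test *.trc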

11gR2 new AWR reports for Real Application Clusters

There are two new AWR reports in 11gR2 that will be helpful to DBAs in Real Application Clusters (RAC) environments.

awrgrpt.sql

This is a cluster-wide AWR report, so you can see information from all the nodes in the same section, and you can also see aggregated statistics from all the instances at the same time (totals, averages and standard deviations).

awrgdrpt.sql

This is a cluster-wide stats diff report (like awrddrpt.sql in 11gR1), comparing the statistics between two different snapshot intervals, across all nodes in the cluster.
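Both reports are generated the same way as the regular AWR reports. A hedged usage sketch, assuming a standard ORACLE_HOME layout:

sqlplus / as sysdba
SQL> @?/rdbms/admin/awrgrpt.sql
SQL> @?/rdbms/admin/awrgdrpt.sql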

These are huge additions to the AWR reports that make it much easier to understand database performance in Real Application Clusters environments.

Cloud computing definition.

The National Institute of Standards and Technology has a good, concise definition of cloud computing. Sushil Kumar of Oracle used the same language to define cloud computing in an article in the current release of Oracle Magazine.

Essential Characteristics

  • On Demand Self-Service
  • Broad Network Access
  • Resource Pooling
  • Rapid Elasticity
  • Measured Service

Service Models

  • Cloud Software as a Service (SaaS)
  • Cloud Platform as a Service (PaaS)
  • Cloud Infrastructure as a Service (IaaS)

Deployment Models

  • Private Cloud
  • Community Cloud
  • Public Cloud
  • Hybrid Cloud

Datapump export and import – parallel and compress

It's been a while since I wrote anything on my blog. Not because I am lazy, but because I've been doing a bunch of proofs of concept for various customers: Database Machine, Audit Vault, Data Masking, RAC, SecureFiles, etc. It's been loads of fun.

I wanted to write about a couple of neat things I came across.

Exporting from an Oracle database in parallel

Imagine that you have a fairly large database and you want to export it onto two different devices in parallel (let us say you have two USB devices attached to the server and you want to leverage the write throughput of both simultaneously). You can do this in two steps.

  • Define 2 different Oracle directories
    • Let us say, for example, the drives you want to use are mounted at /u01/firstusb and /u01/secondusb
    • create directory exp1 as '/u01/firstusb';
    • create directory exp2 as '/u01/secondusb';
  • While exporting, use the directories in the dumpfile keyword
    • expdp system/manager directory=exp1 dumpfile=exp1:exp_test_%U.dmp,exp2:exp_test_%U.dmp schemas=AAA,BBB,CCC,DDD parallel=8  logfile=exp.log

Since you are using exp1:exp_test_%U.dmp,exp2:exp_test_%U.dmp with parallel=8, datapump creates 4 dump files each in exp1 and exp2, which point to /u01/firstusb and /u01/secondusb respectively.

Importing and Enabling Compression (OLTP or Exadata Hybrid Columnar Compression)

Let us say you want to export from a database that does not have compression turned on, and import into one with compression turned ON. Since the table was created with the NOCOMPRESS (default) keyword, expdp actually captures this and uses it to build the "create table" statement that runs when the import creates the table. So the default is for the imported table to also be NOCOMPRESS.

If you only have a handful of tables you want to enable compression on, you can pre-create the tables (and their indexes and such) using the COMPRESS FOR OLTP clause and then run the datapump import specifying the parameter table_exists_action=APPEND.

If you want to do it for all the tables in a tablespace:

  • Create the tablespace with compression enabled at the tablespace level.
  • Then while importing using datapump specify the transform=SEGMENT_ATTRIBUTES:n:table parameter.

This causes import to ignore the segment attributes of the table while creating it, so the table inherits the attributes specified at the tablespace level and is created with OLTP compression enabled.
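Putting the two steps together, a minimal sketch (the tablespace name and datafile path are made up; the dump files are the ones from the export example above). With the segment attributes stripped, the tables land in the importing schema's default tablespace, so that default should point at the compressed tablespace:

create tablespace comp_data datafile '/u01/oradata/comp_data01.dbf' size 10g
default compress for oltp;

impdp system/manager directory=exp1 dumpfile=exp1:exp_test_%U.dmp,exp2:exp_test_%U.dmp \
schemas=AAA,BBB,CCC,DDD parallel=8 \
transform=SEGMENT_ATTRIBUTES:n:table logfile=imp.log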


Grid computing sessions at Oracle Openworld 2009

If you are attending Oracle OpenWorld 2009 and are interested in learning a lot about Oracle RAC and grid computing, you can find a full list of Oracle RAC and grid computing events Here (starting at page 2 of the PDF doc).

In preparation for the event, you could read the following new 11gR2 white papers from Oracle to understand the latest developments and arm yourself with questions.

Oracle Real Application Clusters 11g Release 2 Technical Overview

Oracle Real Application Clusters 11g Release 2 Overview of SCAN

Oracle Real Application Clusters One Node 11g Release 2 Technical Overview

Creating a view only user in Enterprise Manager grid control

Sometimes you want to give some Grid Control users only database monitoring access. You don't want them to get all the other administrative privileges, like shutting down the database or creating, altering and dropping tables. You can create such administrators in Enterprise Manager Grid Control by following the steps below.

Whenever you want to monitor a database target, you need to be able to log in to that database as a user. Sometimes you might be logging in as SYSTEM or some other user that has DBA privileges. So the first step is to create a user in the target database that has only limited privileges.

sqlplus system@target

create user oem_view identified by xxx

default tablespace users temporary tablespace temp;

grant create session, oem_monitor to oem_view;

OEM_MONITOR is a role in the database that has some specific privileges granted to it. If you do not want to grant all of those privileges to this user, you can query the data dictionary to see which privileges are granted to OEM_MONITOR, and then decide which subset of them you want to grant to your user OEM_VIEW.
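A hedged sketch of those data dictionary queries:

select privilege from dba_sys_privs where grantee = 'OEM_MONITOR';

select owner, table_name, privilege from dba_tab_privs where grantee = 'OEM_MONITOR';

select granted_role from dba_role_privs where grantee = 'OEM_MONITOR';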

Once the user in the target database is created, you can use Enterprise Manager Grid Control to create the new Grid Control administrator.

Log in to Enterprise Manager Grid Control as SYSMAN (or any super administrator)

Setup -> Administrators -> Create

Remove the "Public" role that is listed in the right-hand side table

Under Create Administrator: System Privileges, select 'VIEW ANY TARGET'.

Under Create Administrator: Targets, choose all the targets this new admin should be able to view

Click Apply.

Reference: Metalink Note 377310.1

Log in as this new administrator user you created, and set oem_view as the username for the database target in the preferred credentials.

11gR2 RAC installation on 64-bit Linux step by step

Yesterday I completed an 11g Release 2 Real Application Clusters installation on 64-bit Oracle Enterprise Linux 4. The installation process is very similar to the 10g and 11gR1 installations, but much simpler. This was a two-node cluster. There are some new concepts introduced in 11gR2 Real Application Clusters. Below are some of my notes on the 11gR2 new features for RAC, and the detailed steps I followed to complete the installation.

Some new concepts in 11gR2 RAC


Oracle Clusterware and ASM are now installed into the same Oracle home, and this is now called the grid infrastructure install.

Raw devices are no longer supported for anything (read: Oracle Cluster Registry, voting disk, ASM disks) in new installs.

The OCR and voting disks can now be stored in ASM, or on a certified cluster file system.

The redundancy level of the ASM diskgroup you choose to place the voting disks on determines the number of voting disks you can have.
You can place

  • Only one voting disk on an ASM diskgroup configured with external redundancy
  • Only three voting disks on an ASM diskgroup configured with normal redundancy
  • Only five voting disks on an ASM diskgroup configured with high redundancy
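Once the grid infrastructure is up, you can check where the voting disks ended up (a hedged sketch, assuming $GRID_HOME points to the grid infrastructure home):

$GRID_HOME/bin/crsctl query css votedisk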


The contents of the voting disks are automatically backed up into the OCR.

ACFS (ASM Cluster File System) is only supported on Oracle Enterprise Linux 5 (and RHEL 5), not on OEL 4.

There is a new service called Cluster Time Synchronization Service that can keep the clocks on all the servers in the cluster synchronized (in case you don't have Network Time Protocol (NTP) configured).

Single Client Access Name (SCAN) is a hostname in the DNS server that resolves to 3 (or at least one) IP addresses on your public network. This hostname is to be used by client applications to connect to the database (as opposed to the VIP hostnames you were using in 10g and 11gR1). SCAN provides location independence to the client connections connecting to the database, and makes node additions and removals transparent to the client application (meaning you don't have to edit your tnsnames.ora entries every time you add or remove a node from the cluster).
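As a hedged sketch (the SCAN hostname and service name are made up), a client tnsnames.ora entry only needs the SCAN name, no matter how many nodes are in the cluster:

RK01 =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = rk-cluster-scan.example.com)(PORT = 1521))
(CONNECT_DATA =
(SERVICE_NAME = RK01)
)
)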

Oracle Grid Naming Service (GNS) provides a mechanism to make the allocation and removal of VIP addresses a dynamic process (using dynamic IP addresses).

Intelligent Platform Management Interface (IPMI) integration provides a new mechanism to fence servers in the cluster when a server is not responding.

The installer can now check the O/S requirements, report on the requirements that are not met, and give you fixup scripts to fix some of them (like setting kernel parameters).

The installer can also help you set up SSH between the cluster nodes.

There is a new deinstall utility that cleans up an existing or failed install.

And the list goes on and on.

I have broken up the installation process into 3 distinct documents, which can be found below

Installing 11gr2 grid infrastructure

Installing 11gr2 Real Application Clusters

Creating the 11gr2 Clustered database

Upgrade your WordPress software

If you run your blog using WordPress software, then please be aware that there is a WordPress worm going around that can compromise your blog and insert spam and malware into your posts.

This particular worm, like many before it, is clever: it registers a user, uses a security bug (fixed earlier in the year) to allow evaluated code to be executed through the permalink structure, makes itself an admin, then uses JavaScript to hide itself when you look at users page, attempts to clean up after itself, then goes quiet so you never notice while it inserts hidden spam and malware into your old posts.

Holy cow, who thinks up this stuff…

Check if your blog is infected.

Upgrade your blog software to WordPress 2.8.4, which takes care of the vulnerabilities that this worm exploits.

Perl and database resident connection pooling

If you use Perl with Oracle 11g databases, you should consider using database resident connection pooling to reduce the overhead associated with connecting to and disconnecting from Oracle. Much has been written about how PHP applications benefit from database resident connection pooling (because PHP does not have a connection pooling mechanism of its own, unlike Java). Similar benefits can be had by Perl applications too.

Most Perl 5 applications use DBI and DBD::Oracle to interact with Oracle databases. Since DBD::Oracle uses OCI to communicate with the Oracle database, it can benefit from database resident connection pooling.

When the database is configured for database resident connection pooling, the Oracle database creates and maintains a pool of database connections. These connections are then shared by the applications connecting to the database. The advantage is that the connections are already created, so you do not incur the overhead of establishing a brand new connection to the database; you are just reusing an existing one. This is especially helpful if you have an application that connects to and disconnects from the Oracle database very rapidly and frequently.

A connection pool can be configured and started in the database as follows

SQL> execute dbms_connection_pool.configure_pool(null,minsize=>2,maxsize=>4);

SQL> execute dbms_connection_pool.start_pool;

A connect string can be configured in the tnsnames.ora to connect to this connection pool using the following syntax

RK01POOL =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = rramdas-us)(PORT = 1521))
(CONNECT_DATA =
(SERVER = POOLED)
(SERVICE_NAME = RK01)
)
)

The Perl program can then establish the connection to the database using this connect string in tnsnames.ora

#!/usr/bin/perl

use strict;
use DBI;

# Connect through the pooled connect string defined in tnsnames.ora
my $dbh = DBI->connect( 'dbi:Oracle:RK01POOL',
                        'scott',
                        'tiger',
) || die "Database connection not made: $DBI::errstr";

That's all it takes, and now you can reap the benefits of using Oracle database resident connection pooling with Perl.

You can use dbms_connection_pool.stop_pool to stop the connection pool in the database.

You can use the data dictionary view dba_cpool_info, and the dynamic views v$cpool_cc_info, v$cpool_cc_stats and v$cpool_stats, to monitor database resident connection pools.
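For example, a hedged sketch of checking the pool's status and size limits:

SQL> select connection_pool, status, minsize, maxsize from dba_cpool_info;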