Datapump export and import – parallel and compress

It’s been a while since i wrote anything on my blog. Not because I am lazy, but because I’ve been doing a bunch of proof of concepts for various customers. database machine, audit vault, data masking, rac and securefiles etc. Its been loads of fun.

I wanted to write about a a couple of neat things i came across .

Exporting from an oracle database in parallel

Imagine that you have a fairly large database and you want to export the database onto two different devices in parallel (Let us say you have two usb devices attached to the server and you want to leverage the write throughput you get to both simultaneously). You can do this in two steps

  • Define 2 different oracle directories
    • Let us say for eg: the drives you want to use are mounted at /u01/firstusb and /u01/secondusb
    • create directory exp1 as ‘/u01/firstusb’;
    • create directory exp2 as ‘/u01/secondusb’;
  • While exporting use the directories in the dumpfile keyword
    • expdp system/manager directory=exp1 dumpfile=exp1:exp_test_%U.dmp,exp2:exp_test_%U.dmp schemas=AAA,BBB,CCC,DDD parallel=8  logfile=exp.log

So since you are using exp1:exp_test_%U.dmp,exp2:exp_test_%U.dmp  and a parallel=8, datapump creates 4 dump files each on exp1 and exp2 which points to /u01/firstusb and /u01/secondusb respectively.

Importing and Enabling Compression (OLTP or Exadata Hybrid Columnar Compression)

Let us say you want to export from a database that does not have compression turned on, and want to import into one with compression turned ON. Since the table is created with the NOCOMPRESS (defautlt) keyword, the expdp statement actually gathers this info and uses it to create the “create table” statement when it creates the table during the import. So the default is for the imported table also to be NOCOMPRESS.

If you only have a hand full of tables you want to enable compression on, you can pre-create the table (And its indexes and such) using the Compress for Oltp clause and then run the datapump import specifying the parameter table_exists_option=APPEND

If you want to do it for all the tables in a tablespace.

  • Create the tablespace with compression enabled at the tablespace level.
  • Then while importing using datapump specify the transform=SEGMENT_ATTRIBUTES:n:table parameter.

This causes import to ignore the segment attributes for the table while creating it, which will cause the table to inherit the attributes specified at the tablespace level and will be created with OLTP compression enabled.

expdp system/manager directory=exp1 dumpfile=exp1:exp_test_%U.dmp,exp2:exp_test_%U.dmp \ schemas=AAA,BBB,CCC,DDD parallel=8 \
logfile=emcpocexp.log

Grid computing sessions at Oracle Openworld 2009

If you are attending Oracle Openworld 2009, and are interested in learning a lot about oracle Rac and Grid computing, you can find a full list of Oracle Rac and Grid computing events Here (Starting at page 2 of the pdf doc).

In preparation for the event, you could read the following new 11gR2 white papers from Oracle, to understand the latest developments and arm yourself with questions.

Oracle Real Application Clusters 11g Release 2 Technical Overview

Oracle Real Application Clusters 11g Release 2 Overview of SCAN

Oracle Real Application Clusters One Node 11g Release 2 Technical Overview

Creating a view only user in Enterprise Manager grid control

Sometimes you would want to give only database monitoring access to some grid control users. You dont want them to get all other administrative privileges, like shutdown database, create tables, alter tables, drop tables etc. You can create such administrators in enterprise manager grid control by following the steps below.

Whenever you want to monitor a database target, you need to be able to login as a user to that database. Sometimes you might be logging in as SYSTEM or some other user that has DBA privileges. So the first step we need to perform, is to create a user in the target database, that has only limited privileges.

sqlplus system@target

create user oem_view identified by xxx

default tablespace users temporary tablespace temp;

grant create session, oem_monitor to oem_view;

OEM_MONITOR is a role in the database, that has some specific privileges granted to it. If you do not want to grant all those privileges to this user, you can then query the data dictionary to see which privileges are granted to OEM_MONITOR and then decide which subset of that you want to grant to your user OEM_VIEW.

Once the user in the target database is created, you can use enterprise manager grid control to create the new grid control administrator.

Login to enterprise manager grid control as SYSMAN (Or any super administrator)

Setup -> Administrators -> Create

Remove the “Public” role that is listed in the right hand side table

Under Create Administrator: System Privileges select ‘VIEW ANY TARGET’.

Under Create Administrator: Targets, choose all the targets this new admin should be able to view

Click Apply.

Refernce : Metalink Note 377310.1

Login as this new administrator user you created and set oem_view as the username for the database target in preferred credentials.

11gR2 rac installation on 64 bit Linux step by step

Yesterday i completed a 11g Release 2 real application clusters installation on 64 bit Oracle Enterprise Linux 4. The installation process is very similar to the 10g and 11gr1 installations, but much simpler. This was a two node cluster. There are some new concepts that are introduced in 11gR2 real application clusters. Below are some of my notes on 11gr2 new features for Rac and detailed steps that i followed to complete the installation.

Some new concepts in 11gR2 Rac


Oracle clusterware and ASM now are installed into the Same Oracle Home, and is now called the grid infrastructure install.

Raw devices are no longer supported for use for anything (Read oracle cluster registry, voting disk, asm disks), for new installs.

OCR and Voting disk can now be stored in ASM, or a certified cluster file system.

The redundancy level of your ASM diskgroup (That you choose to place voting disk on) determines the number of voting disks you can have.
You can place

  • Only One voting disk on an ASM diskgroup configured as external redundancy
  • Only Three voting disks on an ASM diskgroup configured as normal redundancy
  • Only Five voting disks on an ASM diskgroup configured as high redundancy


The contents of the voting disks are automatically backed up into the OCR

ACFS (Asm cluster file system) is only supported on Oracle Enterprise Linux 5 (And RHEL5), not on OEL4.

There is a new service called cluster time synchronization service that can keep the clocks on all the servers in the cluster synchronized (In case you dont have network time protocol (ntp) configured)

Single Client Access Name (SCAN), is a hostname in the DNS server that will resolve to 3 (or at least one) ip addresses in your public network. This hostname is to be used by client applications to connect to the database (As opposed to the vip hostnames you were using in 10g and 11gr1). SCAN provides location independence to the client connections connecting to the database. SCAN makes node additions and removals transparent to the client application (meaning you dont have to edit your tnsnames.ora entries every time you add or remove a node from the cluster).

Oracle Grid Naming Service (GNS), provides a mechanism to make the allocation and removal of VIP addresses a dynamic process (Using dynamic Ip addresses).

Intelligent Platform Management Interface (IPMI) integration, provides a new mechanism to fence server’s in the cluster, when the server is not responding.

The installer can now check the O/S requirements, report on the requirements that are not met, and give you fixup scripts to fix some of them (like setting kernel parameters).

The installer can also help you setup SSH between the cluster nodes.

There is a new deinstall utility that cleans up a existing or failed install.

And the list goes on an on.

I have broken up the installation process into 3 distinct documents, which can be found below

Installing 11gr2 grid infrastructure

Installing 11gr2 Real Application Clusters

Creating the 11gr2 Clustered database

Upgrade your wordpress software

If you run your blog using wordpress software, then please be aware that there is a wordpress worm going around, that can

This particular worm, like many before it, is clever: it registers a user, uses a security bug (fixed earlier in the year) to allow evaluated code to be executed through the permalink structure, makes itself an admin, then uses JavaScript to hide itself when you look at users page, attempts to clean up after itself, then goes quiet so you never notice while it inserts hidden spam and malware into your old posts.

Holy cow, who thinks up this stuff…

Check if your blog is infected.

Upgrade your blog software to wordpress 2.8.4, which takes care of the vulnarabilities that this worm exploits.

Perl and database resident connection pooling

If you use perl with oracle 11g databases, you should consider using database resident connection pooling to reduce the overheads associated with connecting and disconnecting from oracle. Much has been written about how Php applications benefit by using database resident connection pooling (Because Php does not have a connection pooling mechanism of its own, unlike Java). Similar benefits can be derived by Perl Applications Too.

Mostly perl 5 applications will be using DBI and DBD – Oracle to interact with oracle databases. Since DBD – Oracle uses OCI to communicate with the oracle database, it can benefit by using database resident connection pooling.

When the database is configured for database resident connection pooling, the oracle database creates and maintains a pool of database connections. These connections are then shared by applications connecting to the oracle database. The advantage of this is that the connections are already created, so you do not incur the overhead of establishing a brand new connection to the database. You are just reusing an existing one. This is especially helpful if you have an application that establishes connections and disconnects from the oracle database very rapidly/frequently.

A connection pool can be configured and started in the database as follows

SQL> execute dbms_connection_pool.configure_pool(null,minsize=>2,maxsize=>4);

SQL> execute dbms_connection_pool.start_pool;

A connect string can be configured in the tnsnames.ora to connect to this connection pool using the following syntax

RK01POOL =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = rramdas-us)(PORT = 1521))
(CONNECT_DATA =
(SERVER = POOLED)
(SERVICE_NAME = RK01)
)
)

The perl program can then establish the connection to the database using this connect string in tnsnames.ora

#!/usr/bin/perl

use strict;
use DBI;
my $dbh = DBI->connect( ‘dbi:Oracle:RK01POOL‘,
’scott’,
‘tiger’,
) || die “Database connection not made: $DBI::errstr”;

Thats all it takes, and now you can reap the benefits of using oracle database resident connection pooling with Perl.

You can use dbms_connection_pool.stop_pool to stop the connection pool in the database.

You can use the data dictionary views dba_cpool_info, and dynamic views v$cpool_cc_info, v$cpool_cc_stats, v$cpool_stats to monitor database resident connection pools.

Notes on Oracle Parallel Query – Part 0

Oracle introduced “Parallel Query” in version 7.1 in 1994. A decade and a half later, it still trips some unsuspecting customers up, when they

  • Turn parallelism on via a hint in all queries

Or

  • Turn parallelism on at every table and index level

Doing this, coupled with high concurrent usage of the database, can lead to a lot of query server processes being spawned, leading to very high cpu usage, high load average, high number of processes waiting on the cpu run queue and high amount of i/o requests. Such situations may result in overall degraded performance. Below are some concepts to note when using parallel query. Sometimes Too much of a good thing is indeed a Bad thing.

Parallel query (and/or parallel DML) is a mechanism to levarage the cpu and i/o resources available on an SMP system to break down a database operation (query, insert , update, delete,merge) into multiple small operations, execute the small operations concurrently on the system, completing the database operation significantly faster than if it would have executed in a serial fashion. Database operations that include large table scans, large index scans or large joins usually benefit highly from using parallel query.

Oracle does not turn this feature on by default for all queries. There are specific steps to perform to turn this on as discussed in my previous post.

The total number of parallel query server’s that can be running at any given time is determined by the initialization parameter parallel_max_servers. This parameter defaults to cpu_count*parallel_threads_per_cpu*(2 if pga_aggregate_target > 0, else 1)*5

Oracle’s recommendation is that you set parallel_max_servers to 2 * Degree of Parallelism * Number of concurrent users

The reason for the recommendation 2 * DOP is that, if the operation can use “Inter Operation” parallelism, it could end up using 2 sets of Query Server Processes at the same time.

“Intra Operation” parallelism means that one operation is executed in parallel using multiple Query Server Processes (eg: Full table scan). “Inter Operation” parallelism means that the output of one parallel operation is being pipelined to another parallel operation (eg: When there are two tables being full table scanned followed by a hash join, so here the first set of query servers could be scanning one table and the second set consuming those rows and building the hash table). So there are circumstances where the operation ends up using two sets of Query Server Processes. Hence the recommendation of 2*DOP.

You ideally want to keep the CPU usage under 70% with your load averages equal to your number of cpu’s and average run queues as close to 0 as possible. The idea is to work the server efficiently but not work it to death.

Keep in mind that on systems like sun’s t2000 and t5240 systems, each thread shows up as a cpu in cpu_count .

You can control the use of parallel query server’s by setting the initialization parameter parallel_adaptive_multiuser=true (It defaults to true). When enabled this parameter causes oracle to enable an adaptive algorithm that reduces the degree of parallelism used by queries based on the database load.

One another common mistake is that for statements like insert /*+ append */ ….select…..  users forget to enable parallel dml. When running normally the select part of the insert gets parallelized (as long as dop is set) but the insert runs in a serial fashion. Once you enable parallel dml even the insert is parallelized.

You have to explicitly issue the command ‘alter session enable parallel dml’ to enable parallel dml. You should avoid turning parallel dml ON  for all sessions and all transactions. Turn it selectively on for the sessions you need parallel dml. There are some restrictions for using parallel dml.

Consider increasing parallel_execution_message size to 4k or 8k from the default of 2k, for improved parallel execution performance.

Determining the best degree of parallelism for tables is a tough problem. You need to consider the following factors.

  • What is the required query response time by the end user
  • How many parallel query servers does it take to achieve that query response time
  • Haw many of such large queries will concurrently execute on my server (Consider your application generated queries and ad hoc queries)
  • How many CPU’s do i have on the system.
  • How much i/o volumes can the system handle

Once you have the above information you need to determine a degree of parallelism that gives a reasonable query response time, when your average number of large queries execute concurrently, without clobbering your CPU, memory and i/o subsystems (Alternatively you could just determine the number of cpu’s, amount of memory, i/o channels and physical disks you need to sustain the above workload and go buy and use that hardware :-) ).

Also see my previous post on enabling and monitoring parallel query for more parallel query info.

Shell script to generate awr reports for a range of snapshots

We have been doing some performance testing in the recent days. We were running some workload using an application, capturing awr snapshots every 15 minutes on a 8 node rac cluster. I needed to generate the awr reports between each of the snapshots for each of the nodes. Running awrrpt.sql for each 15 minute intervals for each one of the 8 nodes would have been a tedious task. So i wrote a bash shell script that generates those awr reports.

It takes the database id, instance number, start snapshot id and end snapshot id as arguments (you can query dba_hist_snapshot to find out the start and end snapshot id’s you want to use). The directory in which it generates the reports is hardcoded in the script. The script has to be run as the oracle user  and it logs in as sys to generate the reports (You can modify as you need if you want it to be run as a different user). I have tested it only on Linux.

You can run it as ./genawrs.sh dbid  instancenumber beginsnap endsnap, to generate the reports.  This means that you have to run it once for each instance you need the report for. The script can be found below.

#!/bin/bash

if [ $# != 4 ]
then
echo “Syntax  genawrs.sh dbid instanceId startsnapid endsnapid”
exit 1
fi

l_dbid=$1
l_instid=$2
l_start_snapid=$3
let l_end_snapid=$4-1

# For all snapshot id’s
# Set the next snapshot id as current snapshot id + 1
# Spool a log file
# Log into sqlplus and call dbms_workload_repository.awr_report_text
# To generate the awr

for i in `seq $l_start_snapid $l_end_snapid`
do
let l_next_snapid=$i+1;
l_awr_log_file=”/awrs/out/awrrpt_${2}_${i}_${l_next_snapid}.log”

sqlplus -s / as sysdba << EOC
set head off
set pages 0
set lines 132
set echo off
set feedback off
spool $l_awr_log_file
SELECT
output
FROM
TABLE
(dbms_workload_repository.awr_report_text
($l_dbid,$l_instid,$i,$l_next_snapid)
);
spool off
EOC
done

Rac Starter Kit

Oracle has for a while had a rac assurance team (A team within oracle support, in the HA/Rac support team) that engages proactively with new rac customers. The Rac Assurance team used to provide the new customers with a “starter kit” of documents that  include

  1. A Rac best practices document
  2. A Step by Step installation guide
  3. Recommended patches
  4. A test plan.

If the customer follows these best practices, it sets them up with a solid foundation to be successful with their new  Rac implementation.

Now these test kits are public, you can access them by accessing metlink note 810394.1

You can also log a tar in metalink and ask support for the “Rac Starter Kit”, and they will give you the platform specific starter kit that includes the list of recommended patches.

ILM Cost savings calculator

This is a simple calculator (Written using Javascript), that can be used to estimate the potential cost savings (As a result of reducing your total storage costs)  of using a Information Lifecycle Management (ILM) Strategy to store and maintain your oracle data.
When architecting your ILM strategy you may decide to use the following different types of storage

  • High performance tier where all the important and frequently accessed data is stored (Smaller faster disks).
  • Low cost storage tier where the less frequently accessed data is stored (Larger slower, ATA disks).
  • Online archive storage tier where all the data that is hardly or never accessed is stored (Low cost ATA disks).

Once you decide your classifications you could partition your tables accordingly (And then place different tablespaces on different type of storage) and then use Oracle ILM Assistant to maintain the tables and their data.

This calculator can be used to estimate the cost savings you might get by implementing such a strategy. (Remember this is only a very rough first estimate. Use values that you have received from your hardware vendor to use as inputs for cost per Gb of different tiers of storage).

Click Here to Access the ILM Calculator.

All comments and suggestions, for corrections and improvements  are welcome.