What’s new in Oracle Linux: an Oracle OpenWorld 2011 summary

Some significant new features of Oracle Linux were announced during Oracle OpenWorld 2011. Some of them appeared on a single slide in a presentation by Edward Screven, the Chief Corporate Architect at Oracle, and were easy to miss. So here are some of the details of the new features.

DTrace is now available for Oracle Linux 6

DTrace has been a favorite of Solaris users for ages, and now Oracle brings the coolness and functionality of DTrace to Linux. It provides very granular information about CPU, memory, filesystem and network usage at the individual process level, in addition to system calls, the arguments used, and so on. It is loosely comparable to strace on Linux, but provides a much richer level of detail with very little instrumentation overhead.

Wim Coekaerts has written a detailed article on how to enable this functionality.
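As a quick taste, the one-liner below counts system calls by process name until you press Ctrl-C (a sketch, assuming the Linux port ships the standard syscall provider):

# count system calls by process name; press Ctrl-C to print the aggregate
dtrace -n 'syscall:::entry { @[execname] = count(); }'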

Kernel upgrades can now be done while the system is running (Ksplice, available since the beginning of September 2011)

This one was not really an OpenWorld announcement (though it was touched upon in Edward Screven’s presentation); it has been available since early September 2011. Oracle acquired a company named Ksplice in July 2011. Now Oracle Linux users can perform kernel updates without having to stop or reboot the system.

Wim Coekaerts has written a detailed article on this new functionality, and the article has a link to a white paper on how to enable it.
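Once the Ksplice Uptrack tools are installed and registered (the white paper covers that), applying rebootless updates comes down to something like this sketch:

# apply all available rebootless kernel updates
uptrack-upgrade
# list the updates currently applied to the running kernel
uptrack-show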

Unbreakable Enterprise Kernel version 2 now available with major new features

The Unbreakable Enterprise Kernel version 2 was released and brings with it a lot of new features. A couple of the highlights are below.

 Btrfs is now production-ready

Btrfs stands for Better File System and/or B-tree File System. Although the existing ext3 and ext4 file systems are great, they are older technology and lack key enterprise features. Btrfs brings new features like the ability to take snapshots, online defragmentation, volume growth and shrinking, checksums of data and metadata, and so on.
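As a minimal sketch of the snapshot and defragmentation features (assuming an unused device /dev/sdb and the btrfs-progs package):

# create a btrfs filesystem and mount it (the device name is illustrative)
mkfs.btrfs /dev/sdb
mount /dev/sdb /mnt/data
# take a snapshot of the default subvolume, while the filesystem is in use
btrfs subvolume snapshot /mnt/data /mnt/data/snap_before_change
# defragment online, with the filesystem still mounted
btrfs filesystem defragment /mnt/data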

 Linux Containers

Solaris has had zones and containers for a while, enabling virtualization of physical Sun servers running Solaris. Now Oracle is bringing this functionality to Linux. The key difference between Linux Containers and Linux virtual machines (like Oracle VM) is that Linux Containers run instructions natively on the CPU without any interpretation mechanism, and hence provide good performance for the virtualized hosts.
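A rough sketch using the LXC userspace tools (this assumes the lxc package is installed; the container name here is illustrative and the available template names vary by distribution):

# create, start and list a container
lxc-create -n ol6cont -t oracle
lxc-start -n ol6cont
lxc-ls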

This new release of the kernel also includes numerous enhancements that improve the performance of Oracle products on Oracle Linux.

Wim Coekaerts has posted a good article on how to get started with the Unbreakable Enterprise Kernel version 2.

 There is also a simple example of how to enable Linux Containers.

Auto DOP and calibrate_io

I was posed the following question by a colleague recently: “I am using Auto DOP in 11gR2 and I am setting parallel_degree_limit=12, but when I execute my query it still runs with a parallelism of 48 (i.e. it seems to be ignoring my setting for parallel_degree_limit).” The problem turned out to be that for Auto DOP to work, you need to have run dbms_resource_manager.calibrate_io so that the table resource_io_calibrate$ gets populated.

This requirement is explicitly stated in the Oracle 11gR2 documentation at http://download.oracle.com/docs/cd/E11882_01/server.112/e25523/parallel002.htm#CIHEFJGC

“When PARALLEL_DEGREE_POLICY is set to AUTO, Oracle Database determines whether the statement should run in parallel based on the cost of the operations in the execution plan and the hardware characteristics. The hardware characteristics include I/O calibration statistics so these statistics must be gathered otherwise Oracle Database does not use the automatic degree policy feature.”

Automatic DOP, where Oracle figures out the degree of parallelism to use for a given query, is turned on by setting the initialization parameter PARALLEL_DEGREE_POLICY to either AUTO or LIMITED (with LIMITED, parallel statement queuing and in-memory parallel execution are NOT turned on). So the requirement to gather calibrate_io statistics applies to both the AUTO and the LIMITED settings.

When you are using Auto DOP and you want to limit the maximum parallelism a query can get, you can use the parameter parallel_degree_limit=CPU|IO|<integer> to limit the DOP of the query.
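As a sketch, the two relevant parameters can be set like this (the values simply mirror the example that follows):

-- turn on Auto DOP without statement queuing and in-memory parallel execution
alter system set parallel_degree_policy = LIMITED;
-- cap the DOP that any single statement can get
alter system set parallel_degree_limit = 2;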

When you create a new database using a DBCA (Database Configuration Assistant) template, the calibrate_io statistics are not present in the new database. You can verify this by running the queries below.

SQL> select * from resource_io_calibrate$;

no rows selected
SQL> select * from v$io_calibration_status;

STATUS
-------------
CALIBRATION_TIME
---------------------------------------------------------------------------
NOT AVAILABLE

Below is an example.

  • I have created a new table named sales_rk that has 58 million rows in it.
  • There are no indexes, or primary keys on this table.
  • The table is decorated with parallel degree DEFAULT
  • parallel_degree_policy=LIMITED
  • parallel_degree_limit=2
  • cpu_count=2
  • parallel_threads_per_cpu=2
  • At this time there are no rows in resource_io_calibrate$

I ran the query “select count(*) from sales_rk”

I would have expected Auto DOP to kick in and parallel_degree_limit to limit the DOP of the query to 2.

However that is not what happened.

The query ran with a DOP of 4.

So Oracle just picked the default DOP (cpu_count x parallel_threads_per_cpu = 2 x 2 = 4).

Further evidence that Auto DOP did not kick in can be found by examining the dbms_xplan output of the statement.

SQL> select count(*) from sales_rk;

  COUNT(*)
----------
  58805952

SQL> select * from table(dbms_xplan.display_cursor(null,null,'TYPICAL'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID	1bzaqj7scjp7p, child number 0
-------------------------------------
select count(*) from sales_rk

Plan hash value: 2302347944

--------------------------------------------------------------------------------------------------------
| Id  | Operation	       | Name	  | Rows  | Cost (%CPU)| Time	  |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |	  |	  | 21544 (100)|	  |	   |	  |	       |
|   1 |  SORT AGGREGATE        |	  |	1 |	       |	  |	   |	  |	       |
|   2 |   PX COORDINATOR       |	  |	  |	       |	  |	   |	  |	       |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |	1 |	       |	  |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |	  |	1 |	       |	  |  Q1,00 | PCWP |	       |
|   5 |      PX BLOCK ITERATOR |	  |    58M| 21544   (1)| 00:04:19 |  Q1,00 | PCWC |	       |
|*  6 |       TABLE ACCESS FULL| SALES_RK |    58M| 21544   (1)| 00:04:19 |  Q1,00 | PCWP |	       |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access(:Z>=:Z AND :Z)

Note
-----
   - automatic DOP: skipped because of IO calibrate statistics are missing

The Note section explicitly states that Auto DOP was skipped.

Now, to fix the situation, I ran dbms_resource_manager.calibrate_io.

DECLARE
  lat  INTEGER;  -- actual latency observed, in ms (output)
  iops INTEGER;  -- maximum sustainable I/Os per second (output)
  mbps INTEGER;  -- maximum sustainable throughput in MB/s (output)
BEGIN
  -- arguments: number of physical disks, maximum tolerated latency in ms
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO (1, 10, iops, mbps, lat);
END;
/

 

Beware of calibrate_io generating bad numbers for I/O mbps. If it does, then follow the instructions in MOS note 1269321.1 to delete the contents of resource_io_calibrate$ and populate the table manually.
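That workaround is along these lines (a sketch; the max_pmbps value of 200 is only an illustration, pick a value appropriate for your storage per the note):

delete from resource_io_calibrate$;
insert into resource_io_calibrate$
values (current_timestamp, current_timestamp, 0, 0, 200, 0, 0);
commit;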

Bounce the database.

Check whether calibrate_io worked:

SQL> l
  1* select * from v$io_calibration_status
SQL> /

STATUS
-------------
CALIBRATION_TIME
---------------------------------------------------------------------------
READY
07-OCT-11 05.56.40.911 PM

 

Now re-run the same query: Auto DOP kicks in, and the query executes with a DOP of 2.

Looking at the execution plan confirms that Auto DOP did kick in.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'TYPICAL'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
SQL_ID	1bzaqj7scjp7p, child number 1
-------------------------------------
select count(*) from sales_rk

Plan hash value: 2302347944

--------------------------------------------------------------------------------------------------------
| Id  | Operation	       | Name	  | Rows  | Cost (%CPU)| Time	  |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |	  |	  | 43087 (100)|	  |	   |	  |	       |
|   1 |  SORT AGGREGATE        |	  |	1 |	       |	  |	   |	  |	       |
|   2 |   PX COORDINATOR       |	  |	  |	       |	  |	   |	  |	       |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |	1 |	       |	  |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |	  |	1 |	       |	  |  Q1,00 | PCWP |	       |
|   5 |      PX BLOCK ITERATOR |	  |    58M| 43087   (1)| 00:00:08 |  Q1,00 | PCWC |	       |
|*  6 |       TABLE ACCESS FULL| SALES_RK |    58M| 43087   (1)| 00:00:08 |  Q1,00 | PCWP |	       |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access(:Z>=:Z AND :Z)

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 2 because of degree limit

Shell script to generate AWR diff reports

AWR diff reports (awrddrpt.sql) are a great tool for comparing database performance characteristics from different time periods. A lot of the time, the details in the report can point you to differences in workload profiles or in SQL execution, helping you narrow down problem areas.

The AWR diff report can compare AWR data for different time periods:

– Period A v/s Period B for the same instance in the same database

– Period A in instance X v/s Period B in instance Y in the same database

– Period A in instance X in database P v/s Period B in instance Y in database Q

I routinely export AWR data from different databases (awrextr.sql) and import it into a central database (awrload.sql). However, one problem with this is that once the data is in the central database, in 11gR2 there is no way to view it via Enterprise Manager. So I end up writing SQL statements to compare the data.

Below is a handy shell script that helps you compare two different time periods.

You can run the script using the syntax below.

./genawrd.sh dbid1 inst1 startsnapid1 endsnapid1 dbid2 inst2 startsnapid2 endsnapid2 credentials

eg:
./genawrd.sh 3401191500 3 1111 1112 1346415800 2 757 758 system/manager@rk01

The actual script is below.

#!/bin/bash

if [ $# != 9 ]
then
echo "Syntax: genawrd.sh dbid1 inst1 startsnapid1 endsnapid1 dbid2 inst2 startsnapid2 endsnapid2 credentials"
exit 1
fi

l_dbid=$1
l_instid=$2
l_start_snapid=$3
l_end_snapid=$4

l_dbid2=$5
l_instid2=$6
l_start_snapid2=$7
l_end_snapid2=$8
l_credentials=$9

l_logdir='/u01/Temp/awrs'

l_awr_log_file=${l_logdir}/awrdrpt_${l_instid}_${l_instid2}_${l_start_snapid}_${l_end_snapid}_${l_start_snapid2}_${l_end_snapid2}.html

sqlplus -s $l_credentials << EOC  > /dev/null
set head off
set pages 0
set lines 132
set echo off
set feedback off
set termout off
define inst_num=${l_instid};
define inst_num2=${l_instid2};
define dbid=${l_dbid};
define dbid2=${l_dbid2};
define report_type='html';
define report_name=${l_awr_log_file};
define begin_snap=${l_start_snapid};
define end_snap=${l_end_snapid};
define begin_snap2=${l_start_snapid2};
define end_snap2=${l_end_snapid2};
@$ORACLE_HOME/rdbms/admin/awrddrpi
EOC

Shell script to generate an ASH report

From time to time I get questions like: “Batch program A was running in X minutes on day Y; the same program ran for 5X minutes on day Y+1. Can you figure out why?” I have found that collecting and analyzing ASH reports for the two time periods is a good way to start looking at such issues.

The first step is to determine which database session was running the batch program. I step through the following process to identify the session:

– Find a SQL statement that the batch program executes

– Identify the sql_id of that SQL statement

– Query dba_hist_active_sess_history to find the session_ids executing that sql_id in the given time period

select session_id from
dba_hist_active_sess_history
where
sql_id = '&sqlid'
and instance_number = &inst
and snap_id between &Begin and &End
/

Then I run the following shell script to generate the ASH report for the given SID and time period.

The syntax for running the script is

./genash.sh dbid 'MM/DD/YY HH24:MI' duration credentials inst sid

eg:
./genash.sh 131424457 '01/30/11 11:00' 10 system/manager@rk01 1 1021

The actual script is below

#!/bin/bash

if [ $# != 6 ]
then
echo "Syntax: genash.sh dbid '01/30/11 11:00' duration credentials inst sid"
exit 1
fi

l_dbid=$1
l_start_time=$2
# build a filename-friendly timestamp: strip the slashes, the space and the colon
l_starttime_log=`echo $l_start_time|sed 's/\///g' | sed 's/ //'|sed 's/://'`
l_duration=$3
l_credentials=$4
l_inst=$5
l_sid=$6

l_logdir='/u01/Rk/Docs/ashs'

l_ash_log_file=${l_logdir}/ashrpt_${l_inst}_${l_sid}_${l_starttime_log}.html

sqlplus -s $l_credentials << EOC > /dev/null

set head off
set pages 0
set lines 132
set echo off
set feedback off
set termout off

define inst_num=${l_inst};
define dbid=${l_dbid};
define report_type='html';
define report_name=${l_ash_log_file};
define begin_time="${l_start_time}";
define duration=${l_duration};
define target_session_id   = ${l_sid};

define slot_width  = '';
define target_sql_id       = '';
define target_wait_class   = '';
define target_service_hash = '';
define target_module_name  = '';
define target_action_name  = '';
define target_client_id    = '';
define target_plsql_entry  = '';

@$ORACLE_HOME/rdbms/admin/ashrpti
EOC

You can modify the script fairly easily to run it for
– All sessions
– A specific module name
– A specific SQL ID, and so on (see the sketch below)
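For instance, to report on a specific module rather than a single session, leave target_session_id empty and set target_module_name instead (GL_BATCH is a hypothetical module name):

define target_session_id   = '';
define target_module_name  = 'GL_BATCH';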

AWR: display more than 10 SQLs

By default, AWR displays the top 10 SQL statements in the “SQL Statistics” section of the report, under the various categories (Elapsed, CPU, Gets etc.). Sometimes you might want to display more than the top 10 statements, for example the top 50.

To accomplish this, in the same session from which you are executing awrrpti.sql (or awrrpt.sql), execute the following before you run the report.

exec dbms_workload_repository.awr_set_report_thresholds(top_n_sql=>50);

Oracle Exadata Best Practices Tutorials

There is a set of recorded tutorials available from the Oracle Maximum Availability Architecture team that covers a range of topics.

The tutorials can be accessed at the Oracle Exadata Best Practices Series.

The topics include implementing EBS, Siebel and PeopleSoft, resource manager, migration, DBFS, monitoring, backup and recovery, troubleshooting, patching and healthchecks.

Lots and Lots of great information.

How to create a resource plan using Database Resource Manager

On any database server there is only a limited amount of resources (read: CPU, memory and I/O). Oracle database servers are often subjected to highly concurrent, resource-consuming queries issued by database client programs, which often leads to contention for CPU, memory and I/O resources on the server. Once you have far more actively executing processes than CPU cores, at some point the scheduling of those processes becomes inefficient at the operating system level.

The Oracle Database Resource Manager can be used to throttle the number of Oracle database sessions actively running on a database, to limit the amount of CPU resources a session gets, or to limit the degree of parallelism each session gets, among other things.

Below are the steps to set up a resource plan that:

– Limits the maximum degree of parallelism each session can have to 64.

– Limits the maximum number of concurrent active sessions a user can have to 2 (the other sessions get queued).

Create a Consumer Group

Log in to Enterprise Manager and click on the “Server” tab.

Under the “Resource Manager” section, click on “Consumer Groups”

Click on the “Create” Button

Enter a Name and Description for the Consumer group

Click on the “Add” button to add a database user to the consumer group

Check the box to the right of the user you want to add and click on “Select”

Now click on the “Database Instance” breadcrumb to return to the “Server” tab

Click on “Consumer Group Mappings”

Click the radio button on the row that has the value “Oracle User” in the “View” column.

Click on “Add Rule for Selected Type”

In the drop down list “Selected consumer group”, choose “DOP_GROUP”.

In the “Available Oracle User” list, click on the user you want to add to this “DOP_GROUP” group and click on “Move”

Click Ok.

Click on Apply

Click on the “Database Instance” breadcrumb to return to the “Server” tab

Click on “Resource Plans”

Click on “Create” to create a new resource plan

From the table “Available Group/Subplans” choose “DOP_GROUP” and click on Move. Click Ok.

Click on the tab named “Parallelism”

For the DOP_GROUP, set the maximum degree of parallelism to 64

Click on the tab named “Session Pool”

Set the number of active sessions to 2 (or any number you want; in the screenshot I have used 12)

Click “Ok”

In the “Resource Plans” screen, choose the DOP_PLAN; in the Actions drop-down list, choose “Activate” and click on Go.

Now the resource plan “DOP_PLAN” is active.

Let us say we now submit a bunch of queries (say 20) that use parallel query, simultaneously, as the database user HR. Then:

– Only 2 of them will be allowed to run at the same time (the rest are queued)

– Each session gets a maximum degree of parallelism of 64
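If you prefer scripting to clicking through Enterprise Manager, the same plan can be sketched with the dbms_resource_manager PL/SQL API (a rough outline using the group, plan and user names from the walkthrough above, not a finished script):

BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
  -- consumer group, and a mapping that places the HR user in it
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(
    consumer_group => 'DOP_GROUP',
    comment        => 'Sessions limited to DOP 64 and a pool of 2 active sessions');
  DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING(
    attribute      => DBMS_RESOURCE_MANAGER.ORACLE_USER,
    value          => 'HR',
    consumer_group => 'DOP_GROUP');
  -- the plan, with one directive for DOP_GROUP and a mandatory catch-all
  DBMS_RESOURCE_MANAGER.CREATE_PLAN(
    plan    => 'DOP_PLAN',
    comment => 'Limit DOP to 64 and active sessions to 2');
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan                     => 'DOP_PLAN',
    group_or_subplan         => 'DOP_GROUP',
    comment                  => 'DOP limit 64, active session pool of 2',
    parallel_degree_limit_p1 => 64,
    active_sess_pool_p1      => 2);
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan             => 'DOP_PLAN',
    group_or_subplan => 'OTHER_GROUPS',
    comment          => 'everybody else');
  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/

-- the user also needs the privilege to switch into the group, e.g. via
-- DBMS_RESOURCE_MANAGER_PRIVS.GRANT_SWITCH_CONSUMER_GROUP, and the plan
-- is activated with:
ALTER SYSTEM SET resource_manager_plan = 'DOP_PLAN';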

What makes the Oracle Database Machine V2 incredibly awesome?

Lots of hardware power for the database servers

– 8 Real Application Clusters database server nodes, aka compute nodes (Sun Fire X4170s)
– 2 quad-core Intel Xeon E5540 processors (2.53GHz) in each server (64 cores total across the 8 nodes)
– 72GB of RAM on each node (576GB of RAM total on the database servers)
– 3 x 36-port QDR InfiniBand switches (40Gbit/s InfiniBand)

Lots of hardware power for the storage servers

– 14 Exadata cells (Sun Fire X4275s). Each cell has:
– 2 quad-core Intel Xeon E5540 processors (112 CPU cores across all 14 cells together)
– 24GB of RAM (336GB of RAM across all 14 cells together)
– 384GB of flash cache (PCI-E flash cards) on each cell (about 5TB across all 14 cells together)
– 12 x 600GB SAS disks (7.2TB) or 12 x 1TB SATA disks (12TB) per cell (about 100TB with SAS disks, or 168TB with SATA disks, across all 14 cells together)

The above hardware gives the database machine the ability to read data at a rate of 21 gigabytes per second, or 1,000,000 I/Os per second.

Balanced Configuration

Each SAS disk drive in the Sun Fire X4275 server is a 3.5-inch, 15k RPM, SAS 2.0, 600GB drive. Each drive can sustain a read rate of at least 125 megabytes per second for sequential scans, so 168 disk drives together can scan (and return) data at a rate of 21,000 megabytes per second.
The InfiniBand connections between the storage cells and the compute nodes have enough network bandwidth to transport data at 21 gigabytes per second.
The 64 CPU cores can issue I/O requests at a rate of roughly 300 megabytes per second per core, hence requesting about 21 gigabytes of data per second.
So the system is architected for optimal reads: enough CPUs to request the I/O, enough network bandwidth to ship the data, and enough disk capacity to service the I/O requests. This is why the Oracle Database Machine is a well balanced system.
At a read rate of 21 gigabytes per second, 1 terabyte of data can be read by the database servers in less than a minute.

InfiniBand

Each database node and Exadata cell has dual-port quad data rate (QDR) InfiniBand connectivity.
InfiniBand is used for the database-node-to-Exadata-cell connectivity and also for the RAC high speed interconnect (the cache fusion network).
InfiniBand has the flexibility of a LAN with the speed of a SAN.
Oracle’s interconnect protocol uses DMA to move data from the wire directly into memory without making any additional copies.

Exadata Cells

The Exadata cells provide highly redundant, high performance hardware along with very intelligent software to efficiently process database I/O requests.
The hardware capability was discussed in the section “Lots of hardware power for the storage servers”.
The intelligence in the Exadata cells includes Smart Flash Cache, Smart Scans, storage indexes and Exadata hybrid columnar compression.
Oracle Automatic Storage Management (ASM) ensures that all Oracle database files are evenly spread across all 168 disks available in the database machine.
The Oracle Database uses the iDB protocol (built by Oracle, aptly called the Intelligent Database protocol) to communicate with the Exadata cells. iDB is built on the Zero Data Loss, Zero Copy implementation (ZDP) of the industry standard Reliable Datagram Sockets protocol (RDSv3).

Flash cache

Random read operations are cached in the roughly 5TB of flash cache available in the database machine, significantly improving OLTP performance.
The Exadata Smart Flash Cache, working with the database server, keeps track of data access patterns and intelligently manages the caching of blocks from the Oracle datafiles.

Smart Scans

The Oracle database server uses the iDB protocol to push query predicates (which limit the data retrieved by the query through filters and join conditions) down to the Exadata cells.
This enables the cell to do three things:
– Identify the rows needed by the query and ship only those rows back to the database server (not entire blocks)
– Identify the columns needed by the query and ship only the required columns of those rows back to the database server
– Use Bloom filters to process join conditions and ship only the matching rows back to the database server for queries with joins
This drastically reduces the amount of data sent back to the database server (reducing network usage).
Transferring file blocks from the disks inside a cell to the physical memory of the cell can happen relatively fast. If most of the data that is not needed by the database server can be eliminated at the cell, then the amount of data that must travel over the network to the database server is significantly reduced. So smart scans reduce the network I/O between the database servers and the Exadata cells.
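A quick way to gauge how much smart scans are offloading is to look at the cell I/O statistics from the database side (a sketch; these statistic names are the ones exposed in 11.2 on Exadata):

-- bytes eligible for offload versus bytes actually shipped back by smart scan
select name, value
  from v$sysstat
 where name in ('cell physical IO bytes eligible for predicate offload',
                'cell physical IO interconnect bytes returned by smart scan');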

Storage Indexes

Smart scans, however, do not reduce the disk I/O within the cell (i.e. the transfer from disk to the cell’s physical memory).
Oracle maintains an in-memory array of structures that keep track of the min and max values of columns (columns used in where clauses that benefit from storage indexes), letting Oracle identify whether a specific 1MB region is needed at all, based on the filtering conditions applied to the tables.
So storage indexes are a filter Oracle applies to prune away 1MB chunks that do not have to be read.
This reduces the I/O within the Exadata cell.

Exadata Hybrid Columnar Compression

Traditional relational databases store data in database blocks in a row format. This type of storage limits the amount of compression that can be achieved.
Column-store databases organize and store data by column. Storing column data together, with the same datatype and similar characteristics, results in significant compression. However, if a query refers to more than a couple of columns in the table, or does more than modest updates and inserts, those queries and DML tend to have slower performance.
Exadata hybrid columnar compression takes a blended approach. Oracle takes rows that fit into multiple blocks (called a compression unit), converts the rows into columns and stores the data in a columnar format within the compression unit.
Oracle uses three different compression formats and different transformations, depending on the compression level you have chosen.
There are 4 levels of compression (each level is a tradeoff between compression ratio and compression speed):
– Query Low
– Query High (the default when you say COMPRESS FOR QUERY)
– Archive Low
– Archive High
With Exadata hybrid columnar compression:
1) The amount of storage required to store massive amounts of data can potentially be decreased by a factor of 10.
2) The amount of I/O to be issued (for queries that scan very, very large tables) is significantly reduced.
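As a sketch of the syntax (the table names here are illustrative, and hybrid columnar compression requires Exadata storage):

-- create a compressed copy of a table using hybrid columnar compression
CREATE TABLE sales_archive
  COMPRESS FOR QUERY HIGH
  AS SELECT * FROM sales;

-- confirm the compression setting
SELECT table_name, compression, compress_for
  FROM user_tables
 WHERE table_name = 'SALES_ARCHIVE';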
Together, the above features are what make the Oracle Database Machine rock at database performance.

11gR2: new AWR reports for Real Application Clusters

There are two new AWR reports in 11gR2 that will be helpful to DBAs in Real Application Clusters (RAC) environments.

awrgrpt.sql

This is a cluster-wide AWR report, so you can see information from all the nodes in the same section, and you can also see statistics aggregated across all the instances at the same time (totals, averages and standard deviations).

awrgdrpt.sql

This is a cluster-wide stats diff report (like awrddrpt.sql in 11gR1), comparing the statistics between two snapshot intervals across all nodes in the cluster.
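Both reports are run the same way as the other AWR scripts, from SQL*Plus on one of the instances (the ? below expands to ORACLE_HOME):

SQL> @?/rdbms/admin/awrgrpt.sql
SQL> @?/rdbms/admin/awrgdrpt.sql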

These are huge additions to the AWR reports that make it much easier to understand database performance in Real Application Clusters environments.

Cloud computing definition.

The National Institute of Standards and Technology has a good, concise definition of cloud computing. Sushil Kumar of Oracle used the same language to define cloud computing in an article in the current release of Oracle Magazine.

Essential Characteristics

  • On Demand Self-Service
  • Broad Network Access
  • Resource Pooling
  • Rapid Elasticity
  • Measured Service

Service Models

  • Cloud Software as a Service (SaaS)
  • Cloud Platform as a Service (PaaS)
  • Cloud Infrastructure as a Service (IaaS)

Deployment Models

  • Private Cloud
  • Community Cloud
  • Public Cloud
  • Hybrid Cloud