Auto DOP and calibrate_io

I was recently posed the following question by a colleague: "I am using Auto DOP in 11gR2 and I have set parallel_degree_limit=12, but when I execute my query, it still runs with a parallelism of 48 (i.e. it seems to be ignoring my setting for parallel_degree_limit)." The problem turned out to be that for Auto DOP to work, you need to have run dbms_resource_manager.calibrate_io so that the table resource_io_calibrate$ is populated.

This requirement is explicitly stated in the Oracle 11gR2 documentation at http://download.oracle.com/docs/cd/E11882_01/server.112/e25523/parallel002.htm#CIHEFJGC

 

“When PARALLEL_DEGREE_POLICY is set to AUTO, Oracle Database determines whether the statement should run in parallel based on the cost of the operations in the execution plan and the hardware characteristics. The hardware characteristics include I/O calibration statistics so these statistics must be gathered otherwise Oracle Database does not use the automatic degree policy feature.”

Automatic DOP, where Oracle figures out the degree of parallelism to use for a given query, is turned on by setting the initialization parameter PARALLEL_DEGREE_POLICY to either AUTO or LIMITED (LIMITED does NOT turn on parallel statement queuing and in-memory parallel execution). So the requirement to gather calibrate_io statistics applies to both the AUTO and LIMITED settings.

When you are using Auto DOP and you want to limit the maximum parallelism a query can get, you can use the parameter parallel_degree_limit=CPU|IO|<integer> to limit the DOP of the query.
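As a minimal sketch of the settings used later in this post (the values are illustrative, and these are system-wide changes you should test before applying):

```sql
-- Turn on Auto DOP without statement queuing / in-memory parallel execution
alter system set parallel_degree_policy = limited;

-- Cap the computed DOP at 2
alter system set parallel_degree_limit = 2;
```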

When you create a new database from a DBCA (Database Configuration Assistant) template, the calibrate_io statistics are not present in the new database. You can verify this by running the queries below.

SQL> select * from resource_io_calibrate$;

no rows selected
SQL> select * from v$io_calibration_status;

STATUS
-------------
CALIBRATION_TIME
---------------------------------------------------------------------------
NOT AVAILABLE

Below is an example.

  • I created a new table named sales_rk that has 58 million rows in it.
  • There are no indexes or primary keys on this table.
  • The table is decorated with parallel degree DEFAULT.
  • parallel_degree_policy=LIMITED
  • parallel_degree_limit=2
  • cpu_count=2
  • parallel_threads_per_cpu=2
  • At this time there are no rows in resource_io_calibrate$

I ran the query “select count(*) from sales_rk”

I would have expected auto dop to have kicked in and parallel_degree_limit to have limited the DOP of the query to 2.

However, that is not what happened. The query ran with a DOP of 4.

So Oracle just picked the default DOP (cpu_count x parallel_threads_per_cpu).
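That fallback arithmetic can be sketched in shell, using the parameter values from the setup above:

```shell
# Default DOP when Auto DOP is skipped: cpu_count x parallel_threads_per_cpu
# (values taken from the test setup in this post).
cpu_count=2
parallel_threads_per_cpu=2
default_dop=$((cpu_count * parallel_threads_per_cpu))
echo "default DOP = ${default_dop}"
```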

Further evidence that Auto DOP did not kick in can be found by examining the dbms_xplan output of the statement.

 

SQL> select count(*) from sales_rk;

  COUNT(*)
----------
  58805952

SQL> select * from table(dbms_xplan.display_cursor(null,null,'TYPICAL'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID	1bzaqj7scjp7p, child number 0
-------------------------------------
select count(*) from sales_rk

Plan hash value: 2302347944

--------------------------------------------------------------------------------------------------------
| Id  | Operation	       | Name	  | Rows  | Cost (%CPU)| Time	  |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |	  |	  | 21544 (100)|	  |	   |	  |	       |
|   1 |  SORT AGGREGATE        |	  |	1 |	       |	  |	   |	  |	       |
|   2 |   PX COORDINATOR       |	  |	  |	       |	  |	   |	  |	       |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |	1 |	       |	  |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |	  |	1 |	       |	  |  Q1,00 | PCWP |	       |
|   5 |      PX BLOCK ITERATOR |	  |    58M| 21544   (1)| 00:04:19 |  Q1,00 | PCWC |	       |
|*  6 |       TABLE ACCESS FULL| SALES_RK |    58M| 21544   (1)| 00:04:19 |  Q1,00 | PCWP |	       |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access(:Z>=:Z AND :Z)

Note
-----
   - automatic DOP: skipped because of IO calibrate statistics are missing

The Note section explicitly states that Auto DOP was skipped.

Now, to fix the situation, I ran dbms_resource_manager.calibrate_io.

DECLARE
  lat  INTEGER;
  iops INTEGER;
  mbps INTEGER;
BEGIN
  -- IN arguments: number of physical disks (1) and maximum tolerated
  -- latency in milliseconds (10). The remaining three are OUT
  -- parameters: max IOPS, max MBPS and actual latency.
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO (1, 10, iops, mbps, lat);
END;
/

 

Beware of calibrate_io generating bad numbers for I/O MBPS. If it does, follow the instructions in MOS note 1269321.1 to delete the contents of resource_io_calibrate$ and populate the table manually.
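If you do need to populate the table manually, the general shape of the fix is a delete followed by a single insert; check the MOS note for the exact columns and values appropriate to your storage. The 200 MBPS figure below is purely a placeholder:

```sql
-- Manual population of the I/O calibration table (placeholder values)
delete from resource_io_calibrate$;

insert into resource_io_calibrate$
values (current_timestamp, current_timestamp, 0, 0, 200, 0, 0);

commit;
```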

Bounce the database.

Check whether calibrate_io worked:

SQL> l
  1* select * from v$io_calibration_status
SQL> /

STATUS
-------------
CALIBRATION_TIME
---------------------------------------------------------------------------
READY
07-OCT-11 05.56.40.911 PM

 

Now re-run the same query. Auto DOP kicks in, and the query executes with a DOP of 2.

Looking at the execution plan confirms that Auto DOP did kick in.

 

SQL> select * from table(dbms_xplan.display_cursor(null,null,'TYPICAL'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
SQL_ID	1bzaqj7scjp7p, child number 1
-------------------------------------
select count(*) from sales_rk

Plan hash value: 2302347944

--------------------------------------------------------------------------------------------------------
| Id  | Operation	       | Name	  | Rows  | Cost (%CPU)| Time	  |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |	  |	  | 43087 (100)|	  |	   |	  |	       |
|   1 |  SORT AGGREGATE        |	  |	1 |	       |	  |	   |	  |	       |
|   2 |   PX COORDINATOR       |	  |	  |	       |	  |	   |	  |	       |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |	1 |	       |	  |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |	  |	1 |	       |	  |  Q1,00 | PCWP |	       |
|   5 |      PX BLOCK ITERATOR |	  |    58M| 43087   (1)| 00:00:08 |  Q1,00 | PCWC |	       |
|*  6 |       TABLE ACCESS FULL| SALES_RK |    58M| 43087   (1)| 00:00:08 |  Q1,00 | PCWP |	       |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access(:Z>=:Z AND :Z)

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 2 because of degree limit

Exadata smart flash log

The Exadata development team has released Exadata cell software version 11.2.2.4.0, which includes a new feature called the “Smart Flash Log”.

In a nutshell, this new feature speeds up redo log writes. Exadata Smart Flash Log uses the Exadata Smart Flash Cache as temporary storage to provide low-latency redo log writes. With this feature enabled, Oracle writes to both the physical disk and the flash cache simultaneously.

So if for some reason the flash cache writes are slow, the writes to the physical disk provide good response times. Similarly, if the physical disk writes are slow, the flash cache writes complete faster, providing good response times.

You can use the “create flash log” cell command to turn this feature on, and the “drop flash log” cell command to turn it off.

The Exadata Storage Server Software User's Guide has been updated with this information.

You need Bundle Patch 11 (it actually works from BP9 onwards, but BP11 is recommended) and Exadata cell software 11.2.2.4.0 applied to get this functionality.

Please read the section “Exadata Smart Flash Logging: Flash for Database Logging” in the Oracle technical white paper, Exadata Smart Flash Cache Features and the Oracle Exadata Database Machine, for details.

Here is a video tutorial on Smart Flash Log, produced by Oracle Education.

Monitoring Exadata database machine with Oracle Enterprise Manager 11g

Oracle Enterprise Manager Grid Control is hands down the best monitoring and management tool for the Oracle Exadata Database Machine. It comes with plugins to monitor all the hardware components of the database machine, and sensible preset thresholds for proactive monitoring.

Update (Nov 2011): Enterprise Manager 12c is now available and certified for use with Exadata. The master MOS note 1110675.1 covers the installation and configuration details.

Some key points
  • You should use 11gR1 Enterprise Manager Grid Control for monitoring.
  • You should use 11gR1 Enterprise Manager agents to monitor the targets on the database machine.
  • If you use enterprise-wide monitoring tools like Tivoli, OpenView or Netcool, use SNMP traps from Oracle Enterprise Manager to notify those tools (i.e. don't try to monitor the Exadata components directly with SNMP; you could, but it would be too time consuming).
  • You could potentially use 10.2.0.5 OEM with 11g agents to monitor the database machine, but this is not recommended as a stable, long-term solution.
  • The following components (and more) can be monitored using Enterprise Manager:
    • Database hosts
    • Exadata cells
    • Cisco switch
    • KVM (Keyboard, Video, Mouse)
    • ILOM
    • InfiniBand switch
    • Power distribution unit (PDU)
You have three possible options to configure Enterprise Manager:
  • If you have an existing 11gR1 Enterprise Manager Grid Control environment, you can patch it with the recommended patches and use it to monitor the database machine targets.
  • You can set up and configure a brand new 11gR1 Enterprise Manager Grid Control environment (on a separate server) and configure it to monitor the database machine targets.
    • Download the required software:
      • WebLogic Server 10.3.2 (MOS Notes 1106105.1, 1063112.1)
      • 64-bit JDK (MOS Note 1063587.1)
      • 11gR1 OMS from download.oracle.com
    • Install Java and WebLogic Server (WLS):
      • MOS Note 1063762.1
    • Patch WebLogic Server:
      • MOS Note 1072763.1
    • Install the 11gR1 Enterprise Manager Oracle Management Server (OMS):
      • Install/create an 11gR2 database to serve as the Enterprise Manager repository.
      • Database prerequisites for the 11.1.0.1 repository (MOS Note 1064441.1).
      • Install/configure the OMS (MOS Notes 1130958.1, 1059516.1).
    • Patch the OMS with the patches required to enable database machine monitoring:
      • MOS Note 1323298.1
  • You can use an easy-install option to set up and configure an Enterprise Manager environment and configure the plugins.
    • The easy install is delivered as patch 11852882 (the EMGC setup automation kit).
    • The configuration worksheet has to be filled out properly (before the installation) and the em.param file has to be generated.
    • Follow the instructions in the readme to do a quick install of a fully configured 11gR1 Enterprise Manager installation.
    • This method helps you install, patch and configure the full 11gR1 OMS in just a few steps and is a huge time saver.
Download the required plugins from the Enterprise Manager Extensions Exchange:
http://www.oracle.com/technetwork/database/exadata/index.html#plug-in (Exadata cell plugin)
http://www.oracle.com/technetwork/oem/grid-control/exadata-plug-in-bundle-188771.html (all the other plugins)
Install and configure the agent and the plugins. Additional tutorials with screenshots on configuring the plugins are also available.
Sending SNMP traps to 3rd-party monitoring tools:
  • Get the MIB (Management Information Base) file from your Enterprise Manager management server and send it to the 3rd-party tool administrator (e.g. for OpenView or Netcool). Follow MOS note 389585.1 to get this MIB file.
  • Then configure your notification methods and rules to send the required SNMP traps to the 3rd-party tool.

Shell script to generate awr diff reports

AWR diff reports (awrddrpt.sql) are a great tool for comparing database performance characteristics across different time periods. Often, the details in the report point you to differences in workload profiles or SQL execution, helping you narrow down problem areas.

An AWR diff report can compare the AWR data for different time periods:

– Period A vs. Period B for the same instance in the same database

– Period A in instance X vs. Period B in instance Y in the same database

– Period A in instance X in database P vs. Period B in instance Y in database Q

I routinely export AWR data from different databases (awrextr.sql) and import it into a central database (awrload.sql). However, one problem with this is that once the data is in the central database, in 11gR2 there is no way to view it via Enterprise Manager, so I end up writing SQL statements to compare the data.

Below is a handy shell script that helps you compare two different time periods.

You can run the script using the syntax below.

./genawrd.sh dbid1 inst1 startsnapid1 endsnapid1 dbid2 inst2 startsnapid2 endsnapid2 credentials

eg:
./genawrd.sh 3401191500 3 1111 1112 1346415800 2 757 758 system/manager@rk01
#!/bin/bash

if [ $# != 9 ]
then
  echo "Syntax: genawrd.sh dbid1 inst1 startsnapid1 endsnapid1 dbid2 inst2 startsnapid2 endsnapid2 credentials"
  exit 1
fi

l_dbid=$1
l_instid=$2
l_start_snapid=$3
l_end_snapid=$4

l_dbid2=$5
l_instid2=$6
l_start_snapid2=$7
l_end_snapid2=$8
l_credentials=$9

l_logdir='/u01/Temp/awrs'

l_awr_log_file=${l_logdir}/awrdrpt_${l_instid}_${l_instid2}_${l_start_snapid}_${l_end_snapid}_${l_start_snapid2}_${l_end_snapid2}.html

sqlplus -s $l_credentials << EOC  > /dev/null
set head off
set pages 0
set lines 132
set echo off
set feedback off
set termout off
define inst_num=${l_instid};
define inst_num2=${l_instid2};
define dbid=${l_dbid};
define dbid2=${l_dbid2};
define report_type='html';
define report_name=${l_awr_log_file};
define begin_snap=${l_start_snapid};
define end_snap=${l_end_snapid};
define begin_snap2=${l_start_snapid2};
define end_snap2=${l_end_snapid2};
@$ORACLE_HOME/rdbms/admin/awrddrpi
EOC

Shell script to generate ash report

From time to time I get questions like “Batch program A ran in X minutes on day Y; the same program ran for 5X minutes on day Y+1. Can you figure out why?” I have found that collecting and analyzing ASH reports for the two time periods is a good way to start looking at such issues.

The first step is to determine which database session was running the batch program. I step through the following process to identify the session:

– Find a SQL statement that the batch program executes

– Identify the SQL_ID for that statement

– Query dba_hist_active_sess_history to find the session_ids executing that SQL in the given time period

select session_id from
dba_hist_active_sess_history
where
sql_id = '&sqlid'
and instance_number = &inst
and snap_id between &Begin and &End
/

Then I run the following shell script to generate the ASH report for the given SID and time period.

The syntax for running the script is

./genash.sh dbid 'MM/DD/YY HH24:MI' duration credentials inst sid

eg:
./genash.sh 131424457 '01/30/11 11:00' 10 system/manager@rk01 1 1021

 

The actual script is below

#!/bin/bash

if [ $# != 6 ]
then
  echo "Syntax: genash.sh dbid '01/30/11 11:00' duration credentials inst sid"
  exit 1
fi

l_dbid=$1
l_start_time=$2
l_starttime_log=`echo $l_start_time|sed 's/\///g' | sed 's/ //'|sed 's/://'`
l_duration=$3
l_credentials=$4
l_inst=$5
l_sid=$6

l_logdir='/u01/Rk/Docs/ashs'

l_ash_log_file=${l_logdir}/ashrpt_${l_inst}_${l_sid}_${l_starttime_log}.html

sqlplus -s $l_credentials << EOC > /dev/null

set head off
set pages 0
set lines 132
set echo off
set feedback off
set termout off

define inst_num=${l_inst};
define dbid=${l_dbid};
define report_type='html';
define report_name=${l_ash_log_file};
define begin_time="${l_start_time}";
define duration=${l_duration};
define target_session_id   = ${l_sid};

define slot_width  = '';
define target_sql_id       = '';
define target_wait_class   = '';
define target_service_hash = '';
define target_module_name  = '';
define target_action_name  = '';
define target_client_id    = '';
define target_plsql_entry  = '';

@$ORACLE_HOME/rdbms/admin/ashrpti
EOC

You can modify the script fairly easily to run it for:
– All sessions
– A specific module name
– A specific SQL_ID, etc.
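As an aside, the sed pipeline in the script that sanitizes the start time into a string usable in the log file name (strip all slashes, then the first space, then the first colon) can be checked in isolation:

```shell
# Same transformation as l_starttime_log in the script above.
l_start_time='01/30/11 11:00'
l_starttime_log=$(echo "$l_start_time" | sed 's/\///g' | sed 's/ //' | sed 's/://')
echo "$l_starttime_log"
```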

AWR: display more than 10 SQLs

By default, AWR displays the top 10 SQL statements in the “SQL Statistics” section of the report, under the various categories (Elapsed, CPU, Gets, etc.). Sometimes you might want to display more than the top 10 statements, say the top 50.

To accomplish this, in the same session from which you are executing awrrpti.sql (or awrrpt.sql), run the following before you run the report script.

 

exec dbms_workload_repository.awr_set_report_thresholds(top_n_sql=>50);

 

Exadata storage expansion rack

Updated on Jan 10, 2012 to include information about High Performance disk availability.

Oracle has announced the introduction of the “Exadata Storage Expansion Rack”. This expansion rack can be used to add storage capacity to an X2-2 or X2-8 Oracle Database Machine environment.

The press release can be found here.

The datasheet can be found here.

Some highlights:

  • The expansion rack comes with 4 to 18 storage servers.
  • The expansion rack is delivered with either the same 600GB High Performance (HP) hard disks or the 3TB High Capacity (HC) hard disks.
  • Quarter Rack, HC disks: 4 Exadata storage servers with 48 3TB SAS High Capacity disks and 1.5TB of flash cache.
    • 64TB of uncompressed usable capacity
  • Half Rack, HC disks: 9 Exadata storage servers with 108 3TB SAS High Capacity disks and 3.4TB of flash cache.
    • 144TB of uncompressed usable capacity
  • Full Rack, HC disks: 18 Exadata storage servers with 216 3TB SAS High Capacity disks and 6.75TB of flash cache.
    • 288TB of uncompressed usable capacity
  • Quarter Rack, HP disks: 4 Exadata storage servers with 48 600GB SAS High Performance disks and 1.5TB of flash cache.
    • 13TB of uncompressed usable capacity
  • Half Rack, HP disks: 9 Exadata storage servers with 108 600GB SAS High Performance disks and 3.4TB of flash cache.
    • 29TB of uncompressed usable capacity
  • Full Rack, HP disks: 18 Exadata storage servers with 216 600GB SAS High Performance disks and 6.75TB of flash cache.
    • 58TB of uncompressed usable capacity
  • Uncompressed flash cache I/O bandwidth of up to 88GB/second on the full rack.

 

Oracle Exadata Best Practices Tutorials

There is a set of recorded tutorials available from the Oracle Maximum Availability Architecture team that covers a range of topics.

The tutorials can be accessed at  Oracle Exadata Best Practices Series

The topics include implementing EBS, Siebel and PeopleSoft, resource manager, migration, DBFS, monitoring, backup and recovery, troubleshooting, patching and healthchecks.

Lots and Lots of great information.

Oracle Learning Library

There is a lot of publicly available free content created by Oracle Server Technologies, mostly in the form of demos, OBE (Oracle By Example) tutorials and videos. A wide range of topics is covered, including database, Fusion Middleware, Java, Enterprise Manager and more.

It is a great resource for learning new oracle topics.

You need an Oracle Technology Network (OTN) login to access this content.

The content can be accessed at http://oracle.com/oll

There is also an Oracle Learning YouTube channel:

http://www.youtube.com/user/OracleLearning

Rman, duplicate from incrementally updated backup

In the last post I described how to create and maintain an incrementally updated RMAN backup, and how to switch to the backup in case some corruption happens to the production database.

In this post, instead of doing the switch, we will use the incrementally updated backup to create a clone of the original database. The practical benefit of this method is that if you are creating incrementally updated backups every night, you can use the backup as the source to clone the database (i.e. you don't have to run a whole new backup of your database).

RMAN has a command named DUPLICATE that allows you to clone an existing database.

In this case we are cloning a database named rk01 to another database named rk02 on the same host, using an incrementally updated backup, with the RMAN DUPLICATE command.

First, follow the steps from my previous post to create the incrementally updated backup:

1. Enable archivelog mode and block change tracking
2. Run a level 0 full image copy backup of the database
3. Perform updates in the database
4. Perform an incremental level 1 backup
5. Apply the incremental level 1 backup to the level 0 image copy

At this point, after the incrementally updated backup is created, the previous post performed a switch. Instead of that, we will do the RMAN duplicate.

Create an spfile for the rk02 database. (I usually create a pfile from the rk01 spfile, edit the pfile to change all occurrences of rk01 to rk02, and create the spfile for rk02 from this pfile. Also create the necessary directories for bdump, adump, etc.)
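The pfile edit described above can be scripted with a single sed substitution; this sketch uses throwaway file contents, since the real pfile will have many more parameters:

```shell
# Derive an rk02 pfile from an rk01 pfile by replacing the SID everywhere.
# The file contents here are illustrative, not a complete pfile.
src=$(mktemp)
printf "%s\n" "db_name='rk01'" "control_files='/u01/oradata/rk01/control01.ctl'" > "$src"
sed 's/rk01/rk02/g' "$src" > "${src}.rk02"
cat "${src}.rk02"
```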

Create a password file for the rk02 database (orapwd file=$ORACLE_HOME/dbs/orapwrk02 entries=30)

Startup the database rk02 in nomount mode

export ORACLE_SID=rk02

sqlplus / as sysdba

startup nomount

Switch the current logfile in the rk01 database

This is an important step. Miss it and your RMAN duplicate will likely fail. The reason is that the SCN up to which the incremental backup was taken (and applied to the original backup copy) is still in the online redo log. When you do a duplicate, RMAN looks for a backup whose checkpoint SCN is less than the FIRST_CHANGE# of your current online redo log. Once you do a switch and then run the duplicate, RMAN will be able to find the incrementally updated backup as the source to clone from.

export ORACLE_SID=rk01

sqlplus / as sysdba

alter system archive log current;

Perform the rman duplicate.

Then issue the RMAN DUPLICATE command. (If you are cloning to a physically separate host, you will have to create appropriate listener.ora and tnsnames.ora entries, and make the location of the backup copy available to the target host.)

export ORACLE_SID=rk02

Run the following script:

#!/bin/bash
rman target="sys/manager@rk01" auxiliary / log=/u01/oraback/clonefromiub.log << EOF
RUN
{
  SET NEWNAME FOR DATAFILE 1 TO '/u01/oradata/rk02/system01.dbf';
  SET NEWNAME FOR DATAFILE 2 TO '/u01/oradata/rk02/sysaux01.dbf';
  SET NEWNAME FOR DATAFILE 3 TO '/u01/oradata/rk02/undotbs01.dbf';
  SET NEWNAME FOR DATAFILE 4 TO '/u01/oradata/rk02/users01.dbf';
  SET NEWNAME FOR DATAFILE 5 TO '/u01/oradata/rk02/example01.dbf';
  SET NEWNAME FOR TEMPFILE 1 TO '/u01/oradata/rk02/temp01.dbf';
  DUPLICATE  TARGET DATABASE  TO rk02
    LOGFILE
      GROUP 1 ('/u01/oradata/rk02/redo01a.log',
               '/u01/oradata/rk02/redo01b.log') SIZE 4M REUSE,
      GROUP 2 ('/u01/oradata/rk02/redo02a.log',
               '/u01/oradata/rk02/redo02b.log') SIZE 4M REUSE;
}
EXIT;
EOF