AWR – Profiling Database I/O

Oracle AWR (Automatic Workload Repository) captures and stores fine-grained statistics about the file reads and writes (i.e. I/O) that the database performs while executing application-generated workloads. When analyzing the read and write patterns of the database, it helps a lot to understand what type of activity is generating the reads and writes. With this stored information we can get an in-depth understanding of the distribution of random and sequential reads and writes.

I use this information to build a better understanding of a database's I/O profile for my Exadata sizing exercises.

This information shows clearly how much of the I/O comes from temp activity, datafile reads and writes, archive log writes, and redo log writes, and whether these are small or large reads and writes.

To the best of my understanding, small reads and writes are those under 128KB, and large reads and writes are those over 128KB.

This information is contained mainly in two AWR views:

DBA_HIST_IOSTAT_FILETYPE
DBA_HIST_IOSTAT_FUNCTION

DBA_HIST_IOSTAT_FILETYPE

This view displays the historical I/O statistics by file type. The main file types are the following (a minimal query sketch follows the list):

Archive Log
Archive Log Backup
Control File
Data File
Data File Backup
Data File Copy
Data File Incremental Backup
Data Pump Dump File
Flashback Log
Log File
Other
Temp File
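
As a starting point, here is a minimal sketch of a query against this view (this reflects my reading of the view's columns; verify against your release). The counters in the view are cumulative since instance startup, so the sketch reports the delta between a begin and an end snapshot for instance 1, and assumes no instance restart in between:

select e.filetype_name,
       e.small_read_reqs  - b.small_read_reqs  small_reads,
       e.large_read_reqs  - b.large_read_reqs  large_reads,
       e.small_write_reqs - b.small_write_reqs small_writes,
       e.large_write_reqs - b.large_write_reqs large_writes
from dba_hist_iostat_filetype b, dba_hist_iostat_filetype e
where b.dbid = &dbid
and   e.dbid = b.dbid
and   b.instance_number = 1
and   e.instance_number = b.instance_number
and   e.filetype_id = b.filetype_id
and   b.snap_id = &bsnap
and   e.snap_id = &esnap
order by 1
/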

DBA_HIST_IOSTAT_FUNCTION

This view displays the historical I/O statistics by I/O function (the query sketch above works here too, substituting FUNCTION_ID and FUNCTION_NAME for FILETYPE_ID and FILETYPE_NAME). The main I/O functions are the following:

Recovery
Buffer Cache Reads
Others
RMAN
Streams AQ
Smart Scan
Data Pump
XDB
Direct Writes
DBWR
LGWR
Direct Reads
Archive Manager
ARCH

From everything I have seen so far, these reads and writes correlate directly to the "physical read total IO requests" and "physical write total IO requests" system-level statistics.

I have written a script that displays information from the above-mentioned views and gives a detailed breakdown of the I/O generated by different aspects of database activity.
To fit the screen real estate, I have limited the columns the script displays (it shows only the file types I am frequently interested in). Please feel free to take the script and modify it to add the columns you want to display.

The full version of the script, awrioftallpct-pub.sql, can be found here.

The script accepts the following inputs:
– The begin snap id of the snapshot range you want to report on
– The end snap id of the snapshot range you want to report on
– The dbid of the database
– The snap interval in seconds (if you have a 30-minute interval, input 1800)

A description of all the column names in the output, broken down by section, is provided in the header section of the script.

There are 6 sections to this script:

1) Total Reads + Writes
2) Total Reads
3) Total Writes
4) Read write breakdown for datafiles
5) Data File – Direct Path v/s Buffered Read Write breakdown
6) Read write breakdown for tempfiles

1) Total Reads + Writes

This section displays the number of reads+writes by file type, the reads+writes for each file type as a percentage of total reads+writes, and, in the last column, the total reads+writes for all file types. The column DTDP shows the I/O that by default bypasses flash cache and goes directly to spinning disk on Exadata (temp + archive logs + flashback logs).

[Image: io1-rw – sample output for Total Reads + Writes]

2) Total Reads

This section displays the number of reads by file type, the reads for each file type as a percentage of total reads, and, in the last column, the total reads for all file types.

[Image: io1-r – sample output for Total Reads]

3) Total Writes

This section displays the number of writes by file type, the writes for each file type as a percentage of total writes, and, in the last column, the total writes for all file types.

[Image: io1-w – sample output for Total Writes]

4) Read write breakdown for datafiles

This section displays I/O information pertaining only to datafiles. It shows the small and large reads and writes, both as a percentage of the total reads+writes to datafiles and as a percentage of the total reads or writes to datafiles. It also shows the total small and large reads and writes and the percentage they constitute of the total reads+writes to datafiles.

[Image: io1-dfrw – sample output for the datafile read/write breakdown]

5) Data File – Direct Path v/s Buffered Read Write breakdown

This section provides a breakdown of I/O by function (as opposed to I/O by file type in the previous sections). The output has columns for direct path small and large reads and writes, buffered small reads and writes, smart scan small and large reads, and other small and large reads and writes.

[Image: io3-bf – sample output for the direct path vs buffered breakdown]

6) Read write breakdown for tempfiles

This section displays I/O information pertaining only to tempfiles. It shows the small and large reads and writes, both as a percentage of the total reads+writes to tempfiles and as a percentage of the total reads or writes to tempfiles. It also shows the total small and large reads and writes and the percentage they constitute of the total reads+writes to tempfiles.

[Image: io3-tf – sample output for the tempfile read/write breakdown]

The full version of the script, awrioftallpct-pub.sql, can be found here.

Oracle Active Data Guard – Considerations for the Wide Area Network

Oracle customers use Oracle Active Data Guard to create and maintain one or more standby databases that protect their mission-critical primary databases from disaster. Typically, in such deployments, the primary and standby databases are in geographically separate locations connected via a WAN (Wide Area Network). Log Transport Services transfers large volumes of redo from the primary location to the standby using SQL*Net.

We have to ensure that all the components from source to target are set up correctly so the data transfer achieves the best possible throughput. If sufficient network bandwidth is not available at reasonable latencies, the log transport and apply will start lagging on the standby site (which is Oracle-speak for: your primary and standby databases are now out of sync from a data perspective).

One key point to keep in mind: the lower the network round trip time (i.e. latency), the higher your data transfer throughput; the higher the round trip time, the lower your throughput. So it is very important to maintain low round trip times on your wide area network.

To understand network data transfer throughput, it is important to understand the concepts of TCP window size and bandwidth delay product (BDP).

The TCP window size is the number of bytes that can be transmitted without receiving an acknowledgement from the other side. Once a window's worth of bytes has been sent, the sender stops sending and waits for an acknowledgement from the receiver.

The bandwidth delay product is the product of the network bandwidth and the network round trip time: BDP = network bandwidth * round trip time. This is the amount of data that has left the sender before the first acknowledgement comes back. If the sender's output bandwidth is stable and the bandwidth is fully used, the BDP gives the amount of data in transit. If we set the TCP window size equal to the bandwidth delay product, then in theory we should be able to fully utilize the available bandwidth.

Set up the network

We have to start by setting up the networking components to support the desired/stated bandwidth. So if your WAN is a 10GigE network, all the NICs (network interface cards), ports, and switches in the configuration should be configured for 10GigE, full duplex. After setup, run the configuration display utilities and ensure that at each of these component levels the transfer speed is set to 10GigE. Customers often run into trouble when auto-negotiation sets some NICs to 1GigE because of configuration mismatches.

Use tools like iperf to test the transfer speeds your network is capable of achieving; a sample invocation follows.
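
For example (the host name is a placeholder; -t sets the test duration in seconds, -P the number of parallel streams, -w the TCP window size):

$ iperf -s                                  (run on the receiving host)
$ iperf -c standby-host -t 60 -P 4 -w 4M    (run on the sending host)

Running several parallel streams with a window sized near the BDP gives a better picture of the achievable aggregate throughput than a single default-window stream.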

One important aspect to keep in mind is that the WAN is probably shared by other traffic (e-mail, data replication, SAN replication). This has two important implications that we should consider.

  • If some of this miscellaneous traffic consumes a lot of bandwidth, round trip times on the network could degrade periodically.
  • We should be careful in our calculations that we do not consume the entire bandwidth for redo transport (this could impact other processes).
    • So it is important to figure out (working with the network admins) what the bandwidth entitlement for redo transport is, and base our calculations on that number.

Calculate the BDP

Use the following formula to calculate the bandwidth delay product (BDP):

BDP (bytes) = (bandwidth / 8) * round trip time (in seconds)

The network bandwidth is expressed in bits per second, so we divide by 8 to convert to bytes per second.
The round trip time is usually quoted in milliseconds, so we divide by 1000 to convert to seconds.

So for example, if we have a 10Gbit network bandwidth and a 40ms round trip time

BDP=(10000000000/8)*(40/1000) = 50,000,000 bytes.

Set up SQL*Net Parameters

The current recommendations for Data Guard redo transport are as follows.

Set the SDU size to 65535

  • We can set the SDU on a per-connection basis using the SDU parameter in the local naming configuration file (TNSNAMES.ORA) and the listener configuration file (LISTENER.ORA); a sketch follows this list
  • We can set the SDU for all Oracle Net connections with the profile parameter DEFAULT_SDU_SIZE in the SQLNET.ORA file.
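
For illustration, a sketch of a TNSNAMES.ORA entry with the SDU set on a per-connection basis (the alias, host, port, and service name are placeholders):

STANDBY =
  (DESCRIPTION =
    (SDU = 65535)
    (ADDRESS = (PROTOCOL = TCP)(HOST = standby-host)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = standby))
  )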

Set TCP.NODELAY to YES

To preempt delays in buffer flushing in the TCP protocol stack, disable the TCP Nagle algorithm by setting TCP.NODELAY to YES in the SQLNET.ORA file on both the primary and standby systems.

Set up RECV_BUF_SIZE and SEND_BUF_SIZE

The current recommendation is to set the SEND_BUF_SIZE and RECV_BUF_SIZE parameters (which are the send and receive socket buffer sizes for SQL*Net) to 3 times the BDP.

Per the above example, we would set them to 50,000,000 * 3 = 150,000,000 bytes.
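
Putting these recommendations together, a sketch of the relevant SQLNET.ORA entries, using the values from the example above (verify the exact values for your own bandwidth and latency):

DEFAULT_SDU_SIZE = 65535
TCP.NODELAY = YES
SEND_BUF_SIZE = 150000000
RECV_BUF_SIZE = 150000000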

Set up Operating System Kernel Parameters

If you are using the Linux operating system, make sure the values of the following kernel parameters are set higher than the values used for RECV_BUF_SIZE and SEND_BUF_SIZE (a sketch follows the list).

net.core.rmem_max
net.core.wmem_max
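
For example, in /etc/sysctl.conf (the values here simply match the 150,000,000-byte socket buffers from the example above; per the guidance, they should be no lower than SEND_BUF_SIZE/RECV_BUF_SIZE, and your numbers will differ):

net.core.rmem_max = 150000000
net.core.wmem_max = 150000000

Then run sysctl -p as root to apply the changes.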

Once we have configured the network, the operating system, and SQL*Net, and redo transport is running, we can monitor the network further to see how the bandwidth is being utilized, and make appropriate adjustments.

Links to helpful Documents

Iperf

How to calculate TCP throughput for long distance links (blog)

Oracle Net Performance Tuning (MOS)

Setting Send and Receive Buffer Sizes (MOS)

Tuning SQL*Net Performance (Oracle Docs)

Configuring Oracle Data Guard (Oracle Docs)

Best Practices for Sync Data Transport (White Paper)

Consolidated database replay

The Real Application Testing option has been enhanced with "Consolidated Database Replay" functionality to help assess the impact of consolidating databases and workloads onto a single database. Consolidated Database Replay allows concurrent replay of multiple captured production workloads from the same or different systems against a single database. Customers can use this functionality to validate the consolidation strategy recommended by Consolidation Planner and to accurately predict the capacity and sizing of their database infrastructure before production deployment. Here are more details regarding this announcement:

  • Consolidated Database Replay is supported in Oracle Database release 11.2.0.2 and above. Multiple workload captures (including those taken from pre-11.2.0.2 database releases) can be replayed concurrently in database release 11.2.0.2 and above
  • MOS Doc ID 1453789.1 provides more details on the required patches, how to use this functionality and associated best practices

SQL to extract AWR data for analysis

AWR captures a wealth of database performance statistics and metrics. Enterprise Manager is usually the preferred tool for interpreting and analyzing this data. However, a lot of the time I resort to SQL statements to graph and look for trends in the AWR data. Below are some of the SQL statements I use (tested only on 11gR2) to extract this information. Please keep in mind that you need the Oracle Diagnostics Pack license to access the DBA_HIST views.

The full script can be downloaded here.

AWR Snapshot info by DbId, by Day

select d.dbid,d.db_name,to_char(s.begin_interval_time,'DD-MON-RR') begintime,min(s.snap_id) minsnap,max(s.snap_id)  maxsnap
              from dba_hist_snapshot s,dba_hist_database_instance d where s.instance_number = 1
              and s.instance_number = d.instance_number and s.dbid = d.dbid
              group by d.dbid,d.db_name,to_char(s.begin_interval_time,'DD-MON-RR') order by 1
/

	       DBID DB_NA BEGINTIME		 MINSNAP   MAXSNAP
------------------- ----- -------------------- --------- ---------
	  220594996 QAD   16-NOV-11		    5205      5217
	  220594996 QAD   17-NOV-11		    5218      5220
	 2085202933 RK01  15-MAY-12		       3	 4

Database version and platform

select distinct version,platform_name from dba_hist_database_instance where dbid=&dbid
/

VERSION           PLATFORM_NAME
----------------- ---------------------------
11.2.0.2.0        Linux x86 64-bit

CPUs/Sockets/Cores/Load Average

select * from (
                  select  stat_name,value
                  from  dba_hist_osstat
                  where dbid = &dbid
                  and instance_number = 1
                  and snap_id = &esnap
                  )
                  pivot (sum(value) for stat_name in ('NUM_CPUS','NUM_CPU_SOCKETS','NUM_CPU_CORES','PHYSICAL_MEMORY_BYTES','LOAD'))
/

'NUM_CPUS' 'NUM_CPU_SOCKETS' 'NUM_CPU_CORES' 'PHYSICAL_MEMORY_BYTES'     'LOAD'
---------- ----------------- --------------- ----------------------- ----------
        24                 2              12              1.0122E+11 .209960938

O/S CPU Usage

select
to_char(begintime,'DD-MON-YY HH24:MI:SS') begintime,
to_char(endtime,'DD-MON-YY HH24:MI:SS') endtime,
inst,
snapid,
round((utdiff/(utdiff+itdiff+stdiff+iowtdiff+ntdiff))*100)  utpct,
round((ntdiff/(utdiff+itdiff+stdiff+iowtdiff+ntdiff))*100)  ntpct,
round((stdiff/(utdiff+itdiff+stdiff+iowtdiff+ntdiff))*100)  stpct,
round((iowtdiff/(utdiff+itdiff+stdiff+iowtdiff+ntdiff))*100)  iowtpct,
(100-
(
 round((utdiff/(utdiff+itdiff+stdiff+iowtdiff+ntdiff))*100)+
 round((ntdiff/(utdiff+itdiff+stdiff+iowtdiff+ntdiff))*100)+
 round((stdiff/(utdiff+itdiff+stdiff+iowtdiff+ntdiff))*100)+
 round((iowtdiff/(utdiff+itdiff+stdiff+iowtdiff+ntdiff))*100)
)) itpct
from
(
select begintime,endtime,(extract(Minute from endtime-begintime)*60+extract(Second from endtime-begintime)) secs,
snapid,inst,
ut-(nvl(lag(ut) over (partition by inst order by inst,snapid),0)) utdiff,
bt-(nvl(lag(bt) over (partition by inst order by inst,snapid),0)) btdiff,
it-(nvl(lag(it) over (partition by inst order by inst,snapid),0)) itdiff,
st-(nvl(lag(st) over (partition by inst order by inst,snapid),0)) stdiff,
iowt-(nvl(lag(iowt) over (partition by inst order by inst,snapid),0)) iowtdiff,
nt-(nvl(lag(nt) over (partition by inst order by inst,snapid),0)) ntdiff,
vin-(nvl(lag(vin) over (partition by inst order by inst,snapid),0)) vindiff,
vout-(nvl(lag(vout) over (partition by inst order by inst,snapid),0)) voutdiff
from
(
select sn.begin_interval_time begintime,
     sn.end_interval_time EndTime,oss.snap_id SnapId,oss.instance_number Inst,
     sum(decode(oss.stat_name,'USER_TIME',value,0)) ut,
     sum(decode(oss.stat_name,'BUSY_TIME',value,0)) bt,
     sum(decode(oss.stat_name,'IDLE_TIME',value,0)) it,
     sum(decode(oss.stat_name,'SYS_TIME',value,0)) st,
     sum(decode(oss.stat_name,'IOWAIT_TIME',value,0)) iowt,
     sum(decode(oss.stat_name,'NICE_TIME',value,0)) nt,
     sum(decode(oss.stat_name,'VM_IN_BYTES',value,0)) vin,
     sum(decode(oss.stat_name,'VM_OUT_BYTES',value,0)) vout
from dba_hist_osstat oss,dba_hist_snapshot sn
where oss.dbid=&dbid
and   oss.dbid = sn.dbid
and   oss.instance_number =  sn.instance_number
and   oss.snap_id = sn.snap_id
and   oss.snap_id between &bsnap - 1 and &esnap
and   oss.stat_name in (
'USER_TIME',
'BUSY_TIME',
'IDLE_TIME',
'SYS_TIME',
'IOWAIT_TIME',
'NICE_TIME',
'VM_IN_BYTES',
'VM_OUT_BYTES'
)
group by sn.begin_interval_time,sn.end_interval_time,oss.snap_id,oss.instance_number
order by oss.instance_number,oss.snap_id
)
)
where snapid between &bsnap and &esnap
order by inst,snapid
/

BEGINTIME            ENDTIME               in     SNAPID  UTPCT  NTPCT  STPCT IOWTPCT  ITPCT
-------------------- -------------------- --- ---------- ------ ------ ------ ------- ------
16-NOV-11 21:00:12   16-NOV-11 21:20:16     1       5209   1.00    .00    .00     .00  99.00
16-NOV-11 21:20:16   16-NOV-11 21:40:12     1       5210   4.00    .00   1.00     .00  95.00
16-NOV-11 21:40:12   16-NOV-11 22:00:03     1       5211  31.00    .00   2.00     .00  67.00
16-NOV-11 22:00:03   16-NOV-11 22:20:05     1       5212  58.00    .00   2.00     .00  40.00

Elapsed Time/DB Time/Concurrent Active Users

select
to_char(begintime,'DD-MON-YY HH24:MI:SS') begintime,
to_char(endtime,'DD-MON-YY HH24:MI:SS') endtime,
inst,
snapid,
round(dbtdiff/(1000000*60),2) dbt,
round(secs/60) mins,
round(dbtdiff/(1000000*60*round(secs/60))) concactive
from
(
select begintime,endtime,(extract(Minute from endtime-begintime)*60+extract(Second from endtime-begintime)) secs,
snapid,inst,
dbt-(nvl(lag(dbt) over (partition by inst order by inst,snapid),0)) dbtdiff
from
(
select sn.begin_interval_time begintime,
     sn.end_interval_time EndTime,tm.snap_id SnapId,tm.instance_number Inst,
     sum(decode(tm.stat_name,'DB time',value,0)) dbt
from dba_hist_sys_time_model tm,dba_hist_snapshot sn
where tm.dbid=&dbid
and   tm.dbid = sn.dbid
and   tm.instance_number =  sn.instance_number
and   tm.snap_id = sn.snap_id
and   tm.snap_id between &bsnap - 1 and &esnap
and   tm.stat_name in (
'DB time'
)
group by sn.begin_interval_time,sn.end_interval_time,tm.snap_id,tm.instance_number
order by tm.instance_number,tm.snap_id
)
)
where snapid between &bsnap and &esnap
order by inst,snapid
/

BEGINTIME            ENDTIME               in     SNAPID       DBT    MINS CONCACTIVE
-------------------- -------------------- --- ---------- --------- ------- ----------
16-NOV-11 21:00:12   16-NOV-11 21:20:16     1       5209         1      20          0
16-NOV-11 21:20:16   16-NOV-11 21:40:12     1       5210       319      20         16
16-NOV-11 21:40:12   16-NOV-11 22:00:03     1       5211       657      20         33
16-NOV-11 22:00:03   16-NOV-11 22:20:05     1       5212       972      20         49
16-NOV-11 22:20:05   16-NOV-11 22:40:06     1       5213       457      20         23
16-NOV-11 22:40:06   16-NOV-11 23:00:08     1       5214       736      20         37
16-NOV-11 23:00:08   16-NOV-11 23:20:25     1       5215         3      20          0

Top 5 Foreground Waits

The percentages I get here do not match up exactly with the percentages you get when you run awrrpt.sql to generate the corresponding AWR report. However, they always seem to be within plus or minus 3% of the awrrpt.sql values. I think the variation comes from how awrrpt.sql rounds values, and probably from how it accounts for idle events.

 

with se as (
     select sn.begin_interval_time begintime,
        sn.end_interval_time EndTime,se.snap_id SnapId,se.instance_number Inst,
        se.event_name stat,se.time_waited_micro_fg value,
        nvl(lag(se.time_waited_micro_fg) over(partition by se.instance_number,se.event_name
        order by se.instance_number,se.snap_id,se.event_name),0) prevval,
        se.time_waited_micro_fg-
        nvl(lag(se.time_waited_micro_fg) over(partition by se.instance_number,se.event_name
        order by se.instance_number,se.snap_id,se.event_name),0) valuediff
     from dba_hist_system_event se,dba_hist_snapshot sn
     where se.dbid=&dbid
     and   se.dbid = sn.dbid
     and   se.instance_number =  sn.instance_number
     and   se.snap_id = sn.snap_id
     and   se.snap_id between &bsnap-1 and &esnap
     and   se.wait_class != 'Idle'
     order by se.snap_id,se.instance_number,se.event_name
     ) ,
     sdbcpu as (
         select sn.begin_interval_time begintime,sn.end_interval_time EndTime,
         stm.snap_id snapid,stm.instance_number inst,stm.stat_name stat
         ,stm.value value
         ,nvl(lag(stm.value) over(partition by stm.instance_number order by stm.instance_number,stm.snap_id),0) prevval
         ,stm.value-
         nvl(lag(stm.value) over(partition by stm.instance_number order by stm.instance_number,stm.snap_id),0) valuediff
         from dba_hist_sys_time_model stm,dba_hist_snapshot sn
         where
         stm.stat_name = ('DB CPU')
         and stm.dbid = &dbid
         and stm.snap_id between  &bsnap-1 and &esnap
         and stm.dbid = sn.dbid
         and stm.instance_number = sn.instance_number
         and stm.snap_id = sn.snap_id
         order by stm.snap_id,stm.instance_number
     ) ,
     sunion as (
         select begintime,endtime,snapid,inst,stat,valuediff from se
         union all
         select begintime,endtime,snapid,inst,stat,valuediff from sdbcpu
         order by 3,4
     ),
     spct as (
     select begintime,endtime,snapid,inst,stat,valuediff,
     round(ratio_to_report(valuediff) over (partition by snapid,inst),4) as pct
     from sunion
     order by 3,4 asc,7  desc
     )
     select * from (
     select to_char(begintime,'DD-MON-RR HH24:MI:SS') begintime
     ,to_char(endtime,'DD-MON-RR HH24:MI:SS') endtime,snapid,inst,stat,valuediff,round(pct*100,2) pct,
     row_number() over (partition by snapid,inst order by snapid,inst asc,pct desc) as rnum
     from spct
     )
     where rnum < 6 and snapid between &bsnap and &esnap
/

BEGINTIME            ENDTIME                  SNAPID  in STAT                               VALUEDIFF    PCT
-------------------- -------------------- ---------- --- ------------------------------ ------------- ------
16-NOV-11 21:00:12   16-NOV-11 21:20:16         5209   1 DB CPU                              28856557  83.24
16-NOV-11 21:00:12   16-NOV-11 21:20:16         5209   1 name-service call wait               2073440   5.98
16-NOV-11 21:00:12   16-NOV-11 21:20:16         5209   1 control file sequential read          843201   2.43
16-NOV-11 21:00:12   16-NOV-11 21:20:16         5209   1 enq: PS - contention                  634127   1.83
16-NOV-11 21:00:12   16-NOV-11 21:20:16         5209   1 undo segment extension                423219   1.22
16-NOV-11 21:00:12   16-NOV-11 21:20:16         5209   2 DB CPU                              34408715  89.73
16-NOV-11 21:00:12   16-NOV-11 21:20:16         5209   2 enq: PS - contention                  852207   2.22
16-NOV-11 21:00:12   16-NOV-11 21:20:16         5209   2 reliable message                      351905    .92
16-NOV-11 21:00:12   16-NOV-11 21:20:16         5209   2 control file sequential read          323355    .84
16-NOV-11 21:00:12   16-NOV-11 21:20:16         5209   2 row cache lock                        286882    .75

Physical and Logical I/O

select to_char(begintime,'DD-MON-RR HH24:MI') begintime,to_char(endtime,'DD-MON-RR HH24:MI') endtime
,(extract(Minute from endtime-begintime)*60+extract(Second from endtime-begintime)) secs,
snapid,inst,
prd-nvl(lag(prd) over (partition by inst order by inst,snapid),0) prddiff,
pwrt-nvl(lag(pwrt) over (partition by inst order by inst,snapid),0) pwrtdiff,
iordreq-nvl(lag(iordreq) over (partition by inst order by inst,snapid),0) iorddiff,
iowrtreq-nvl(lag(iowrtreq) over (partition by inst order by inst,snapid),0) iowrtdiff,
prmbr-nvl(lag(prmbr) over (partition by inst order by inst,snapid),0) prmbrdiff,
cgets-nvl(lag(cgets) over (partition by inst order by inst,snapid),0) cgetsdiff,
dbgets-nvl(lag(dbgets) over (partition by inst order by inst,snapid),0) dbgetsdiff
from
(
select sn.begin_interval_time begintime,
     sn.end_interval_time EndTime,ss.snap_id SnapId,ss.instance_number Inst,
     sum(decode(ss.stat_name,'physical read total bytes',value,0)) prd,
     sum(decode(ss.stat_name,'physical write total bytes',value,0)) pwrt,
     sum(decode(ss.stat_name,'physical read total IO requests',value,0)) iordreq,
     sum(decode(ss.stat_name,'physical write total IO requests',value,0)) iowrtreq,
     sum(decode(ss.stat_name,'physical read total multi block requests',value,0)) prmbr,
     sum(decode(ss.stat_name,'consistent gets',value,0)) cgets,
     sum(decode(ss.stat_name,'db block gets',value,0)) dbgets
from dba_hist_sysstat ss,dba_hist_snapshot sn
where ss.dbid=&dbid
and   ss.dbid = sn.dbid
and   ss.instance_number =  sn.instance_number
and   ss.snap_id = sn.snap_id
and   ss.snap_id between &bsnap-1 and &esnap
and   ss.stat_name in (
'physical read total bytes',
'physical write total bytes',
'physical read total IO requests',
'physical write total IO requests',
'physical read total multi block requests',
'consistent gets',
'db block gets'
)
group by sn.begin_interval_time,sn.end_interval_time,ss.snap_id,ss.instance_number
order by ss.instance_number,ss.snap_id
)
where snapid between &bsnap and &esnap
order by 4,5
/

BEGINTIME            ENDTIME                    SECS     SNAPID  in          PRDDIFF         PWRTDIFF         IORDDIFF        IOWRTDIFF        PRMBRDIFF        CGETSDIFF       DBGETSDIFF
-------------------- -------------------- ---------- ---------- --- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ----------------
16-NOV-11 20:40      16-NOV-11 21:00        1210.766       5208   1     309967486976     544388304896         17660147         36335142          1139579       4468788730       2328671039
16-NOV-11 20:40      16-NOV-11 21:00        1210.782       5208   2     165472269312     510302864896         16065810         34894618           144948       4168326257       2064355182
16-NOV-11 20:42      16-NOV-11 21:00        1079.081       5208   3       3122675712        153791488            56249             6006            10713          1307859            60272
16-NOV-11 20:43      16-NOV-11 21:00        1028.123       5208   4       1318486016         59018752            26009             5436             7229          1199275            56027

Top 10 SQL Statements by Elapsed Time

select * from (
select ss.snap_id snapid,ss.instance_number inst,ss.sql_id  sqlid
       ,round(sum(ss.elapsed_time_delta)) elapsed
       ,nvl(round(sum(ss.executions_delta)),1) execs
       ,round(sum(ss.buffer_gets_delta)) gets
       ,round(sum(ss.rows_processed_delta)) rowsp
       ,round(sum(ss.disk_reads_delta)) reads
       ,dense_rank() over(partition by snap_id,instance_number order by sum(ss.elapsed_time_delta) desc) sql_rank
from
dba_hist_sqlstat ss
where
ss.dbid = &dbid  and
ss.snap_id between &bsnap and &esnap
group by ss.snap_id,ss.instance_number,ss.sql_id
)
where sql_rank < 11 and snapid between &bsnap and &esnap
/

   SNAPID  in SQLID                                ELAPSED      EXECS           GETS          ROWSP          READS   SQL_RANK
---------- --- ------------------------------ ------------- ---------- -------------- -------------- -------------- ----------
      5209   1 1zr2ujm5sa5gc                        4547685          1         224434              1              0          1
      5209   1 3g2ugdp1af2h8                        4478848         13         221134             13              0          2
      5209   1 5k5v1ah25fb2c                        3965629         39         360232             39              0          3
      5209   1 0r1zf55mxaujd                        2959144          1         237437              1              0          4
      5209   1 8vwv6hx92ymmm                        2633353      51062         360669          51062              0          5
      5209   1 2w1s3wu1zw63a                        1958993          2         153110              4              0          6
      5209   1 f1y8kbhh6v9sv                        1709282          1                                                       7
      5209   1 6pw8uk8k0dv0q                        1699402          1                                                       8

Using Python 3

I have been writing some Python scripts for AWR analysis and trending. Since Python 2.7 is no longer being enhanced, I have now switched to Python 3. A lot of Python applications and frameworks still do not support Python 3 (notably the Django framework). The good news is that cx_Oracle works with Python 3.

The steps to install cx_Oracle with Python 3 are very similar to the steps I outlined in my previous post on installing cx_Oracle with Python 2.7.

The differences are:

– You have to first install python3 and python3-dev (on Ubuntu, you can just use the Ubuntu Software Center to do this)

– Then download the cx_Oracle 5.1.1 source-code-only tar ball from http://cx-oracle.sourceforge.net/

– Log in as root, untar the tar file, and cd to the cx_Oracle-5.1.1 directory

– Then run /usr/bin/python3 setup.py install

That does it, and Oracle connectivity is now in place; a quick test is sketched below.
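
A minimal connectivity test sketch (the username, password, and connect string are placeholders for your environment):

# quick cx_Oracle smoke test under python3
import cx_Oracle

conn = cx_Oracle.connect("scott", "tiger", "localhost/orcl")
cur = conn.cursor()
cur.execute("select banner from v$version")
for (banner,) in cur:
    print(banner)
conn.close()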

I have also been using the matplotlib library along with Python to plot graphs from the AWR and OSWatcher data files. matplotlib also works with Python 3.

– You have to first install libpng, libpng-dev, libfreetype6, libfreetype6-dev (use the Ubuntu Software Center)

– Download the numpy source code tar ball

– Extract the tar file, log in as root, cd to the directory and run /usr/bin/python3 setup.py install

– Installing matplotlib:

– Download the matplotlib source code tar file

– Login as root, cd to the directory

– /usr/bin/python3 setup.py build

– /usr/bin/python3 setup.py install

Now you should have matplotlib working with Python 3.

Enjoy your Python scripting.


VirtualBox command line start, stop and clone

I find it convenient to install the Oracle 11gR2 RDBMS on Oracle Enterprise Linux 5, running as a virtual machine under VirtualBox (that way I don't have to go through the pains of installing it on Ubuntu Linux, which is unsupported). I use this database on a daily basis for testing different Oracle functionality.

I use the following VirtualBox command-line commands to start and stop the virtual machine. It is much faster than booting up the operating system and then starting Oracle.

To stop the virtual machine (I have named the VM OEL7) in its current state (with the Oracle RDBMS running):

$ VBoxManage controlvm OEL7 savestate

To start it

$ VBoxManage startvm OEL7

It takes about 10 seconds to start it.

To make a clone of the virtual machine in an entirely new location, first shut down the virtual machine and then:

$ VBoxManage clonevm OEL7 --options keepallmacs --name OEL7E --basefolder "/media/ExternalDrive/vimage2"

The --basefolder option specifies a new location on disk in which to create the new image.
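
To verify the result, the standard listing commands show the registered VMs and the new clone's configuration:

$ VBoxManage list vms
$ VBoxManage showvminfo OEL7E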

HowTo: create a resource plan using Database Resource Manager

On any database server there is only a limited amount of resources (read: CPU, memory, I/O). Oracle database servers are often subjected to highly concurrent, resource-consuming queries issued by database client programs, which leads to contention for CPU, memory, and I/O on the server. Once you have far more actively executing processes than CPU cores, the scheduling of those processes becomes inefficient at the operating system level.

The Oracle Database Resource Manager can be used to throttle the number of sessions actively running on a database, limit the amount of CPU a session gets, or limit the degree of parallelism each session gets, among other things.

Below are the steps to set up a resource plan that:

– Limits the maximum degree of parallelism each session can have to 64

– Limits the maximum number of concurrently active sessions for a user to 2 (the other sessions get queued)

(A PL/SQL sketch of the same setup appears at the end of this section.)

Create a Consumer Group

Login to Enterprise Manager, click on the “Server” tab.

Under the “Resource Manager” section, click on “Consumer Groups”

Click on the “Create” Button

Enter a Name and Description for the Consumer group

Click on the “Add” button to add a database user to the consumer group

Check the box to the right of the user you want to add and click on “Select”

Now click on the “Database Instance” breadcrumb to return to the “Server” tab

Click on “Consumer Group Mappings”

Click the radio button on the row that has the value “Oracle User” in the “View” column.

Click on “Add Rule for Selected Type”

In the drop down list “Selected consumer group”, choose “DOP_GROUP”.

In the “Available Oracle User” list, click on the user you want to add to this “DOP_GROUP” group and click on “Move”

Click Ok.

Click on Apply

Click on the “Database Instance” breadcrumb to return to the “Server” tab

Click on “Resource Plans”

Click on “Create” to create a new resource plan

From the table “Available Group/Subplans” choose “DOP_GROUP” and click on Move. Click Ok.

Click on the tab named “Parallelism”

For the DOP_GROUP, set the maximum degree of parallelism to 64

Click on the tab named “Session Pool”

Set the number of active sessions to “2” (or any number you want; in the screenshot I have used 12)

Click “Ok”

In the “Resource Plans” screen, choose the DOP_PLAN; in the actions drop-down list, choose “Activate” and click on Go.

Now the resource plan “DOP_PLAN” is active.

Now let us say we simultaneously submit a bunch of queries (say 20) that use parallel query, as the database user HR. Then:

– Only 2 of them will be allowed to run at the same time (the rest are queued)

– Each session gets a maximum degree of parallelism of 64
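
For reference, here is a sketch of the same setup done in PL/SQL with the DBMS_RESOURCE_MANAGER packages instead of Enterprise Manager (the group, plan, and user names match the example above; run as a suitably privileged user):

begin
  dbms_resource_manager.create_pending_area();
  dbms_resource_manager.create_consumer_group(
    consumer_group => 'DOP_GROUP',
    comment        => 'Group to throttle parallel workloads');
  -- map the HR database user to the consumer group
  dbms_resource_manager.set_consumer_group_mapping(
    attribute      => dbms_resource_manager.oracle_user,
    value          => 'HR',
    consumer_group => 'DOP_GROUP');
  dbms_resource_manager.create_plan(
    plan    => 'DOP_PLAN',
    comment => 'Limit DOP and active sessions');
  -- cap the degree of parallelism at 64 and the active session pool at 2
  dbms_resource_manager.create_plan_directive(
    plan                     => 'DOP_PLAN',
    group_or_subplan         => 'DOP_GROUP',
    comment                  => 'DOP 64, 2 active sessions',
    parallel_degree_limit_p1 => 64,
    active_sess_pool_p1      => 2);
  -- every plan needs a directive for OTHER_GROUPS
  dbms_resource_manager.create_plan_directive(
    plan             => 'DOP_PLAN',
    group_or_subplan => 'OTHER_GROUPS',
    comment          => 'Catch-all directive');
  dbms_resource_manager.validate_pending_area();
  dbms_resource_manager.submit_pending_area();
  -- the user also needs the privilege to switch into the group
  dbms_resource_manager_privs.grant_switch_consumer_group(
    grantee_name   => 'HR',
    consumer_group => 'DOP_GROUP',
    grant_option   => false);
end;
/
alter system set resource_manager_plan = 'DOP_PLAN';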

What makes the Oracle Database Machine V2 incredibly awesome?

Lots of hardware power for the database servers

– 8 Real Application Clusters database server nodes (aka compute nodes) (Sunfire X4170s)
– 2 quad core Intel Xeon E5540 processors (2.53GHz) in each server (total 64 cores across 8 nodes)
– 72GB of RAM on each node (total 576GB of RAM on the database servers)
– 3 36-port QDR InfiniBand switches (40Gbit InfiniBand)

Lots of hardware power for the storage servers

– 14 Exadata cells (Sunfire X4275s); each cell has:
– 2 quad core Intel Xeon E5540 processors (total 112 CPU cores across all 14 cells)
– 24GB of RAM (total 336GB of RAM across all 14 cells)
– 384GB of flash cache (PCI-E flash cards) (about 5TB across all 14 cells)
– 12x600GB (SAS) disks (7.2TB) or 12x1TB (SATA) disks (12TB) (total 100TB with SAS disks, or 168TB with SATA disks, across all 14 cells)

The above hardware gives the database machine the ability to read data at a rate of 21 gigabytes per second, or 1,000,000 I/Os per second.

Balanced Configuration

Each SAS disk drive in the Sunfire X4275 servers is a 3.5-inch, 15k RPM, SAS 2.0, 600GB drive. Each drive can sustain at least 125 megabytes per second on sequential scans, so 168 disk drives together can scan (and return) data at 21,000 megabytes per second.
The InfiniBand connections between the storage cells and the compute nodes have enough network bandwidth to transport data at 21 gigabytes per second.
The 64 CPU cores can issue I/O requests at roughly 300 megabytes per core, hence requesting about 21 gigabytes of data per second.
So the system is architected for optimal reads: enough CPUs to request the I/O, enough network bandwidth to ship it, and enough disk capacity to service the requests. This is why the Oracle Database Machine is a well-balanced system.
At a read rate of 21 gigabytes per second, the database servers can read 1 terabyte of data in under a minute.

Infiniband

Each database node and Exadata cell has dual-port quad data rate (QDR) InfiniBand connectivity.
InfiniBand is used for database-node-to-Exadata-cell connectivity and also for the RAC high-speed interconnect (the cache fusion network).
InfiniBand has the flexibility of a LAN with the speed of a SAN.
Oracle's interconnect protocol uses DMA to move data from the wire directly into memory without any additional copies.

Exadata Cells

The Exadata cells provide highly redundant, high-performance hardware with very intelligent software to efficiently process database I/O requests.
The hardware capability was discussed in the section “Lots of hardware power for the storage servers”.
The intelligence in the Exadata cells includes “Smart Flash Cache”, “Smart Scans”, “Storage Indexes”, and “Exadata Hybrid Columnar Compression”.
Oracle Automatic Storage Management (ASM) ensures that all Oracle database files are evenly spread across all 168 disks available in the database machine.
Oracle Database uses the iDB protocol (built by Oracle, aptly named the Intelligent Database protocol) to communicate with the Exadata cells. iDB is built on ZDP, a zero-data-loss, zero-copy implementation of the industry-standard RDSv3 (Reliable Datagram Sockets) protocol.

Flash cache

Random read operations are cached in the 5TB of flash cache available in the database machine, significantly improving OLTP performance.
The Exadata Smart Flash Cache, working with the database server, keeps track of data access patterns and intelligently manages the caching of blocks from the Oracle datafiles.

Smart Scans

The Oracle database server uses the iDB protocol to push query predicates (filters and join conditions that limit the data retrieved by the query) to the Exadata cells.
This enables the cells to do three things:
– Identify the rows needed by the query and ship only those rows back to the database server (not entire blocks)
– Identify the columns needed by the query and ship only the required columns of those rows back to the database server
– Use Bloom filters to process join conditions and ship only the matching rows back to the database server for queries with joins
This drastically reduces the amount of data sent back to the database server (reducing network usage).
Transferring blocks from the disks inside a cell to the cell's physical memory is relatively fast. If data the database server does not need can be eliminated at the cell, the amount of data that must go over the network to the database server is significantly reduced. So smart scans reduce the network I/O between the database servers and the Exadata cells.

Storage Indexes

Smart scans, however, do not reduce the disk I/O within the cell (i.e. the transfer from disk to the cell's physical memory).
Oracle maintains an in-memory array of structures that keep track of the min and max values of columns (columns used in WHERE clauses that benefit from storage indexes), letting Oracle determine whether a specific 1MB region is needed, based on the filtering conditions applied to the tables.
So a storage index is a filter Oracle applies to prune away 1MB chunks that do not have to be read.
This reduces the disk I/O within the Exadata cell.

Exadata Hybrid Columnar Compression

Traditional relational databases store data in database blocks in a “row” format. This type of storage limits the amount of compression that can be achieved.
Column-store databases organize and store data by column. Storing column data together, with the same datatype and similar characteristics, yields significant compression. However, if a query refers to more than a couple of columns in the table, or does more than modest updates and inserts, those queries and DML tend to perform more slowly.
Exadata Hybrid Columnar Compression takes a blended approach. Oracle takes a set of rows spanning multiple blocks (called a compression unit), converts the rows into columns, and stores the data in columnar format within the compression unit.
Oracle uses three different compression formats and different transformations, depending on the compression level you have chosen.
There are four levels of compression (each level trades off compression ratio against compression speed; an example follows the list):
– Query Low
– Query High (Default when you say compress for query)
– Archive Low
– Archive High
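
For illustration, a sketch of declaring a compression level at table creation (the table and source names are hypothetical):

create table sales_hist
compress for query high
as select * from sales
/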
With Exadata Hybrid Columnar Compression:
1) The storage required for massive amounts of data can potentially decrease by a factor of 10.
2) The I/O issued for queries that scan very large tables is significantly reduced.
Together, the above features make the Oracle Database Machine rock database performance.

Cloud computing definition

The National Institute of Standards and Technology has a good, concise definition of cloud computing. Sushil Kumar of Oracle used the same language to define cloud computing in an article in the current release of Oracle Magazine.

Essential Characteristics

  • On Demand Self-Service
  • Broad Network Access
  • Resource Pooling
  • Rapid Elasticity
  • Measured Service

Service Models

  • Cloud Software as a Service (SaaS)
  • Cloud Platform as a Service (PaaS)
  • Cloud Infrastructure as a Service (IaaS)

Deployment Models

  • Private Cloud
  • Community Cloud
  • Public Cloud
  • Hybrid Cloud

Grid computing sessions at Oracle Openworld 2009

If you are attending Oracle OpenWorld 2009 and are interested in learning a lot about Oracle RAC and grid computing, you can find a full list of Oracle RAC and grid computing events here (starting at page 2 of the PDF doc).

In preparation for the event, you could read the following new 11gR2 white papers from Oracle to understand the latest developments and arm yourself with questions.

Oracle Real Application Clusters 11g Release 2 Technical Overview

Oracle Real Application Clusters 11g Release 2 Overview of SCAN

Oracle Real Application Clusters One Node 11g Release 2 Technical Overview