Oracle Exadata Statistics in AWR report – Part 2 (Outliers)

This blog post is a continuation of the previous blog post titled , Oracle Exadata Statistics in AWR report – Part 1 (Basics). In this post we continue on to describe the performance details displayed in the section “Exadata Outlier Summary”.

Outlier Summary Cell Level

awex3-1

This section displays cells that have performance outliers. The Awr Views DBA_HIST_CELL_DISK_SUMMARY, and DBA_HIST_CELL_GLOBAL_SUMMARY contains samples for each cell, disk and flash card.
The individual sample values, the number of samples, the average, the square of the value are all stored. Using this data the mean and the standard deviation are calculated and the range is defined as the average + or – standard deviation. Cells that have values that are above the mean + standard deviation are displayed.

This section will help us identify cells that have performance metrics, which are outside of the standard operating norms of that cell.

Outlier Summary – Disk Level

awex3-2

This section displays Disks that have performance outliers. The Awr Views DBA_HIST_CELL_DISK_SUMMARY contains this info.The individual sample values, the number of samples, the average, the square of the value are all stored. Using this data the mean and the standard deviation are calculated and the range is defined as the average + or – standard deviation. Disks that have values that are above the mean + standard deviation are displayed.

This section will help us identify Disks (Flash or Hard disk) that have performance metrics, which are outside of the standard operating norms of that Disk.

Exadata OS IO Statistics – Outlier Cells

awex4-1

This section displays cells that have IO statistics that are outliers. Per Cells averages, Per Disk Mean, Standard Deviation, Range’s of the IOPS and IO MBPS information is displayed. Averages exceeding the maximum stated capacity of the disk or cell are shown in Dark red.

This section helps identify whether there are cells or disks that exceed their stated capacities.

Exadata OS IO Statistics – Outlier Disks

awex4-2

This section displays disks (Flash and Hard disk) that have IO statistics that are outliers. Per Disk Mean, Standard Deviation, Range’s of the IOPS, IO MBPS and Disk utilization percentage information is displayed. Averages exceeding the Normal Ranges are shown in Dark red.

This section helps identify whether there are disks that are outside of the standard operating norms of that disk.

Exadata OS IO Latency – Outlier Cells

awex4-3
This section displays cells (Flash and Hard disk) that have IO latencies that are outliers.
Aggregated Across all cells, the Mean, Standard Deviation, Range’s of Average Serice times and Average Wait Times are displayed.
If there are cells Averages that exceed the Normal Range, they are displayed as outliers.

This section helps us identify whether there are cells that have I/O latencies that are outside of the standard operating norms for cells in this system.

Exadata OS IO Latency – Outlier Disks

awex5-1
This section displays disks (Flash and Hard disk) that have IO latencies that are outliers.
Aggregated Across all cells, the Mean, Standard Deviation, Range’s of Average Serice times and Average Wait Times are displayed.
If there are disks whose Averages that exceed the Normal Range of the cells, they are displayed as outliers.

This section helps us identify whether there are disks that have I/O latencies that are outside of the standard operating norms for disks in this system.
Exadata OS CPU Statistics – Outlier Cells

awex5-2
This section displays cells that have Cpu utilization that are outliers.
Aggregated Across all cells, the Mean, Standard Deviation, Range’s of Cpu utilization is displayed.
If there are cells whose Average Cpu utilization that exceed the Normal Cpu utilization Range of the cells, they are displayed as outliers.

Oracle Exadata Statistics in AWR report – Part 1 (Basic Info)

Starting with Exadata storage server 12.1.2.1.0 , used in combination with Oracle Database release 12.1.0.2, there are new sections which have been added to the Oracle AWR (Automatic Worload repository) report, that displays statistics at the Exadata storage level.

This is a really valuable enhancement, which helps with drilling down from database level statistics to cell level statistics, to identify and analyze the workload profile.
You can click on the URL’s in the section “Exadata Configuration and Statistics” to access this part of the report.

There are a few AWR history tables that store this information.

DBA_HIST_CELL_CONFIG
DBA_HIST_CELL_CONFIG_DETAIL
DBA_HIST_CELL_DB
DBA_HIST_CELL_DISKTYPE
DBA_HIST_CELL_DISK_NAME
DBA_HIST_CELL_DISK_SUMMARY
DBA_HIST_CELL_GLOBAL
DBA_HIST_CELL_GLOBAL_SUMMARY
DBA_HIST_CELL_IOREASON
DBA_HIST_CELL_IOREASON_NAME
DBA_HIST_CELL_METRIC_DESC
DBA_HIST_CELL_NAME
DBA_HIST_CELL_OPEN_ALERTS

The description of these views can be found in the Exadata Storage Server Users Guide.

The section starts off by showing the cell configuration information. Then it displays the Kernel  and the Cell Image version’s.

awex1-1

This information comes from the awr view DBA_HIST_CELL_DISK_SUMMARY.

The next section titled “Exadata Storage Information” storage information shows the number of disks and flash cards in each cell and the entire rack.

awex1-2
The first row of the output shows the amount of flash cache in each cell, The size of the smart flash log, Number of hard disks in a cell, Number of flash cards in each cell, and the number of Grid Disks in each cell.
The second row shows the above columns aggregated for all cells in the rack.

The next section titled “Exadata Griddisks” shows the grid disk names, Number of Grid disks in each cell, the Grid Disk size and The type of Drive

awex1-3
The next section titled “Exadata Cell Disks” shows the Disk type, Size of the cell disk, Number of disks .

awex2-1
The next section “ASM disksgroups” shows the diskgroups used by this database.

awex2-2
It shows the diskgroup name,Total size of the diskgroup,Used space, Number of disks in the diskgroup and the redundancy type.

This is followed by a section “Exadata Server Health Report”, which has 3 sub sections Exadata Alerts Summary,Exadata Alerts Detail,Exadata Non-Online Disks which displays information regarding alerts on the cells and any offline disks.
The remaining sections of Exadata performance statistics in the AWR report, display a great deal of Exadata cell performance numbers.

Before we venture much into those sections, it is important to understand some cell level concepts and how they are captured in Awr.

At the cell level if you list the following attributes (On a x5-2 cell with HD drives)

list cell attributes maxpdiops,maxpdmbps,maxfdiops,maxfdmbps you get the following values

167 111 8929 343

These values are collected and stored in the confval column in DBA_HIST_CELL_CONFIG_DETAIL in an XML format.

These base values are used to calculate the maximum capacities of the cells and disks in the sections that follow.

Adaptive Dynamic Sampling – Oracle 12c – Some notes

Adaptive Query optimization, was a set of new capabilities, introduced in oracle 12c, to allow the optimizer to discover additional information regarding statistics and make run-time adjustments to execution plans to make them better. This is a major change in the optimizer behaviour from 11g.

I would recommend anyone who is planning an upgrade to 12c, that they make themselves familiar with the following white papers from oracle.

Understanding Optimizer Statistics with Oracle 12c

Best practices for gathering statistics with Oracle 12c

What to expect with the optimizer, with Oracle 12c.

In this article, i want to talk about some of the important concepts behind Dynamic Statistics, which is one of the components of Adaptive query optimizations.

In the section’s that are following, i show some commands to turn some of these features off. I want to be clear that I am not recommending that you turn anything off. I would prefer that customer’s adopt these new features,that are designed to improve the execution plans. Also keep in mind that the following are accurate (Afaik) on 12.1.0.2 as off the time of writing of this article, and are subject to change.

Dynamic statistics has two interesting effects, that DBA’s tend to notice initially.
– Ever so slightly, longer parse times for queries.
– Execution plan changes (Compared to what they had before upgrading) for the same query. (Sometimes unwelcome changes, especially for deployments that value stability more than performance gains).

Dynamic sampling was introduced by oracle in 9i Release 2 to improve the optimizer’s functioning. The amount of dynamic sampling done , and when it kicks in, is controlled by the parameter optimizer_dynamic_sampling. With 12c there is a new concept of Adaptive Dynamic sampling. Adaptive Dynamic Sampling is different from the pre-12c traditional dynamic sampling, in the following aspects.

– Adaptive Dynamic sampling could kick in even when optimizer_dynamic_sampling is set to 2.

– Especially for parallel queries on large tables.

– Adaptive Dynamic sampling kicks in when optimizer_dynamic_sampling is set to 11.
– Adaptive Dynamic sampling issues more queries than dynamic sampling used to do.

– The traditional dynamic sampling, used to issue, atmost, 1 query per table SQL.
– It is not uncommon to see 10’s of Adaptive Dynamic sampling queries being issued for a     single SQL. Multiple dynamic sampling queries for the same table.(It dpends on the volume of data, number of indexed columns, complexity of     the predicates etc).
– If you run a 10046 trace on the query, you will see a lot of additional queries in there that     have the DS_SVC hint in them, which are the queries issued by the Adaptive Dynamic             Sampling.

Setting OPTIMIZER_ADAPTIVE_FEATURES=FALSE does NOT turn off Adaptive Dynamic Sampling.

You can set Optimizer_Dynamic_Sampling = 0 to turn Adaptive Dynamic Sampling off. However this would be like throwing the baby out with the bath water. Setting Optimizer_Dynamic_Sampling=0 completely sets dynamic sampling off (Including the old style pre-12c dynamic sampling).

You can do an ALTER SESSION SET “_fix_control”=’7452863:0′; to turn just Adaptive Dynamic Sampling off, if you so desire.

Adaptive Dynamic Sampling also uses oracle Results Cache. If results cache is enabled in the database (usually by setting result_cache_max_size to a value > 0), then Adaptive dynamic sampling uses the results cache to store the results that it queries up. This is done so that, if there are multiple query parses that have to do dynamic sampling, and they are looking at the same data, the optimizer can just look that value up from the results cache (As opposed to having to query the tables again and again).

If you have a system, that has a lot of hard parses (Due to not using bind variables), you could pottentialy see latch free waits on “Results Cache: rc latch”. Please refer to Mos note 2002089.1, that suggest’s setting “_optimizer_ads_use_result_cache” = FALSE; to work around this. Keep in mind that setting this parameter does not prevent the optimizer from using Adaptive dynamic sampling. All it does it prevent the Adaptive dynamic sampling from using the results cache.

The following Mos note’s and a presentation from Trivadis, have a lot of great information in this regard.

Automatic Dynamic Statistics (Doc ID 2002108.1)

Different Level for Dynamic Statistics (Dynamic Sampling) used than the Level Specified (Doc ID 1102413.1)

High Waits for ‘cursor: pin S wait on X’ due to Dynamic Sampling Against Parallel Queries (Doc ID 2006145.1)

Adaptive Dynamic Sampling – Trivadis

AWR – Profiling Database I/O

Oracle Awr (Automatic Workload Repository) statistics, captures and stores fine grained information about file reads and writes (aka i/o), that the database performed, during the course of execution of, application generated database workloads. When analyzing the read and write patterns of the database, it helps a lot to understand what type of activity is generating the reads and writes. With this stored information we can get an indepth understanding of the distribution of random and sequential reads and writes.

I use this information for getting a better understanding of the I/O profile, for my Exadata sizing exercises.

This information can be used to understand clearly how much of the i/o is from Temp activity, Datafile reads and writes, Archivelog writes, log writes, and whether these are small or large reads and writes.

To the best of my understanding the small reads and writes are those < 128k and the large reads and writes are those > 128k.

This information is contained mainly in two awr Views.

Dba_Hist_Iostat_FileType
Dba_Hist_Iostat_Function

Dba_Hist_Iostat_FileType

This view displays the historical i/o statistics by file type. The main filetypes are the following

Archive Log
Archive Log Backup
Control File
Data File
Data File Backup
Data File Copy
Data File Incremental Backup
Data Pump Dump File
Flashback Log
Log File
Other
Temp File

Dba_Hist_Iostat_Function

This view displays the historical i/o statistics by i/o function. The main i/o functions are the following

Recovery
Buffer Cache Reads
Others
RMAN
Streams AQ
Smart Scan
Data Pump
XDB
Direct Writes
DBWR
LGWR
Direct Reads
Archive Manager
ARCH

From everything i have seen sofar, these reads and writes can be directly co-related to the “Physical read total IO requests” and “Physical write total IO requests” system level statistics.

I have written a script that displays information from the above mentioned views and gives a detailed breakdown of i/o gendrated from different aspects of the database activities.
In order to fit in the computer screen real estate, i have actually limited the columns the script displays (So it displays only the file types i am frequently interested in). Please feel free to take the script and modify it to add columns that you want to display.

The full version of the script  awrioftallpct-pub.sql can be found here.

The script accepts the following inputs
– A begin snap id for a snapid range you want to report for
– A End snap id for a snapid range you want to report for
– A Dbid for the database
– The snap interval in seconds (If you have a 30 minute interval input 1800 seconds)

A description of all the column names in the output, broken down by section, is provided in the header section of the script.

There are 6 sections to this script

1) Total Reads + Writes
2) Total Reads
3) Total Writes
4) Read write breakdown for datafiles
5) Data File – Direct Path v/s Buffered Read Write breakdown
6) Read write breakdown for tempfiles

1) Total Reads + Writes

This section displays the number of reads+writes by filetype, and a percentage of reads+writes for each file type, as a percentage of total reads+writes. The last column displays the total reads+writes for all file types. The column DTDP shows the i/o that bypasses flash cache by default and goes directly to spinning disk on Exadata (Temp+Archivelogs+Flashback Logs).

io1-rw

Click on the image to see a larger version

2) Total Reads

This section displays the number of reads by filetype, and a percentage of reads for each file type, as a percentage of total reads. The last column displays the total reads for all file types.

io1-r

Click on the image to see a larger version

3) Total Writes

This section displays the number of writes by filetype, and a percentage of writes for each file type, as a percentage of total writes. The last column displays the total writes for all file types.

io1-w

Click on the image to see a larger version

4) Read write breakdown for datafiles

This section displays the I/O information only pertaining to datafile i/o. It displays the small and large reads and writes and a percentage they constitute of the total reads+writes to datafiles, and a percentage they constitute of the total reads or writes to datafiles. It also displays the total small and large reads and writes and a percentage they constitute of the total reads+writes to datafiles.

io1-dfrw

Click on the image to see a larger version

5) Data File – Direct Path v/s Buffered Read Write breakdown

This section provides a breakdown of I/O by function (As opposed to i/o by filetype in the previous sections). The output shows columns that display the direct path small and large reads and writes, buffered small reads and writes, smart scan small and large reads and other small and large reads and writes.

io3-bf

Click on the image to see a larger version

6) Read write breakdown for tempfiles

This section displays the I/O information only pertaining to tempfile i/o. It displays the small and large reads and writes and a percentage they constitute of the total reads+writes to tempfiles, and a percentage they constitute of the total reads or writes to tempfiles. It also displays the total small and large reads and writes and a percentage they constitute of the total reads+writes to tempfiles.

io3-tf

Click on the image to see a larger version

The full version of the script  awrioftallpct-pub.sql can be found here.

Oracle Active DataGuard – Considerations for the Wide area Network

Oracle customers use Oracle Active Dataguard to create and maintain one or many standby databases that protect their mission critical primary databases from disaster. Typically, in such deployments, the primary databases and standby databases are in geographically separate locations connected via a WAN (Wide Area Network). Log Transport Services, transfers Large volumes of redo logs from the primary location to the standby, using Sql*Net.

We have to ensure that all the components from the source to target are setup correctly to ensure that the data transfer can be done with the best throughput possible. If sufficient network bandwidth is not available with reasonable latencies, then we will start seeing the log transfer and apply,lagging on the standby site (Which is oracle speak for, your primary and standby database is now out of sync from a data perspective).

One key point to keep in mind is that, lower the network round trip time (aka latency), higher your data transfer throughput. Higher the network round trip time (aka latency), lower your data transfer throughput. So it is very important to maintain low round trip times on your Wide area network.

To understand network data transfer throughput, It is important to understand the the concepts of Tcp Window Size and Bandwidth Delay Product (aka BDP).

Tcp Window size is the amount of bytes that can be transmitted without receiving an acknowledgement from the other side. Once Tcp Window size amount of bytes are send, the sender stops sending any more bytes and waits for an acknowledgement from the receiver.

Bandwidth delay product is calculated as the product of the network bandwidth and network round trip time. bdp=network bandwidth*round trip time. This is the amount of data that left the sender before the first acknowledgement was received by the sender. If the senders output bandwidth is stable, and the bandwidth is fully used, then the BDP calculates the number of packets in transit. If we set the Tcp Window size equal to the bandwidth delay product, then in theory we should be able to fully utilize the available bandwidth.

Setup the network

We have to start by setting up the networking components to support the desired/stated bandwidth. So if you have a WAN that is a 10GigE network, all the NIC’s (Network interface cards), ports, switches in the configuration should be configured to support 10GigE full Duplex settings. After setup, run the configuration display utilities and ensure that all these component levels the transfer speeds are set to be 10 GigE. Customers often run into trouble when Auto Negotiation causes some NIC’s to set the transfer speeds to 1GigE because of configuration mismatches.

Use tools like Iperf to test the transfer speeds that your network is capable of achieving.

One important aspect to keep in mind is that it is probable that the WAN is shared by other traffic (e-mail, data replication, san replication). This has two important implications that we should consider.

  • If there is a lot of bandwidth consumption by some of this miscellaneous traffic, round trip times could be degraded periodically on the network.
  • We should be careful in our calculations that we do not completely consume the entire bandwidth for redo transport. (This could impact other processes)
    • So it is important to figure out (Working with the network admins) what the bandwidth entitlements are for redo transport and base our calculations on those numbers.

Caclulate our BDP

Use the following formula to calculate our Bandwidth delay product (BDP)

(bandwidth/8)*rount trip time in seconds.

The network bandwidth is expressed in bits per second, so we divide by 8 to convert to bytes.
Round Trip Time is usually in milli seconds, so we divide by 1000 to convert to seconds.

So for example, if we have a 10Gbit network bandwidth and a 40ms round trip time

BDP=(10000000000/8)*(40/1000) = 50,000,000 bytes.

Setup Sql*Net Parameters

The current recommendations for Dataguard Redo transport are as follows.

Set the SDU size to 65535

  • We can set SDU on a per connection basis using the SDU parameter in the local naming configuration file (TNSNAMES.ORA) and the listener configuration file (LISTENER.ORA)
  • We can set the SDU for all Oracle Net connections with the profile parameter DEFAULT_SDU_SIZE in the SQLNET.ORA file.

Set TCP.NODELAY to YES

To preempt delays in buffer flushing in the TCP protocol stack, disable the TCP Nagle algorithm by setting TCP.NODELAY to YES in the SQLNET.ORA file on both the primary and standby systems.

Setup RECV_BUF_SIZE and SEND_BUF_SIZE

The current recommendation is to set the SEND_BUF_SIZE and RECV_BUF_SIZE parameters (Which are the send and receive socket buffer sizes for SQL*Net) to 3 Times the BDP.

As per the above example we would set them to 50,000,000*3 = 150,000,000

Setup Operating system Kernel Parameters

If you are using the Linux operating system make sure that the values for the following kernel parameters are setup to be higher than the values set for RECV_BUF_SIZE and SEND_BUF_SIZE.

net.core.rmem_max
net.core.wmem_max

Once we have configured the network, operating system and the sql*net, and we have redo transfer, we can perform further network monitoring to see how the network bandwidth is being utilized, and make appropriate adjustments.

Links to helpful Documents

Iperf

How to calcluate Tcp throughput for long distance links (blog)

Oracle Net Performance Tuning (Mos)

Setting Send and Receive Buffer Sizes (Mos)

Tuning Sql*Net peformance (Oracle Docs)

Configuring Oracle Dataguard (Oracle Docs)

Best Practices for Sync Data Transport (White Paper)

Exadata Database Machine Specifications – Quick Reference

With the different combinations of the Oracle Exadata database machines and storage expansion racks available to customers, it is hard for me to remember, each system’s specifications.

So i have created a JavaScript page, that can be used to quickly lookup the specifications of the current generation (x3-8 and x4-2) of Exadata database machines and storage expansion racks.

The page can be accessed from a web browser from your desktop,laptop, or favorite mobile device at http://dbastreet.com/exaspecs

You can narrow the results down by the exadata rack size, type of disk, and type of specification (cpu, storage or performance).

Here is a screenshot

exaspecs

Exadata X4-2 Whats New – A Technical Review

 

On Dec 11 2013 Oracle corporation announced the general availability of, the 5th generation of the Oracle Exadata Database Machine X4. This new version of the database machine introduces new hardware and software to accelerate performance, increase capacity, and improve efficiency and quality of service for database deployments.

In this blog post i’ll review in detail the enhancements in the Oracle Exadata x4-2 database machine.

The 2 socket version of the Database machine, X4-2 gets new database servers and new storage servers. The 8 socket version of the database machine, still called the x3-8, gets the new storage servers but keeps the same database serves from the previous release. Hence the Lack of change in Name to the X4-8. The name stays as X3-8.

Improvements in the database servers (X4-2)

Cpu Improvements

The database servers in the x4-2 are the Sun X4-2  servers. They come with the 2 Twelve-Core Intel® Xeon® E5-2697 v2 Processors (Ivy Bridge) (2.7GHz), with turbo boost enabled by default. This cpu at times can clock upto 3.5Ghz.

Comparing this with the previous version x3-2, this is a 50% increase in number of cores per database server. The x3-2 had 16 cores per node and the x4-2 has 24 cores per node. In a x3-2 full rack there were 128 cores and in the x4-2 full rack there are 192 cores. That gives 64 more cores in the x4-2 full rack compared to the x3-2 full rack.

Memory Improvements

Each database server has 256Gb of Dram. This is optionally expandable to 512Gb per node.So you can have either 2Tb or 4Tb of Ram in a full rack of x4-2.

Storage Improvements

The database servers now have 4, 600Gb hard disks in them (Internal storage, used for O/S, Oracle Binaries, Logs etc). The x3-2 used to have 300Gb disks. The disk controller batteries are online replaceable

Network Improvements

The database servers have 2 X Infiniband 4 X QDR (40Gb/S) Ports (PCIe 3.0). Both ports are Active. The improvements in the x4-2 are that these are now PCIe 3.0, and that the ports are used active active. (The ports were used Active/Passive in the previous release)

Improvements in the storage servers (Exadata cells x4-2)

Cpu Improvements

Each x4-2 exadata cells come with 2, Six-Core Intel® Xeon® E5-2630 v2 Processors (2.6 GHz).

Memory Improvements

Each x4-2 cell, now has 96Gb of Ram. The x3-2 used to have 64 Gb of Ram.

Flash Cache Improvements

The Exadata x4-2 cells, now have the F80 PCIe Flash cards. There are 4, 800Gb F80 flash cards in each cell. So each cell has 3.2Tb of flash cache. In an Oracle Exadata database machine x4-2 full rack there is 44TB of flash cache (Used to be 22Tb in the x3-2). The Oracle Exadata database machine x4-2 full rack now provides, 2.66Million read iops , 1.96Million write iops from the flash cache. This is 70% more flash iops than the previous generation.

Storage Improvements

The Exadata cell x4-2 High Performance version, now uses 1.2Tb 10k Rpm High Performance SAS disks.

  • A full rack of High Performance disks gives 200TB of raw space, 90TB of usable space with normal mirroring and 60TB of usable space with High Mirroring

The Exadata cell x4-2 High Capacity version, now uses 4Tb 7.2k Rpm High Capacity SAS disks.

  • A full rack of High Capacity disks gives 672TB of raw space, 300TB of usable space with normal mirroring and 200TB of usable space with High Mirroring

The Disk controller batteries are online replacable.

Network Improvements

The Exadata storage servers have 2 X Infiniband 4 X QDR (40Gb/S) Ports (PCIe 3.0). Both ports are Active. The improvements in the x4-2 are that these are now PCIe 3.0, and that the ports are used active active. (The ports were used Active/Passive in the previous release)

 

Software Improvement Highlights

The new x4-2’s get a new version of the exadata storage server software version 11.2.3.3.0 (This version of the cell software can be installed on the v2,x2,x3 and x4 platforms).

Exadata Flash Cache Improvements

Data written to the flash cache is now compressed. This compression is done by the controller (Hence near Zero overhead). This data is automatically decompressed when it is read from the flash cache. Up to 2X more data fits in smart flash cache (If you get 2x compression of the data, you could fit upto 80TB of data on the flash cache), so flash hit rates will improve and performance will improve for large data sets. This feature can be turned on in both x3-2 and x4-2 storage servers. Customers need to license the Advanced Compression option to use this new feature.

Enhancements to the smart flash cache software, enables exadata software to understand database table and partition scans and automatically caches them when it makes sense (This will help eliminate the need to specify CELL FLASH CACHE KEEP).

Exadata network resource management

With a new version of the Infiniband switch firmware 2.1.3-4, Exadata network resource management now prioritizes messages through the entire infiniband fabric. Latency sensitive messages like redo log writes are prioritized over batch, reporting and backup messages.

Infiniband Active Active Ports

Double-ported infiniband PCIe-3.0 Cards used in the database servers and storage servers, implement active-active mode usage of the infiniband ports, providing a 80GigaBits Per Second network bandwidth (Used to be 40Gigbits ber second on the x3-2, since it was Active Passive bonding).

The Rdbms software and the clusterware software already had the ability to send packets via multipe interfaces. Enhancements have been done to the RDS kernel drivers, which now have the ability to sense if one of the ports is down, and route the network traffic through the surviving port. On the x4-2’s when active active Infiniband networking is setup you will not see the bondib0 interface, instead you will see a ib0 and ib1.

Miscellaneious Info

  • The power,cooling and airflow requirements remain similar to that of the x3-2.
  • The storage expansion racks have also been refreshed and provides the increased flash cache and increased disk space.
  • A single Database Machine configuration can have servers and storage from different generations V2, X2, X3, X4.
  • Databases and Clusters can span across multiple hardware generations.
  • The half and full x4-2 racks, do not ship the spine switch anymore. The storage expansion racks still ship with the spine switch.
  • A new One Command available through patch 17784784 has the new Exadata Deployment assistant that supports the x4-2’s.

Datasheets and White Papers

Datasheet: Oracle Exadata database machine x4-2

Datasheet: Oracle Exadata database machine x3-8

Datasheet: Oracle Exadata storage expansion Rack x4-2

Datasheet: Oracle Exadata storage server x4-2

A Technical Overview of the Oracle Database Machine and Exadata Storage Server (Updated for x4-2)

Exadata Smart Flash Cache (Updated For X4-2)

 

Managing Processes and Memory Consumption on Oracle Exadata database machines.

This post is equally applicable to Rac databases deployed, both on oracle database machines as well as traditional server/storage architectures, running RHEL5 or OEL5 (OEL6 has a new feature called transparent hugepages).

The Best practices for database consolidation on exadata database machine, White Paper outlines a lot of good guidelines. However the best practices seems to be getting occasionally overlooked, in some cases resulting in Node evictions/restarts in the cluster.

High amounts of memory and cpu consumption can result in some real bad things happening (Like for eg: Node evictions). So it is important to configure the operating system and the databases you deploy on the machine optimally, to use the available CPU and memory resources.

Let us first review what those available memory and cpu resources are on an Oracle Exadata Database Machine, on each database node.

[table “1” not found /]

Anytime a workload is deployed on a database server, that exceeds the, operating systems ability to efficiently use the above mentioned, available cpu and memory resources, bad things can happen. (I know i am over simplifying in the prior comment, but you get the idea).

Managing Memory Resources.

Setting up HugePages

First we need to make sure that HugePages on each node, is setup correctly, for the databases you have deployed on the node.
Tanel Poder has explained the reasons for setting up HugePages on Exadata systems.
When the database machines are installed, the HugePages is configured to accomodate the one database that is pre-installed. So if you change the SGA setting for that database or deploy more databases you should now adjust the HugePages setting accordingly. It goes without saying that if you have already configured the hugepages to accomodate all the databases you have, and then you removed some databases from the node, or resized the SGA’s to be smaller, you should again resize your hugepages to free up the excess hugepages.

Once all the database instances are configured and started up you can run the script, hugepages_settings.sh from Mos Note 401749.1, which calculates the number of hugepages you need to setup. I usually recommend adding 10% to the value that hugepages_settings.sh suggests.

With 11.2.0.2 and 11.2.0.3 you should also be setting USE_LARGE_PAGES, preferably to ONLY, following the instructions in Mos Note 1392497.1.

You can then follow the instructions in MOS note 361323.1, under the section “Kernel version 2.6”, to set the value. You need to restart the databases so the SGA gets allocated with the hugepages.

You should also follow instructions from MOS note 1546861.1, and set the value of vm.min_free_kbytes = 524288, in /etc/sysctl.conf, to avoid page allocation failure messages when kernel memory is depleted.

Managing Memory Usage

Ideally for critical implementations, your SGA+PGA+Individual server processes, memory allocations should not exceed 75% of the physical memory on the database node.

The Exadata consolidation white paper above suggests that

For critical Hardware Pools, we recommend an even more conservative approach by not exceeding 75% physical memory per database node.
OLTP applications:
SUM of databases (SGA_TARGET +PGA_AGGREGATE_TARGET) + 4 MB * (Maximum PROCESSES) < Physical Memory per Database Node
DW/BI applications:
SUM of databases (SGA_TARGET + 3 * PGA_AGGREGATE_TARGET) < Physical Memory per Database Node

You can monitor the pga usage a few different ways.

1) SELECT s.inst_id, s.sid, s.serial#, p.spid, s.username, s.program, p.pga_used_mem FROM gv$session s JOIN gv$process p ON p.addr = s.paddr AND p.inst_id = s.inst_id WHERE s.type != ‘BACKGROUND’ and s.program not like ‘%(P%’ and p.pga_used_mem > <APPLICATION_MEMORY_THRESHOLD>
order by s.inst_id, s.sid, s.serial#;

The value you use for APPLICATION_MEMORY_THRESHOLD is dependent on your application and howmuch pga it needs to efficiently run your queries (Performing the sorts in memory using the space for PGA is usually faster than using the TEMP tablespace). However the white paper suggests using 1Gb for OLTP applications and 10Gb for DSS applications.

2) You can monitor the sysmetric “Total PGA Allocated” from dba_hist_sysmetric_summary, to see how much PGA is being allocated.

3) You can use the “PGA Memory Advisory” and “SGA Target Advisory” sections of the awr report, to guage, the impacts of increasing or decreasing the SGA and PGA Allocations.

4) You can monitor the memory usage at the o/s level using the top (Mem: free),free or vmstat (bo (page-out),bi (page-in)) commands.

Managing O/S Processes (And as a consequence CPU usage).

It is important to control the number of processes that are actively executing on each node. Each of these processes consume cpu and memory resources.

The following guidelines are important to follow.
– Use Application connection pooling to appropriately limit the number of sessions connected to the oracle database.

This is important for two reasons.
– Right-Size the resource consumption on each database node.
– Minimize the impact of connection storms.

– Configure the parallel query servers appropriately.
– For all the database’s deployed on a node the total setting for PARALLEL_MAX_SERVERS should be less than or equal to the following

X2-2 or X3-2, <= 240
X2-8 or X3-8, <= 1280

I often see implimentations where parallel_max_servers = 240 (or a high value) in each instance on the machine, which results in not so good consequences. I have occassionaly seen systems where there is sustained 100% cpu usage with load averages of > 300, resulting in node reboots.

– Use Instance Caging and Database Resource Manager to manage cpu allocation on the server.

It would be a good idea to always review the Best practices for database consolidation on exadata database machine, White Paper and follow the guidelines, to have a well performing and  stable database environment.

It would be a good idea to review the performance management screens from Enterprise Manager 12c every day to have a good understanding of the database and server workloads, and/or analyze the oswatcher data to understand the server workload profile.

Graph CPU usage on exadata using oswatcher files

On the oracle database machine, oswatcher is installed during setup time, both on the database nodes and the exadata cells. This utility collects linux operating system level statistics, which comes in very handy when troubleshooting operating system level issues. The data is collected in text files. There is a Java based utility (OSWG) provided by oracle support to graph the contents of these files, however that utility does not work on the oswatcher files generated on exadata.

Here is a python script that can graph the cpu used from the mpstat information that oswatcher captures. It has been tested on new oswatcher files on an x3-2. You need to first install a python environment that has the “numpy” and “matplotlib” modules installed.

Install a Python Virtualenv.

If you create multiple applications using Python and end up using different versions, it is easier to maintain different virtualenv’s. You can create a python virtualenv as shown below (On ubuntu linux).

curl -O https://pypi.python.org/packages/source/v/virtualenv/virtualenv-1.9.1.tar.gz
tar -xzvf virtualenv-1.9.1.tar.gz
cd virtualenv-1.9.1
python virtualenv.py ../p273env2
. p273env2/bin/activate
pip install numpy
sudo apt-get install libfreetype6-dev
pip install matplotlib

Now that you have a python environment, with your required libraries, you can go ahead and execute the script as shown below.

The oswatcher files in /opt/oracle/oswatcher are .bz2 files and there will be one file per hour per day. Copy the mpstat .bz2 files into a directory and use bunzip2 to unzip them. In this example let us say that the directory name is /u01/oswatcher/mpstat/tmp

You can now run the script as shown below

python parseoswmp.py  /u01/oswatcher/mpstat/tmp
or
python parseoswmp.py  /u01/oswatcher/mpstat/tmp '06/14/2013 05:00:00 AM' '06/14/2013 07:00:00 AM'

The first command will graph the cpu usage for the entire time range in all those files and the second command graphs the cpu information for the date and time range you have specified.

It creates a file in the current directory, named oswmpstat.png, which has the graph.

You can find the full script here.

You can find a sample output graph here.

Exadata Deployment Assistant

Previously, for an Oracle database machine installation, customers had to fill out the configuration worksheet, with information regarding the hostnames, ip addresses, how they want the machine configured etc. The file generated from the configuration worksheet served as the input to OneCommand.

Now there is a new utility called the “Exadata Deployment Assistant”. This is a java based, wizard driven configuration file generator, which replaces the configuration worksheet. You can get the utility by downloading the latest OneCommand Patch. You can find the latest OneCommand patch by referring to the Onecommand section of Mos Note 888828.1.

You can also find the latest Oracle Exadata Deployment Assistant at http://www.oracle.com/technetwork/database/exadata/oeda-download-2076737.html

Download and unzip the patch.

cd linux-x64

./config.sh

The details of the command, and the inputs it looks for are in the chapter titled “Using Oracle Exadata Deployment Assistant”, in the latest “Exadata database machine Owner’s guide”.