Monitoring Exadata database machine with Oracle Enterprise Manager 11g

Oracle Enterprise manager Grid control, is hands down the best monitoring and management tool, for the oracle exadata database machine. It comes with plugins to monitor all the hardware components of the database machine, and sensible, preset thresholds for proactive monitoring.

Update (Nov 2011) : Enterprise manager 12c is now available, and Certified to be used with exadata. The master MOS note 1110675.1 covers the installation and configuration details.

Some key points
  • You should use 11gR1 enterprise manager grid control for monitoring.
  • You should use 11gR1 enterprise manager agents, to monitor the targets on the database machine.
  • If you use enterprise wide monitoring tools like tivoli, openview or netcool, use snmp traps from oracle enterprise manager, to notify these monitoring tools (ie dont try to directly use snmp to monitor the exadata components. You could do this but it will be too time consuming).
  • You could potentially use 10.2.0.5 Oem, with 11g agents to monitor the dbmachine, but this is not recommended as a stable/long term solution.
  • The following components (And more) can be monitored using Enterprise Manager
    • Databases hosts
    • Exadata Cells
    • Cisco switch
    • KVM (Keyboard, Video, Mouse)
    • ILOM Monitoring
    • Infiniband switch
    • Power distribution unit (PDU)
You have 3 possible options to configure enterprise manager
  • If you have an existing 11gR1 enterprise manager grid control envrionment, you can patch it with the recommended patches and use that for monitoring the dbmachine targets.
  • You can setup and configure a brand new 11gR1 enterprise manager grid control environment (On a separate server) and configure it to monitor the dbmachine targets.
    • Download the required software
      • Weblogic server 10.3.2 (MOS Note 1106105.1, 1063112.1)
      • Jdk 64 bit (Mos Note 1063587.1)
      • 11gR1 Oms from download.oracle.com
    • Install Java and Web Logic Server (Wls)
      • MOS Note 1063762.1
    • Patch Web Logic Server
      • MOS Note 1072763.1
    • Install 11gR1 Enterprise manager Oracle Management Server (OMS)
      • Install/Create a 11gR2 database to serve as the Enterprise Manager Repository
      • Database pre-reqs for 11.1.0.1 repository (Mos Note 1064441.1)
      • Install/Configure Oms (Mos Notes 1130958.1, 1059516.1)
    • Patch OMS with the required patches to enable database machine monitoring
      • Mos Note 1323298.1
  • You can use an easy install option to setup and configure an enterprise manager environment and configure the plugins.
    • The easy install is delivered as a  patch 11852882 (EMGC setup automation kit)
    • The configuration worksheet has to be filled out properly (Before the installation) and the em.param file has to be generated.
    • Follow the instructions in the readme to do a quick install of a fully configured 11gR1 Enterprise manager installation.
    • This method helps you install/patch  and configure the full 11gR1 oms in just an few steps and is a huge time saver.
Download the required plugins to monitor the following components
Download the plugins from the enterprise manager extensions exchange
http://www.oracle.com/technetwork/database/exadata/index.html#plug-in (Exadata cell plugin)
http://www.oracle.com/technetwork/oem/grid-control/exadata-plug-in-bundle-188771.html (All the rest of the plugins)
Install and Configure the Agent and the Plugins
       Additional tutorials with screenshots on configuring the plugins can be found below
Sending SNMP traps to 3rd party monitoring tools.
  • Get the Mib (Management Information Base) file from your enterprise manager management server and send it to the 3rd party tool administrator (eg: openview or netcool). Follow MOS note 389585.1, to get this MIB file.
  • Then configure your notification methods and rules to send the required snmp traps to the 3rd party tool.

What makes the oracle database machine V2 incredibly awesome ?

Lots of hardware power for the database servers

– 8 Real Application Clusters, Database server nodes (aka compute nodes) (Sunfire X4170’s)
– 2 quad core, Intel Xeon, E5540 processors (2.53ghz) in each server (Total 64 cores across 8 nodes)
– 72gb of RAM on each node (Total 576gb of Ram on the database servers)
– 3, 36 port QDR Infiniband switches (40gbit infiniband)

Lots of hardware power for the storage servers

– 14 Exadata cells, (Sunfire X4275’s).
– Each cell has
– 2 quad core Intel Xeon, E5540 processors on each cell (Total 112 cpu cores on all 14 cells together)
– 24Gb RAM on each cell (Total 336gb of Ram on all 14 cells together)
– 384Gb of Flash Cache (PCI-E flash card) on each cell (5Tb on all 14 cells together)
– 12x600gb (SAS) disks (7.2Tb) or 12x1Tb (SATA) disks (12Tb) (Total 100Tb with SAS disks, or 168Tb with SATA disks on all 14 cells together)
The above hardware gives the database machine the ability to read data at the rate of 21GigaBytes a second or 1,000,000 Io’s per second.

Balanced Configuration

Each disk drive (SAS) in the Sunfire X4275 server are, 3.5 inch, 15k Rpm, SAS 2.0, 600Gb drives. Each drive has an average read capacity of atleast 125MegaBytes per second for sequential scans. 168 disk drives, can together scan at the rate off (And return) 21,000 MegaBytes per second.
The Infiniband connections between the storage cells and the compute nodes have enough network bandwidth to transport data at the rate of 21GigaBytes per second.
The 64 cpu cores can issue i/o requests at the approx rate of 300megabytes per core, hence requesting about 21GigaBytes of data per second.
So the system is architect ed to optimally read (enough cpu’s to request the i/o, network bandwidth to ship the i/o and enough disk capacity to service that i/o requests). This is why the oracle database machine is a well balanced system.
At the read rate of 21GigaBytes a second, 1Terabyte of data can be read by the database servers in less than 1 minute.

Infiniband

Each database Node and Exadata Cell, has dual port, Quad data rate (QDR) Infiniband connectivity.
Infiniband is used for the database node to exadata cell connectivity and also for the Rac high speed interconnect (Cache fusion network)
Infiniband has the flexibility of a LAN network with the Speed of a SAN.
Oracle’s interconnect protocol uses DMA, to move data from the wire directly to the Memory without any additional copies made.

Exadata Cells

The exadata cells provide a highly redundant, high performance hardware with very intelligent software to efficiently process database i/o requests.
The hardware capability was discussed in the section “Lots of hardware power for the storage servers”.
The Intelligence in the Exadata cells include “Smart flash cache”, “Smart Scan’s”, “Storage Indexes” and “Exadata column compression”.
Oracle Automatic Storage Management (ASM) ensures that all oracle database files are evenly spread across all the 168 disks available in the database machine.
Oracle Database uses  the protocol iDB (Built by oracle, aptly called the Intelligent Database protocol) to communicate with the exadata cells. iDB is built on the Zero Data Loss, Zero Copy implementation (ZDP) of the industry standard protocol RDSv3 (Reliable datagram socket)

Flash cache

Random read operations are cached on the 5Tb of flash cache available in the database machine, significantly improving oltp performance.
The exadata smart flash cache, working with the database server, keeps track of data access patterns and intelligently manages the caching of blocks from the oracle datafiles.

Smart Scans

The oracle database server uses the iDB protocol to push query predicates (That limit data retrieved by the query using filters and join conditions) to the exadata cell.
This enables the cell to do three things
– Identify rows that are needed by the query and ship only those rows back to the database server (Not entire blocks)
– Identify the columns needed by the query and ship only the required columns in the rows back to the database server
– Use Bloom filters and process join conditions and ship only matching rows in queries with joins back to the database server
This drastically reduces the amount of data send back to the database server (Reducing the network usage)
Transferring file blocks from disks inside a cell to the physical memory of the cell can happen relatively fast. If a lot of the data that is not needed by the database server, can be eliminated at the cell, then the amount of data that needs to go over a network to the database server is significantly reduced. So the network i/o usage between the database servers and exadata cells is reduced by smart scans.

Storage Indexes

Smart scans however do not reduce the disk i/o within the cell (ie transfer from disk to cell physical memory).
Oracle creates an in memory array of structures, that keep track off min and max values of columns (columns used in the where clause that benefit by storage indexes), that let oracle identify if specific 1mb regions are needed based on the filtering conditions applied to the tables.
So storage indexes is a filter oracle applies to prune away 1mb chunks that do not have to be read.
This reduces the i/o within the exadata cell.

Exadata Hybrid Columnar Compression

Traditional relational databases store data in database blocks in a “Row” format. This type of storage limits the amount of compression that can be achieved
Column store databases, organizes and stores data by column. Storing column data together, with the same datatype and similar charachteristics , results in significant compression that can be achieved. However if the query refers to more than a couple of columns in the table, or does more than modest updates and inserts , those queries and dml tend to have slower performance.
Exadata hybrid columnar compression takes a blended approach. Oracle takes rows that fit into multiple blocks (Called a compression unit), converts the rows into columns and stores the data in a columnar format within a compression unit.
Oracle uses 3 different compression format’s and different transformations, depending on the compression level you have chosen
There are 4 levels of compression (Each level is a tradeoff between compression ratio and compression speeds)
– Query Low
– Query High (Default when you say compress for query)
– Archive Low
– Archive High
With Exadata Hybrid columnar compression
1) The amount of storage required to store massive amounts of data could be potentially decreased by a factor of 10.
2) The amount of i/o to be issued (For queries that scan very very large tables) is significantly reduced.
The above features together make’s the Oracle Database Machine, rock database performance.