Category Archives: Scripting

Python script to query zillow

Here is a script (tested with Python 3 only) that can be used to query price information from Zillow using the Zillow API.

zillow.py

First, register with Zillow and get yourself a Zillow Web Services ID, following the steps outlined at https://www.zillow.com/howto/api/APIOverview.htm.

Then edit the script and substitute the string YourZillowWebServicesIdHere (on line 7 of the script) with your own Zillow Web Services ID.

Then you can run the script as follows.

python3 zillow.py address zipcode (both parameters are mandatory)

E.g.: python3 zillow.py "4 zippy lane" 06194

Be sure to use a valid address and zipcode, or else the script will fail.
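The full zillow.py is linked above. As a rough sketch of what such a script can look like (this is my simplified illustration, not necessarily identical to the original; it calls Zillow's documented GetSearchResults endpoint, and the XML parsing is an assumption about the response shape):

import sys
import urllib.request
import urllib.parse
import xml.etree.ElementTree as ET

ZWSID = 'YourZillowWebServicesIdHere'   # substitute your own Zillow Web Services ID

def query_zillow(address, zipcode):
    # Build the GetSearchResults request for the given address and zipcode
    params = urllib.parse.urlencode({'zws-id': ZWSID,
                                     'address': address,
                                     'citystatezip': zipcode})
    url = 'https://www.zillow.com/webservice/GetSearchResults.htm?' + params
    with urllib.request.urlopen(url) as resp:
        root = ET.fromstring(resp.read())
    # The zestimate amount appears under response/results/result/zestimate
    for amount in root.iter('amount'):
        print('Price estimate:', amount.text)

if __name__ == '__main__':
    if len(sys.argv) != 3:
        sys.exit('Usage: python3 zillow.py address zipcode')
    query_zillow(sys.argv[1], sys.argv[2])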


This is an easy way to check out houses and the prices of comparable houses in the same locality.

Hope someone finds this helpful.

 

Script to compare tkprof output files

I often use the Oracle 10046 event tracing mechanism to capture the SQL statements from a session, to identify why certain transactions are running slower in different environments or at different points in time. Oracle does have a mechanism to save the trace information in database tables: you can use the INSERT parameter of tkprof to load the trace information into a database table, and once it is in the table you can write SQL that compares multiple runs or multiple statements.
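For example, something along these lines generates a load script with tkprof and runs it to populate the trace table (the file names and credentials here are placeholders):

tkprof orcl_ora_12345.trc orcl_ora_12345.prf insert=loadtrace.sql
sqlplus scott/tiger @loadtrace.sql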

I wrote a python program that compares two tkprof output files. The files are compared, and the following aspects of each of the sqlids in the two tkprof output files are printed side by side. The output is sorted by the difference in elapsed time, in descending order.

  • Sql text
  • Plan Hash Value
  • Total Elapsed time
  • Total Logical Reads
  • Total Rows processed
  • Difference in Elapsed Time
  • Difference in Number of Rows processed
  • Difference in Logical reads.

Other columns can be added to this, if you desire.
I use this script's output as a quick way to see which SQLs are running slower and are probable candidates for further analysis and tuning.

The sqlids from the file provided as the first argument to the script (referred to as the left file) are compared to the same sqlids in the file provided as the second argument (referred to as the right file). The following columns are displayed.

sqlid        sqlid being compared
text         First 20 chars of the sql text
lplan        Plan hash value from the left file
rplan        Plan hash value from the right file
lela         Total elapsed time from the left file
rela         Total elapsed time from the right file
llreads      Total logical reads (query+current) from the left file
rlreads      Total logical reads (query+current) from the right file
lrows        Total rows processed from the left file
rrows        Total rows processed from the right file
eladiff      lela - rela
rowsdiff     lrows - rrows
lreadsdiff   llreads - rlreads

Here is a sample syntax for running the script. (You need the python pandas package installed for it to execute successfully.)

python ./difftk.py /u01/tkprofout/Newplans.prf /u01/tkprofout/Stage.prf

Here is a sample output:

difftk

The full script is below

#Python script to list differences between sql executions in two tkprof output files
#useful if comparing tkprof from prod and dev for example
#Author : rajeev.ramdas
 
import sys
import os
import pandas as pd
from pandas import DataFrame
 
# Define a class to hold the counters for each sqlid
class sqliddet:
    # __init__ (with self. prefixes) so each instance starts with zeroed counters
    def __init__(self):
        self.sqlid=''
        self.text=''
        self.plan_hash=''
        self.tcount=0
        self.tcpu=0
        self.tela=0
        self.tdisk=0
        self.tquery=0
        self.tcurr=0
        self.trows=0
 
# Define 2 empty dictionaries to store info about each input file
leftsqliddict={}
rightsqliddict={}
 
# Process each file and add one row per sqlid to the dictionary
# We want to add the row to the dictionary only after the SQLID row and the total row has been read
# So the firstsqlid flag is used to make sure that we do not insert before total is read for the first row.
 
def processfile(p_file,p_sqliddict):
 
    myfile=open(p_file,"r")
    line=myfile.readline()
    firstsqlid=True
    linespastsqlid=99
    while line:
        linespastsqlid+=1
        line=myfile.readline()
        if line.startswith('SQL ID'):
            linespastsqlid=0
            if firstsqlid==True:
                firstsqlid=False
            else:
                p_sqliddict[currsqlid.sqlid]=[currsqlid.plan_hash,currsqlid.tcount,currsqlid.tcpu,currsqlid.tela,currsqlid.tdisk,currsqlid.tquery
                            ,currsqlid.tcurr,currsqlid.trows,currsqlid.text]
            currsqlid=sqliddet()
            currsqlid.sqlid=line.split()[2]
            currsqlid.plan_hash=line.split()[5]
        if linespastsqlid==2:
            currsqlid.text=line[0:20]
        if line.startswith('total'):
            a,currsqlid.tcount,currsqlid.tcpu,currsqlid.tela,currsqlid.tdisk,currsqlid.tquery,currsqlid.tcurr,currsqlid.trows=line.split()
        if line.startswith('OVERALL'):
            p_sqliddict[currsqlid.sqlid]=[currsqlid.plan_hash,currsqlid.tcount,currsqlid.tcpu,currsqlid.tela,currsqlid.tdisk,currsqlid.tquery
                       ,currsqlid.tcurr,currsqlid.trows,currsqlid.text]
    myfile.close()
 
# Main portion of script
if len(sys.argv) != 3:
   print('Syntax : python ./difftk.py tkprof1.out tkprof2.out')
   sys.exit()
 
if not os.path.isfile(sys.argv[1]) or not os.path.isfile(sys.argv[2]):
   print ("File Does not Exist")
   sys.exit()
 
processfile(sys.argv[1],leftsqliddict)
processfile(sys.argv[2],rightsqliddict)
 
t_difftk_lst=[]
 
# Match the sqlid's from the file on the left to the file on the right
# Gather up the statistics form both sides, insert into a list
# Transfer the list to a pandas dataframe, add some computed columns
 
for sqlid,stats in leftsqliddict.items():
    l_totlogical=int(stats[5])+int(stats[6])
    if sqlid in rightsqliddict:
       t_difftk_lst.append([sqlid,stats[8].rstrip(),stats[0],rightsqliddict[sqlid][0],float(stats[3])
                            ,float(rightsqliddict[sqlid][3]),float(l_totlogical),float(rightsqliddict[sqlid][5])+float(rightsqliddict[sqlid][6])
                            ,float(stats[7]),float(rightsqliddict[sqlid][7])
                           ])
    else:
       t_difftk_lst.append([sqlid,stats[8].rstrip(),stats[0],0,float(stats[3]),0
                            ,float(l_totlogical),0,float(stats[7]),0
                           ])
 
difftk_df=DataFrame(t_difftk_lst,columns=['sqlid','sqltext','lplan','rplan','lela','rela','llreads','rlreads','lrows','rrows'])
difftk_df['eladiff']=difftk_df['lela']-difftk_df['rela']
difftk_df['rowsdiff']=difftk_df['lrows']-difftk_df['rrows']
difftk_df['lreadsdiff']=difftk_df['llreads']-difftk_df['rlreads']
 
pd.set_option('display.width',1000)
# Note: very old pandas versions used difftk_df.sort(columns=...); sort_values is the current API
print(difftk_df.sort_values(by='eladiff',ascending=False))

Using Pandas for CSV Analysis

As Oracle DBAs we often come across situations where we are handed CSV (comma separated values) files, by our managers or customers, as raw data on which we need to do some work. The first task is to analyze the file and come up with some summary statistics, so we can quantify the amount of work involved.

When faced with such circumstances, my favorite method is to use SQL*Loader to upload the file into a database table, and then run SQL statements on the data to produce summary info. Most people would probably use Excel formulas, macros and pivot tables to achieve similar results.

In this blog post, I present an alternate method that I've been using recently for CSV file summarization.

Pandas is a library written for the Python language for data manipulation and analysis. In order to proceed, first install Python and then install the Python package named 'pandas'. Pandas is a really good alternative to the R programming language. See my previous post on how to install Python and pandas.

For the examples in this post, I am using a CSV file that has NFL play-by-play statistics for 2014.

Start by invoking the python interactive interpreter.

 

     $ python3
Python 3.4.2 (default, Dec 18 2014, 14:18:16) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.

First import the following libraries that we are going to use.

import pandas as pd
import numpy as np

Read the csv file into a Pandas DataFrame

 
df=pd.read_csv('pbp-2014.csv',header=0)

Check the size of the DataFrame (df.size returns the total number of elements, i.e. rows times columns)

df.size
2056275
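With the 45 columns listed below, that works out to 2056275 / 45 = 45695 rows. To get the row count directly, use len(df) or df.shape:

len(df)
45695
df.shape
(45695, 45)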

List the columns in the DataFrame

 
list(df)
['GameId', 'GameDate', 'Quarter', 'Minute', 'Second', 'OffenseTeam', 'DefenseTeam', 'Down', 'ToGo', 'YardLine', 'Unnamed: 10', 'SeriesFirstDown', 'Unnamed: 12', 'NextScore', 'Description', 'TeamWin', 'Unnamed: 16', 'Unnamed: 17', 'SeasonYear', 'Yards', 'Formation', 'PlayType', 'IsRush', 'IsPass', 'IsIncomplete', 'IsTouchdown', 'PassType', 'IsSack', 'IsChallenge', 'IsChallengeReversed', 'Challenger', 'IsMeasurement', 'IsInterception', 'IsFumble', 'IsPenalty', 'IsTwoPointConversion', 'IsTwoPointConversionSuccessful', 'RushDirection', 'YardLineFixed', 'YardLineDirection', 'IsPenaltyAccepted', 'PenaltyTeam', 'IsNoPlay', 'PenaltyType', 'PenaltyYards']

Check how many games the dataset covers. The GameId column is a unique identifier that identifies each game. The nunique method returns the number of unique elements in that object.

 
df.GameId.nunique()
256

List details of all games played by the New England Patriots. This command shows how you can provide a filter condition. The filter specifies that all rows where the OffenseTeam is NE or the DefenseTeam is NE be listed.

 
df[(df['OffenseTeam'] == 'NE') | (df['DefenseTeam'] == 'NE')]

Subset a specific set of columns

 
df[['GameId','PlayType','Yards']]
 
           GameId     PlayType  Yards
0      2014090400     KICK OFF      0
1      2014090400         RUSH      6
2      2014090400         RUSH      3
3      2014090400         RUSH     15
4      2014090400         RUSH      2
5      2014090400         PASS     -2
...
...

Display all PASS and RUSH plays executed by New England. Here we are using an AND filter condition to limit the rows to those where New England is on offense and the PlayType is either PASS or RUSH.

 
df[['GameId','PlayType','Yards']][(df['OffenseTeam'] == 'NE') & (df['PlayType'].isin(['PASS','RUSH']))]   
           GameId PlayType  Yards
1092   2014090705     RUSH      2
1093   2014090705     PASS      4
1094   2014090705     PASS      0
1102   2014090705     PASS      8
1103   2014090705     PASS      8
1104   2014090705     RUSH      4

Display the number of plays, total yards gained, and average yards gained per PASS and RUSH play, per game.

 
df[['GameId','PlayType','Yards']][(df['OffenseTeam'] == 'NE') & (df['PlayType'].isin(['PASS','RUSH']))].groupby(['GameId','PlayType']).agg({'Yards': [np.sum,np.mean],'GameId':[np.size]}) 
                    Yards           GameId
                      sum      mean   size
GameId     PlayType                       
2014090705 PASS       277  4.540984     61
           RUSH       109  5.190476     21
2014091404 PASS       209  8.038462     26
           RUSH       158  4.157895     38
2014092105 PASS       259  6.641026     39
           RUSH        91  2.935484     31
2014092900 PASS       307  9.903226     31
           RUSH        75  4.687500     16
2014100512 PASS       301  7.921053     38
           RUSH       223  5.309524     42
2014101201 PASS       407  9.465116     43
           RUSH        60  2.307692     26
2014101600 PASS       267  6.675000     40
           RUSH        65  4.062500     16
2014102605 PASS       376  9.400000     40
           RUSH       121  3.781250     32
2014110208 PASS       354  6.210526     57
           RUSH        66  2.640000     25
2014111611 PASS       267  8.343750     32
           RUSH       248  6.048780     41
2014112306 PASS       393  6.894737     57
           RUSH        90  4.285714     21
2014113010 PASS       245  6.621622     37
           RUSH        85  5.000000     17
2014120713 PASS       317  6.604167     48
           RUSH        80  3.333333     24
2014121408 PASS       287  7.972222     36
           RUSH        92  3.407407     27
2014122105 PASS       189  5.250000     36
           RUSH        78  4.333333     18
2014122807 PASS       188  5.222222     36
           RUSH        92  4.181818     22

From the above examples you can see how easy it is to read a CSV file, apply filters and summarize the data set using pandas.

Setting up a Python 2.7.6 Virtual Env for Python development

Python is an excellent language for DBAs to learn, to automate all the repetitive tasks they need to perform. Once you start using Python, it is likely that you will want to set up multiple environments with different versions of Python and libraries, based on the project you are working on. Virtualenv is a tool to create isolated Python environments.

Below are the steps that I followed to install a brand new working Python 2.7.6 environment with the following packages.

           SQLAlchemy    - Object Relational Mapper and SQL Toolkit for Python
           numpy         - Fundamental package for scientific computing
           matplotlib    - Python 2D plotting library
           ipython       - Interactive Python Shell
           pandas        - Python Data Analysis Library 
           Flask         - An easy to use Python Lightweight Micro Framework

This installation is performed on Ubuntu Linux, and I have already installed the libsqlite3-dev package and the Oracle instant client.

Install Python 2.7.6

Download Python-2.7.6.tgz from http://www.python.org/getit

Install python 2.7.6 to your directory of choice.

		tar -xvf Python-2.7.6.tgz
		cd Python-2.7.6/
 
		./configure --prefix=/u01/Rk/Apps/Python/Python276
		make
		make install

Now you have python 2.7.6 installed into the /u01/Rk/Apps/Python/Python276 directory. (You will have a python binary in /u01/Rk/Apps/Python/Python276/bin)

Download virtualenv

curl -O https://pypi.python.org/packages/source/v/virtualenv/virtualenv-1.11.1.tar.gz

Install virtualenv

tar -xzvf virtualenv-1.11.1.tar.gz
cd virtualenv-1.11.1/
/u01/Rk/Apps/Python/Python276/bin/python virtualenv.py /u01/Rk/Apps/Python/p276env1

Activate the virtualenv

           . /u01/Rk/Apps/Python/p276env1/bin/activate

Install the additional Python Modules you need

	   pip install SQLAlchemy
	   pip install numpy
	   pip install matplotlib
	   pip install ipython
	   pip install pyzmq
	   pip install tornado
	   pip install jinja2
	   pip install pandas
	   pip install Flask
	   pip install Flask-SQLAlchemy

Install cx_Oracle

Ensure that the oracle instant client is installed, and the environment variables ORACLE_HOME and LD_LIBRARY_PATH are setup correctly.

Download cx_Oracle (a python extension module that allows access to oracle.) source from from http://cx-oracle.sourceforge.net/

tar -xzvf cx_Oracle-5.1.2.tar.gz
cd  cx_Oracle-5.1.2/
python setup.py install
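Once cx_Oracle is installed, a quick connectivity check along these lines should work (a minimal sketch; the username, password and connect string are placeholders, substitute your own):

import cx_Oracle

# Connect, run a trivial query, and print the result
con = cx_Oracle.connect('scott', 'tiger', 'burl5vb1:1521/rk01')
cur = con.cursor()
cur.execute('select sysdate from dual')
print(cur.fetchone()[0])
cur.close()
con.close()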

Set up an alias (in your .bash_profile) to simplify invoking the virtualenv every time you want to use it.

alias p276env1='. /u01/Rk/Apps/Python/p276env1/bin/activate'

Now, anytime you want to execute a python program in this environment, you can type the following at the Linux command line.

p276env1
python

Graph CPU usage on exadata using oswatcher files

On the Oracle Database Machine, oswatcher is installed at setup time, both on the database nodes and on the Exadata cells. This utility collects Linux operating system level statistics, which come in very handy when troubleshooting operating system level issues. The data is collected in text files. There is a Java based utility (OSWG) provided by Oracle support to graph the contents of these files; however, that utility does not work on the oswatcher files generated on Exadata.

Here is a python script that can graph the CPU usage from the mpstat information that oswatcher captures. It has been tested on newer oswatcher files from an X3-2. You first need a python environment that has the numpy and matplotlib modules installed.

Install a Python Virtualenv.

If you create multiple applications using Python and end up using different versions, it is easier to maintain separate virtualenvs. You can create a python virtualenv as shown below (on Ubuntu Linux).

curl -O https://pypi.python.org/packages/source/v/virtualenv/virtualenv-1.9.1.tar.gz
tar -xzvf virtualenv-1.9.1.tar.gz
cd virtualenv-1.9.1
python virtualenv.py ../p273env2
. p273env2/bin/activate
pip install numpy
sudo apt-get install libfreetype6-dev
pip install matplotlib

Now that you have a python environment, with your required libraries, you can go ahead and execute the script as shown below.

The oswatcher files in /opt/oracle/oswatcher are .bz2 files, and there will be one file per hour of each day. Copy the mpstat .bz2 files into a directory and use bunzip2 to unzip them. In this example, let us say the directory is /u01/oswatcher/mpstat/tmp.

You can now run the script as shown below

python parseoswmp.py  /u01/oswatcher/mpstat/tmp
or
python parseoswmp.py  /u01/oswatcher/mpstat/tmp '06/14/2013 05:00:00 AM' '06/14/2013 07:00:00 AM'

The first command will graph the cpu usage for the entire time range in all those files and the second command graphs the cpu information for the date and time range you have specified.

It creates a file in the current directory, named oswmpstat.png, which has the graph.
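The full script is linked below; the following is only a simplified sketch of the core idea (the assumptions that mpstat lines for all cpus combined carry "all" in an early column and end with the %idle column, and the absence of time-range filtering, are mine, not necessarily the original's):

import os
import sys
import matplotlib
matplotlib.use('Agg')           # render to a file; no display needed on a server
import matplotlib.pyplot as plt

def cpu_samples(dirname):
    # Yield CPU-used percentages from the unzipped oswatcher mpstat files
    for fname in sorted(os.listdir(dirname)):
        with open(os.path.join(dirname, fname)) as f:
            for line in f:
                fields = line.split()
                # mpstat interval lines for all cpus combined contain "all";
                # the last column is %idle, so cpu used = 100 - %idle
                if len(fields) > 4 and 'all' in fields[:3] and fields[-1].replace('.', '', 1).isdigit():
                    yield 100.0 - float(fields[-1])

used = list(cpu_samples(sys.argv[1]))
plt.plot(used)
plt.xlabel('Sample number')
plt.ylabel('CPU used (%)')
plt.savefig('oswmpstat.png')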

You can find the full script here.

You can find a sample output graph here.

Plotting AWR database metrics using R

In a previous post I showed how you can connect from R to the oracle database using the ROracle driver. In this post I will explain how to run queries against the AWR history tables and gather data that can be plotted using ggplot.

When you install R on Linux, as I outlined in the above post, you get an executable named Rscript. Rscript is the non-interactive variant of the R command, so you can run an R batch file from the Linux shell (like running a bash shell script). I am using Rscript as the interpreter in my script (first line).

ggplot2 is an R library that can be used for plotting in R programs. There is native plotting capability in R, and there is another library named lattice, but ggplot2 is much more robust and is based on the grammar of graphics. You have to install ggplot2 in R (install.packages("ggplot2")) before you can use it.

#!/usr/bin/Rscript
library(ROracle)
library(ggplot2)

 

Process command line arguments. This script expects 3 commandline arguments. Copy each argument to a R variable.

args <- commandArgs(TRUE)
l_dbid <- as.double(args[1])
l_bsnap <- as.double(args[2])
l_esnap <- as.double(args[3])

Connect to Oracle

drv <- dbDriver("Oracle")
con <- dbConnect(drv,username="system",password="manager",dbname="burl5vb1:1521/rk01")

Populate a data frame with the values you will need for the bind variables in the query you will be submitting.

my.data = data.frame(dbid = l_dbid, bsnap =l_bsnap,esnap=l_esnap)

Prepare and Execute the query

res <- dbSendQuery(con,"select dhss.instance_number,dhss.snap_id,dhs.end_interval_time et,
round(sum(decode(dhss.metric_name,'User Transaction Per Sec',dhss.average,0))) utps,
round(sum(decode(dhss.metric_name,'Average Active Sessions',dhss.average,0))) aas,
round(sum(decode(dhss.metric_name,'Host CPU Utilization (%)',dhss.average,0))) hcpu,
round(sum(decode(dhss.metric_name,'Buffer Cache Hit Ratio',dhss.average,0))) bchr,
round(sum(decode(dhss.metric_name,'Logical Reads Per Sec',dhss.average,0))) lr,
round(sum(decode(dhss.metric_name,'I/O Megabytes per Second',dhss.average,0))) iombps,
round(sum(decode(dhss.metric_name,'I/O Requests per Second',dhss.average,0))) iops,
round(sum(decode(dhss.metric_name,'Redo Generated Per Sec',dhss.average,0))) rg,
round(sum(decode(dhss.metric_name,'Temp Space Used',dhss.average,0))) ts,
round(sum(decode(dhss.metric_name,'Physical Write Total IO Requests Per Sec',dhss.average,0))) pw,
round(sum(decode(dhss.metric_name,'Physical Read Total IO Requests Per Sec',dhss.average,0))) pr
from dba_hist_sysmetric_summary dhss,dba_hist_snapshot dhs
where
dhss.dbid = :1
and dhss.snap_id between :2 and :3
and dhss.metric_name in (
'User Transaction Per Sec',
'Average Active Sessions',
'Host CPU Utilization (%)',
'Buffer Cache Hit Ratio',
'Logical Reads Per Sec',
'I/O Megabytes per Second',
'I/O Requests per Second',
'Redo Generated Per Sec',
'Temp Space Used',
'Physical Write Total IO Requests Per Sec',
'Physical Read Total IO Requests Per Sec')
and dhss.dbid = dhs.dbid
and dhs.instance_number=1
and dhss.snap_id = dhs.snap_id
group by dhss.instance_number,dhss.snap_id,dhs.end_interval_time
order by 1,2",data=my.data
)

Fetch the rows, and disconnect from the db.

data <- fetch(res)
dbDisconnect(con)

Open a pdf file to save the graphs to.
Generate the graphs using ggplot.
print the graphs to the pdf file
Close the pdf file.

In the ggplot function call, ET and INSTANCE_NUMBER represent the End Snap Time and Instance Number columns output from the query, and AAS, UTPS, HCPU, PW and PR represent the AverageActiveSessions, UserTransactionPerSecond, HostCpu, PhysicalWrites and PhysicalReads columns from the query.

pdf("plotstat.pdf", onefile = TRUE)
p1<-ggplot(data,aes(strptime(ET,format="%Y-%m-%d %H:%M:%S"),AAS,group=INSTANCE_NUMBER,color=INSTANCE_NUMBER))+geom_point()+geom_line()+ggtitle("Average Active Sessions")+labs(x="Time of Day",y="Average Active Sessions")
p2<-ggplot(data,aes(strptime(ET,format="%Y-%m-%d %H:%M:%S"),UTPS,group=INSTANCE_NUMBER,color=INSTANCE_NUMBER))+geom_point()+geom_line()+ggtitle("Transactions Per Second")+labs(x="Time of Day",y="Transactions Per Second")
p3<-ggplot(data,aes(strptime(ET,format="%Y-%m-%d %H:%M:%S"),HCPU,group=INSTANCE_NUMBER,color=INSTANCE_NUMBER))+geom_point()+geom_line()+ggtitle("CPU Usage")+labs(x="Time of Day",y="Cpu Usage")
p4<-ggplot(data,aes(strptime(ET,format="%Y-%m-%d %H:%M:%S"),PW,group=INSTANCE_NUMBER,color=INSTANCE_NUMBER))+geom_point()+geom_line()+ggtitle("Physical Writes")+labs(x="Time of Day",y="Physical Writes")
p5<-ggplot(data,aes(strptime(ET,format="%Y-%m-%d %H:%M:%S"),PR,group=INSTANCE_NUMBER,color=INSTANCE_NUMBER))+geom_point()+geom_line()+ggtitle("Physical Reads")+labs(x="Time of Day",y="Physical Reads")
print(p1)
print(p2)
print(p3)
print(p4)
print(p5)
dev.off()

You can run this script as follows from the Linux Command Line. The first argument is the dbid, the second argument is the begin snap id and the last argument is the end snap id.

./plotstat.R 220594996 5205 5217

You will then see a pdf document named plotstat.pdf in your directory that has 5 separate graphs in it.
Click on the link below to see a sample file, plotting AWR data from a 4 node Oracle RAC database.

plotstat

Click Here to download the whole script, plotstat.R

ggplot2 : Elegant Graphics for Data Analysis is a great book to learn about ggplot2.

Shell script to create a tar archive of oracle trace files.

Whenever you have an Oracle database problem and Oracle support asks you to upload the related trace files, the best option is to use the Oracle Incident Packaging Service to create an archive file that has all the necessary info to upload to Oracle.

If you just want to upload all the .trc files generated in the diagnostics trace directory (including, but not limited to, pmon traces), you can use the following script to generate such an archive file.

The following script accepts

  • The directory name (The location of your trace files)
  • The backup destination directory (The directory where you want the archive to be created. Ensure you have enough space here)
  • The date of the trace files (DD-MON-YYYY)
  • The begin time (HH24MI)
  • The end time (HH24MI)

Then it finds all .trc files that fall between the begin and end times on the date you specified, in the directory you specified, and creates a tar.gz archive file in the destination directory. It creates a directory named trcbakMonDD in your destination directory and places the file there. You can download this file and upload it to Oracle.

Usage example: ./backtraces.sh /u01/11gr2/diag/rdbms/rk01/rk01/trace /tmp '11-Sep-2012' 1315 1340

The above command will back up all .trc files from the directory /u01/11gr2/diag/rdbms/rk01/rk01/trace that have a timestamp between 13:15 and 13:40 on 11-Sep-2012, to a tar archive in the directory /tmp.

I have only tested it on Oracle Enterprise Linux 5. (The syntax for the tar and date commands might differ on other platforms.)

Find the script code below

 

#!/bin/bash
#This script can be used to create a tar archive of trace files created in 
#The database diagnostics trace directory between a given time period
#Author : Rajeev Ramdas
 
if [ $# != 5 ]
then
   echo ./backtraces.sh tracefiledir backupdir DD-Mon-YYYY HH24MI HH24MI
   echo ./backtraces.sh /u01/Rk/Docs/11g/Scripts2 /tmp '09-Nov-2012' 0900 1332
   exit
fi
 
l_backup_base=$2
l_backdir=trcbak`date --date=${3} +%b%d`
l_backdest=${l_backup_base}/${l_backdir}
l_startdate=`date --date=${3} +%Y%m%d`
l_enddate=`date --date=${3} +%Y%m%d`
l_starttime="${l_startdate}${4}"
l_endtime="${l_enddate}${5}"
l_backfile="${l_backdest}/tracebak-${l_starttime}-${l_endtime}.tar.gz"
 
if [ ! -d ${1} ]
then
   echo Wrong Backup Dir
   exit 1
fi
 
if [ ! -d ${2} ]
then
   echo Wrong Backup Dest
   exit 1
fi
 
if [ -d ${l_backdest} ]
then
   echo Directory Exists
else
   mkdir ${l_backdest}
fi
 
if [ -f ${l_backfile} ]
then
   rm ${l_backfile}
fi
 
touch -t "$l_starttime" /tmp/tmpoldfile
touch -t "$l_endtime" /tmp/tmpnewfile
 
find $1 -type f -newer /tmp/tmpoldfile ! -newer /tmp/tmpnewfile -name '*.trc' |  xargs tar -czvf - | cat > ${l_backfile}
 
echo Your backup file is ${l_backfile}

Using the R Language with an Oracle Database.

R Programming Language Connectivity to Oracle.

R is an open source programming language and software environment for statistical computing and graphics. It is a fully functional programming language, widely used by statisticians to perform data analysis. It can also be a neat tool for Oracle DBAs to graph and analyze database performance metrics. If you intend to embark on developing a sizable R and Oracle project, I'd encourage you to use Oracle Enterprise R and/or Oracle Advanced Analytics.

Below are the steps on how to install and configure the R language on Ubuntu Linux with connectivity to Oracle.

These steps assume that you have an already installed and running Oracle 11gR2 database.

The high level steps are as follows

1) Install the R programming language environment
2) Download and install the oracle instant client
3) Download and install the following R packages
– DBI
– ROracle
4) Start using R with Oracle.

Install the R programming language environment

Refer to the installation instructions at www.r-project.org for your platform.
If you are installing this on Ubuntu Linux (As I have on Ubuntu 12.10), open the “Ubuntu Software Center” and install the following packages.
– R-base
– R-base-dev

Download and install the oracle instant Client

As your regular o/s user, download and install (Installation is nothing other than unzipping the downloaded file) the oracle instant client.
Download The instant client for your o/s platform from http://www.oracle.com/technetwork/database/features/instant-client/index-097480.html.
You need to download
– Instant Client package – Basic
– Instant Client package – SDK

For the purpose of this installation, we are going to assume that the instant client has been installed into /u01/Rk/Apps/oracle/instantclient_11_2.

 Download and install the R packages

DBI

– Download DBI from http://cran.r-project.org/web/packages/DBI/index.html. (Download the package source)
– sudo su -
– cd <to the directory where DBI_0.2-5.tar.gz was downloaded>

root# R CMD INSTALL DBI_0.2-5.tar.gz
* installing to library ‘/usr/local/lib/R/site-library’
* installing *source* package ‘DBI’ ...
** R
** inst
** preparing package for lazy loading
Creating a generic function for ‘summary’ from package ‘base’ in package ‘DBI’
** help
*** installing help indices
** building package indices
** installing vignettes
‘DBI.Rnw’
** testing if installed package can be loaded
 
* DONE (DBI)

ROracle

– Download the ROracle source from http://cran.r-project.org/web/packages/ROracle/index.html
– sudo su -
– cd <to the directory where ROracle_1.1-5.tar.gz was downloaded>

– Set the following environment variables

root# export OCI_LIB=/u01/Rk/Apps/oracle/instantclient_11_2
root# export LD_LIBRARY_PATH=/u01/Rk/Apps/oracle/instantclient_11_2:$LD_LIBRARY_PATH
root# R CMD INSTALL ROracle_1.1-5.tar.gz
* installing to library ‘/usr/local/lib/R/site-library’
* installing *source* package ‘ROracle’ ...
** package ‘ROracle’ successfully unpacked and MD5 sums checked
configure: creating ./config.status
config.status: creating src/Makevars
** libs
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG -I/u01/Rk/Apps/oracle/instantclient_11_2/sdk/include -fpic -O2 -pipe -g -c rodbi.c -o rodbi.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG -I/u01/Rk/Apps/oracle/instantclient_11_2/sdk/include -fpic -O2 -pipe -g -c rooci.c -o rooci.o
gcc -std=gnu99 -shared -o ROracle.so rodbi.o rooci.o -L/u01/Rk/Apps/oracle/instantclient_11_2 -lclntsh -L/usr/lib/R/lib -lR
installing to /usr/local/lib/R/site-library/ROracle/libs
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
 
* DONE (ROracle)

Using The R Language with Oracle

Now you are ready to Run your first R program, Run a query against the database, and plot the output on a graph.

Invoke the R language command line by typing in the following

$ R

From the R command line, use the following commands.

> library(ROracle)
> drv <- dbDriver("Oracle")
> con <- dbConnect(drv,username="sh",password="sh",dbname="burl5vb1:1521/rk01")
> res <- dbSendQuery(con,"select time_id,sum(quantity_sold) from sales
+ where time_id > to_date('20-DEC-2001','DD-MON-RR')
+ group by time_id")
> data <- fetch(res)
> data
               TIME_ID SUM(QUANTITY_SOLD)
1  2001-12-20 23:00:00                473
2  2001-12-21 23:00:00                374
3  2001-12-22 23:00:00               1034
4  2001-12-23 23:00:00               1662
5  2001-12-24 23:00:00                470
6  2001-12-25 23:00:00                289
7  2001-12-26 23:00:00               1076
8  2001-12-27 23:00:00               1196
9  2001-12-28 23:00:00                232
10 2001-12-29 23:00:00                758
11 2001-12-30 23:00:00                786
 
> plot(data)

You will see a plot of the quantity sold by date, like the one below.

Happy R scripting.

If you want to learn the R language, I would recommend the book The Art of R Programming.

 

Using Python 3

I have been writing some python scripts for AWR analysis and trending. Since python 2.7 is no longer being enhanced, I have now switched to using python 3. A lot of python applications and frameworks still do not support python 3 (notably the Django framework). The good news is that cx_Oracle works with python 3.

The steps to install cx_Oracle with python 3 are very similar to the steps I outlined in my previous post on installing cx_Oracle with python 2.7.

The difference is that

– You have to first install python3 and python3-dev (On ubuntu, you can just use the ubuntu software center to do this)

– Then download the cx_oracle 5.1.1 source code only tar ball from http://cx-oracle.sourceforge.net/

– login as root, untar the tar file, cd to the cx_Oracle-5.1.1 directory

– Then run /usr/bin/python3 setup.py install

That does it and now oracle connectivity is in place.

I've also been using the matplotlib library along with Python to plot graphs from the awr and oswatcher data files. matplotlib also works with python 3.

– You have to first install libpng, libpng-dev, libfreetype6, libfreetype6-dev (Use the ubuntu software center)

– Download the numpy source code tar ball.

– Extract the tar file, login as root, cd to the directory and run /usr/bin/python3 setup.py install

– Then install matplotlib as follows:

– Download the matplotlib source code tar file

– Login as root, cd to the directory

– /usr/bin/python3 setup.py build

– /usr/bin/python3 setup.py install

Now you should have matplotlib working with python3
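A quick way to confirm the matplotlib install (a minimal sketch that just saves a sine curve to a png; run it with /usr/bin/python3):

import numpy as np
import matplotlib
matplotlib.use('Agg')          # write to a file instead of opening a window
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 200)
plt.plot(x, np.sin(x))
plt.title('matplotlib under python3')
plt.savefig('testplot.png')
print('wrote testplot.png')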

Enjoy your python scripting


Installing Ruby 1.9.2 And Rails 3.1.1 with Oracle 11.2.0.3 on Ubuntu 11.10 Oneiric

Here are the steps to install and configure Ruby on rails with oracle 11.2.0.3 on 32 bit Ubuntu 11.10 Oneiric.

First install the packages needed by oracle

sudo apt-get install x11-utils rpm ksh lsb-rpm libaio1
sudo ln -s /usr/include/i386-linux-gnu/sys /usr/include/sys

Download the oracle instant client

Download the following .zip files from the oracle instant client download site.

instantclient-basiclite-linux-11.2.0.3.0.zip
instantclient-sqlplus-linux-11.2.0.3.0.zip
instantclient-sdk-linux-11.2.0.3.0.zip

Install the oracle InstantClient

Create a directory /u01/11gr2

cd /u01/11gr2

unzip the above 3 .zip files into this directory

You will have a new subdirectory named instantclient_11_2

Create a softlink to libclntsh.so

cd /u01/11gr2/instantclient_11_2
ln -s libclntsh.so.11.1 libclntsh.so

 

Setup the Oracle environment

Add the following to your .bashrc file (And source the file, . ./.bashrc)

export ORACLE_HOME=/u01/11gr2/instantclient_11_2
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$LD_LIBRARY_PATH
export TNS_ADMIN=$ORACLE_HOME

 

Create the tnsnames.ora file in /u01/11gr2/instantclient_11_2

Add service name entry for your oracle database  to the tnsnames.ora

RK01 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = burl5vb1)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = rk01)
    )
  )

 

Install Ruby 1.9.2

sudo apt-get install curl
sudo apt-get install git-core
 
git config --global user.name "YourNameHere"
git config --global user.email YourEmailHere
 
bash < <(curl -s https://rvm.beginrescueend.com/install/rvm)
 
sudo apt-get install build-essential bison openssl libreadline6 libreadline6-dev curl git-core zlib1g zlib1g-dev libssl-dev libyaml-dev libsqlite3-0 libsqlite3-dev sqlite3 libxml2-dev libxslt-dev autoconf libc6-dev ncurses-dev
 
rvm install 1.9.2
rvm --default use 1.9.2

Install Rails 3.1.1

gem install rails

Installing and using the oracle driver.

You can include the oracle enhanced adapter and the oci8 driver in the Gemfile for your application, so that when you run bundle install, it installs those gems for you.

Example shown below.

rails new testora
cd testora

 

Add the following lines to your Gemfile (In the application base directory)

gem 'activerecord-oracle_enhanced-adapter', :git => 'git://github.com/rsim/oracle-enhanced.git'
gem 'ruby-oci8', '~> 2.0.6'

Save and quit from Gemfile

Run the following command to install all the gems you need for the application

bundle install

Remove all the other entries, and add the database connection entry to your database.yml file (under testora/config).

development:
  adapter: oracle_enhanced
  database: rk01
  username: scott
  password: tiger

Create your application and run it

rails generate scaffold purchase name:string cost:float
rake db:migrate
rails server

You can access the application from the following URL.
http://localhost:3000/purchases

 

Now you should be able to run your application.