Script to compare tkprof output files

I often use the oracle 10046 event tracing mechanism to capture sql’s from a session to identify why certain transactions are running slower in different env’s or at different points in time. Oracle does have a mechanism where you can save the trace information in database tables. One can use the INSERT parameter in tkprof to store the trace information into a database table. Once in the table you can write sql’s that compare multiple runs or multiple statements.

I wrote a python program that compares two different tkprof output files. The files are compared, and the following aspects of each of the sqlid’s in the tkprof output file,s are printed side by side. The output is sorted by the Difference in Elapsed Time, in Descending order.

Sql text
Plan Hash Value
Total Elapsed time
Total Logical Reads
Total Rows processed
Difference in Elapsed Time
Difference in Number of Rows processed
Difference in Logical reads.

Other columns can be added to this, if you desire.
I use this script output as a quick way to see which sql’s are running slower and are probably candidates for further analysis/tuning.

The sqlid’s from the file provided as the first argument to the script (referred to as the left file) are compared to the same sqlid’s in the file provided as the second argument to the script (referred to as the right file). The following columns are displayed.

sqlid sqlid being compared
text First 20 chars of the sql text
lplan Plan hash value from the left file
rplan Plan hash value from the right file
lela Total Elapsed time from the left file
rela Total Elapsed time from the right file
llreads Total Logical reads (query+current) from the left file
rlreads Total Logical reads (query+current) from the right file
lrows Total rows processed from the left file
rrows Total rows processed from the right file
eladiff Lela – Rela
rowsdiff Lrows – Rrows
lreadsdiff Llreads – rlreads

Here is a sample syntax for running the script. (You need the python pandas package to be installed for this to execute successfully)

python ./difftk.py /u01/tkprofout/Newplans.prf /u01/tkprofout/Stage.prf

Here is a sample output

Click on the image to view a larger version.

The full script is below

#Python script to list differences between sql executions in two tkprof output files
#useful if comparing tkprof from prod and dev for example
#Author : rajeev.ramdas

import sys
import os
import pandas as pd
from pandas import DataFrame

# Define a class to hold the counters for each sqlid
class sqliddet:
     def init(self):
            sqlid=''
            text=''
            plan_hash=''
            tcount=0
            tcpu=0
            tela=0
            tdisk=0
            tquery=0
            tcurr=0
            trows=0

# Define 2 empty dictionaries to store info about each input file
leftsqliddict={}
rightsqliddict={}

# Process each file and add one row per sqlid to the dictionary
# We want to add the row to the dictionary only after the SQLID row and the total row has been read
# So the firstsqlid flag is used to make sure that we do not insert before total is read for the first row.

def processfile(p_file,p_sqliddict):

    myfile=open(p_file,"r")
    line=myfile.readline()
    firstsqlid=True
    linespastsqlid=99
    while line:
        linespastsqlid+=1
        line=myfile.readline()
        if line.startswith('SQL ID'):
            linespastsqlid=0
            if firstsqlid==True:
                firstsqlid=False
            else:
                p_sqliddict[currsqlid.sqlid]=[currsqlid.plan_hash,currsqlid.tcount,currsqlid.tcpu,currsqlid.tela,currsqlid.tdisk,currsqlid.tquery
                            ,currsqlid.tcurr,currsqlid.trows,currsqlid.text]
            currsqlid=sqliddet()
            currsqlid.sqlid=line.split()[2]
            currsqlid.plan_hash=line.split()[5]
        if linespastsqlid==2:
            currsqlid.text=line[0:20]
        if line.startswith('total'):
            a,currsqlid.tcount,currsqlid.tcpu,currsqlid.tela,currsqlid.tdisk,currsqlid.tquery,currsqlid.tcurr,currsqlid.trows=line.split()
        if line.startswith('OVERALL'):
            p_sqliddict[currsqlid.sqlid]=[currsqlid.plan_hash,currsqlid.tcount,currsqlid.tcpu,currsqlid.tela,currsqlid.tdisk,currsqlid.tquery
                       ,currsqlid.tcurr,currsqlid.trows,currsqlid.text]
        continue
    myfile.close()

# Main portion of script
if len(sys.argv) != 3:
   print('Syntax : python ./difftk.py tkprof1.out tkprof2.out')
   sys.exit()

if not os.path.isfile(sys.argv[1]) or not os.path.isfile(sys.argv[2]):
   print ("File Does not Exist")
   sys.exit()

processfile(sys.argv[1],leftsqliddict)
processfile(sys.argv[2],rightsqliddict)

t_difftk_lst=[]

# Match the sqlid's from the file on the left to the file on the right
# Gather up the statistics form both sides, insert into a list
# Transfer the list to a pandas dataframe, add some computed columns

for sqlid,stats in leftsqliddict.items():
    l_totlogical=int(stats[5])+int(stats[6])
    if sqlid in rightsqliddict:
       t_difftk_lst.append([sqlid,stats[8].rstrip(),stats[0],rightsqliddict[sqlid][0],float(stats[3])
                            ,float(rightsqliddict[sqlid][3]),float(l_totlogical),float(rightsqliddict[sqlid][5])+float(rightsqliddict[sqlid][6])
                            ,float(stats[7]),float(rightsqliddict[sqlid][7])
                           ])
    else:
       t_difftk_lst.append([sqlid,stats[8].rstrip(),stats[0],0,float(stats[3]),0
                            ,float(l_totlogical),0,float(stats[7]),0
                           ])

difftk_df=DataFrame(t_difftk_lst,columns=['sqlid','sqltext','lplan','rplan','lela','rela','llreads','rlreads','lrows','rrows'])
difftk_df['eladiff']=difftk_df['lela']-difftk_df['rela']
difftk_df['rowsdiff']=difftk_df['lrows']-difftk_df['rrows']
difftk_df['lreadsdiff']=difftk_df['llreads']-difftk_df['rlreads']

pd.set_option('display.width',1000)
print (difftk_df.sort(columns='eladiff',ascending=False))

Published by rajeevramdas@aol.com

Leave a Reply Cancel reply