Installing postgres 15 client on Amazon Linux 2023

In order to install the postgres 15 client on Amazon Linux 2023, you can do the following.

First check the available packages

sudo dnf search postgres

You will see in the output a list of postgres packages that are available to install. To install the client tools, install the postgresql15 package

sudo dnf install postgresql15

Once the package is installed, you have access to psql, pg_dump and the other client tools.

Installing and using the postgres docker image

Using the postgres docker image is the quickest way to get started with a postgres development environment, or to test features of the postgres database. Official postgres database images are available for download and use from docker hub; you can get images of postgres 11, 12, 13, 14 and 15. In this blog post we will see how to get started with downloading and running this image on Ubuntu Linux.

The steps outlined below assume that you have one of the newer versions of Ubuntu installed on a host and that you are using the linux user named “ubuntu” to run docker.

As root, execute the following commands on the ubuntu host to install docker. These steps also set up the ubuntu user to run docker containers.

            mkdir -m 0755 -p /etc/apt/keyrings
            curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
            echo \
            "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
            $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
            apt-get update -y
            apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin -y
            systemctl disable --now docker.service docker.socket
            su - ubuntu -c "dockerd-rootless-setuptool.sh install"

Now you are ready to download and run the postgres container.

Log in as the “ubuntu” user and create a text file named pg.yaml with the following contents. The instructions below assume that you have a directory /u01/pg/data which you want to use to store your postgres datafiles.

services:
  db:
    image: postgres:latest
    restart: always
    environment:
      POSTGRES_PASSWORD: MyPassword
    ports:
      - "5432:5432"
    volumes:
      - /u01/pg/data:/var/lib/postgresql/data

Now we can run docker to download and run this image using the following command

docker compose -f /home/ubuntu/tmp/pg.yaml up -d db

This command downloads and runs the latest postgres image (which at the time of writing is 15.2). You can now connect to the postgres database from the ubuntu host as shown below.

psql -h localhost -p 5432 -d postgres -U postgres 

You can login as root to your postgres container using the command below

docker exec -it cc18a27f8850 bash

where cc18a27f8850 is the container id (you can get your container id by running the command docker container ls -a).

Once you are logged in as root to the postgres container, you can su - postgres and then run commands to stop and start postgres. pg_ctl can be found in /usr/lib/postgresql/15/bin.
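
If you prefer to connect from code rather than psql, here is a minimal Python sketch using psycopg2 (pip install psycopg2-binary), assuming the compose file above (user postgres, password MyPassword, port 5432 published on localhost).

import psycopg2

# Connection details match the pg.yaml compose file above
conn = psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="postgres",
    user="postgres",
    password="MyPassword")

with conn.cursor() as cur:
    cur.execute("select version()")
    print(cur.fetchone()[0])

conn.close()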

Adding documents and querying Elasticsearch

Elasticsearch is an open-source text search engine. It is accessible using RESTful APIs and uses JSON documents to store data. It allows users to search very large amounts of data at very high speed. Since it is written in the Java programming language, it can run on many platforms. See this blog post on how to install, configure and start Elasticsearch 8.5.0.

In this blog post I will show you how you can create an index, insert data into the index and query the data in Elasticsearch. In the rest of this post, I assume that you are executing the curl commands from the host that is running the Elasticsearch server. Hence all the curl queries are directed to localhost. If you are running the curl commands from a client machine, replace localhost with your actual hostname (or ip address) for the server running Elasticsearch.

Inserting a single document into an Index

curl -k -X POST https://localhost:9200/messages/_doc -H "Content-Type: application/json" -d'
{
  "msg_id" : 1,
  "msg_case_id" : 55,
  "msg_message" : "Hi This is xyz from def media company"
}' --user "elastic:Password1"

In the command shown above, “messages” is the index that we are inserting the document into. We also provide the Elasticsearch username and password to authenticate the POST request to Elasticsearch. The -k flag tells curl to accept the self-signed certificate from Elasticsearch.
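
If you would rather index documents from Python than from curl, the same single-document insert can be done with the requests library (pip install requests). This is a minimal sketch that mirrors the curl command above; verify=False corresponds to curl's -k flag, and the index name and credentials are the same placeholder values.

import requests

# Same document and credentials as the curl example above
doc = {
    "msg_id": 1,
    "msg_case_id": 55,
    "msg_message": "Hi This is xyz from def media company"
}

resp = requests.post(
    "https://localhost:9200/messages/_doc",
    json=doc,
    auth=("elastic", "Password1"),
    verify=False)   # equivalent of curl -k (accept the self-signed certificate)

print(resp.status_code, resp.json())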

Bulk inserting documents into an Index

First create a file named msg.json with the following lines

{"index":{"_id":"1"}}
{ "msg_id" : 1, "msg_case_id" : 55, "msg_message" : "Hi This is xyz from def media company" }
{"index":{"_id":"2"}} 
{ "msg_id" : 2, "msg_case_id" : 55, "msg_message" : "We provide targeted advertising to different platforms" }
{"index":{"_id":"3"}}
{ "msg_id" : 3, "msg_case_id" : 55, "msg_message" : "Includes TV, Radio, Online, Social Media etc" }
{"index":{"_id":"4"}}
{ "msg_id" : 4, "msg_case_id" : 55, "msg_message" : "Our conversion ratios are very high" }
{"index":{"_id":"5"}}
{ "msg_id" : 5, "msg_case_id" : 55, "msg_message" : "provides search engine optimization" }

You can batch insert all the documents in the file above to the Elasticsearch index named “messages” using the command below

curl -k -X POST https://localhost:9200/messages/_bulk?pretty -H "Content-Type: application/x-ndjson" --user "elastic:Password1" --data-binary @msg.json

Note that we are using the content type x-ndjson here.

You can delete all the documents in the “messages” index, using the command below

curl -k -XPOST 'https://localhost:9200/messages/_delete_by_query?conflicts=proceed&pretty' -H 'Content-Type: application/json' --user "elastic:Password1" -d'
{
    "query": {
        "match_all": {}
    }
}'

Querying documents

You can query all the documents from the “messages” index using the command below

curl -k -X GET 'https://localhost:9200/messages/_search' --user "elastic:Password1"

You can query specific documents that match specific criteria using the command below

curl -k -X POST "https://localhost:9200/messages/_search?pretty" -H 'Content-Type: application/json' --user "elastic:Password1" -d'
{
  "query": {
    "bool": {
      "filter": [
        {
        "term": {
          "msg_message": "ratio"
        }
        },
        {
        "term": {
          "msg_id": "4"
        }
        }
      ]
    }
  }
}
'

The above query will display the documents where the word ratio occurs in the msg_message property and the msg_id property is 4.
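
The same filtered search can also be issued from Python with the requests library. This is a minimal sketch that assumes the same index, credentials and self-signed certificate used in the curl examples above.

import requests

query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"msg_message": "ratio"}},
                {"term": {"msg_id": "4"}}
            ]
        }
    }
}

resp = requests.post(
    "https://localhost:9200/messages/_search",
    json=query,
    auth=("elastic", "Password1"),
    verify=False)   # accept the self-signed certificate

# Print just the matching documents
for hit in resp.json()["hits"]["hits"]:
    print(hit["_source"])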

In this blog post I have shown you how to insert documents into Elasticsearch indexes and query them.

Subscribing to PostgreSql logical replication using python and psycopg2

When postgresql is used as a transactional database, there are use cases where the data changes from the transactional database are captured and sent to other databases, like your datamart or datawarehouse. You could use cloud services like AWS DMS or replication software like Debezium to do this. In this blog post I will show you how to use python to read changes (cdc, change data capture) from a postgresql database using the wal2json output plugin and psycopg2.

When you are compiling postgresql from source code, you can build the output plugins test_decoding and wal2json as shown below.

            cd /home/postgres/tmp/postgresql-14.4/contrib/test_decoding
            make PG_CONFIG=/u01/pg/14/bin/pg_config
            make PG_CONFIG=/u01/pg/14/bin/pg_config install
            cd /home/postgres/tmp/postgresql-14.4/
            wget https://github.com/eulerto/wal2json/archive/refs/tags/wal2json_2_4.tar.gz
            tar -xzvf wal2json_2_4.tar.gz
            cd wal2json-wal2json_2_4/ 
            make PG_CONFIG=/u01/pg/14/bin/pg_config
            make PG_CONFIG=/u01/pg/14/bin/pg_config install

where /home/postgres/tmp/postgresql-14.4 is the directory into which you untarred your postgres source code, before compiling and installing postgres to /u01/pg/14.

In order to proceed you need to have the psycopg2 library installed for python3 (eg: pip install psycopg2).

You also need to make sure that the parameter wal_level is set to ‘logical’ in the postgresql.conf file of your postgres database.
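
A quick way to confirm the setting is to query it with psycopg2; this is a small sketch that assumes the same connection details used later in this post.

import psycopg2

conn = psycopg2.connect('host=localhost user=postgres port=5432')
with conn.cursor() as cur:
    cur.execute("show wal_level")
    print(cur.fetchone()[0])   # should print 'logical'
conn.close()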

Create a table named books in your postgresql database.

create table books (bookid bigint primary key,bookname varchar(100));

Insert a few rows into the table.

insert into books values (1,'First Book');
insert into books values (2,'Second Book');
insert into books values (3,'Third Book');

The python library psycopg2 has a module named extras which provides helpers to read from postgres logical replication publishers. We will be using the functions from this module, namely create_replication_slot , start_replication and consume_stream to create the publisher and subscriber for logical replication.

Here is the code sample for pglogical.py

from __future__ import print_function
import sys
import psycopg2
import psycopg2.extras

conn = psycopg2.connect(
    'host=localhost user=postgres port=5432',
    connection_factory=psycopg2.extras.LogicalReplicationConnection)
cur = conn.cursor()
replication_options = {
'include-xids':'1',
'include-timestamp':'1',
'pretty-print':'1'
}
try:
    cur.start_replication(
        slot_name='pytest', decode=True,
        options=replication_options)
except psycopg2.ProgrammingError:
    cur.create_replication_slot('pytest', output_plugin='wal2json')
    cur.start_replication(
        slot_name='pytest', decode=True,
        options=replication_options)


class DemoConsumer(object):
    def __call__(self, msg):
        print(msg.payload)
        msg.cursor.send_feedback(flush_lsn=msg.data_start)

democonsumer = DemoConsumer()

print("Starting streaming, press Control-C to end...", file=sys.stderr)
try:
   cur.consume_stream(democonsumer)
except KeyboardInterrupt:
   cur.close()
   conn.close()
   print("The slot 'pytest' still exists. Drop it with "
         "SELECT pg_drop_replication_slot('pytest'); if no longer needed.",
         file=sys.stderr)
   print("WARNING: Transaction logs will accumulate in pg_xlog "
         "until the slot is dropped.", file=sys.stderr)

The code above is a modified version of the code published by Marco Nenciarini here.

The code above uses the wal2json output plugin. You can change it to use the test_decoding output plugin, but you will also have to change the replication_options variable to the options supported by test_decoding.
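
For example, if you switch to the test_decoding plugin, the options dictionary might look like the sketch below. The option names are taken from the test_decoding documentation (it emits plain text rather than JSON), so treat this as an illustration and adjust for your postgres version.

# Hypothetical replication_options for the test_decoding plugin
replication_options = {
    'include-xids': 'on',
    'include-timestamp': 'on',
    'skip-empty-xacts': 'on'
}

# and create the slot with the test_decoding plugin instead of wal2json
# cur.create_replication_slot('pytest', output_plugin='test_decoding')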

You can run the following command to create the replication slot and start the subscriber.

python3 pglogical.py

You will see a prompt saying “Starting streaming, press Control-C to end…”

Let us now make a few changes to the books table.

insert into books values (5,'Fifth Book');

do $$
<<first_block>>
begin
    insert into books values (6,'Sixth Book');
    delete from books where bookid = 3;
end first_block $$;

If you now go back to the screen where you ran your python program, you can see the following messages on screen.

{
        "xid": 741,
        "timestamp": "2022-07-11 21:29:37.301299+00",
        "change": [
                {
                        "kind": "insert",
                        "schema": "public",
                        "table": "books",
                        "columnnames": ["bookid", "bookname"],
                        "columntypes": ["bigint", "character varying(100)"],
                        "columnvalues": [5, "Fifth Book"]
                }
        ]
}
{
        "xid": 742,
        "timestamp": "2022-07-11 21:33:07.913942+00",
        "change": [
                {
                        "kind": "insert",
                        "schema": "public",
                        "table": "books",
                        "columnnames": ["bookid", "bookname"],
                        "columntypes": ["bigint", "character varying(100)"],
                        "columnvalues": [6, "Sixth Book"]
                }
                ,{
                        "kind": "delete",
                        "schema": "public",
                        "table": "books",
                        "oldkeys": {
                                "keynames": ["bookid"],
                                "keytypes": ["bigint"],
                                "keyvalues": [3]
                        }
                }
        ]
}

These are the changes the python subscriber program is reading from the postgres logical replication publisher.

You can then write these changes either to a csv file or to another database as you choose.
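
For example, a consumer that appends each change record to a csv file could look like the sketch below, which is a hypothetical variation of the DemoConsumer class from the script above.

import csv
import json

class CsvConsumer(object):
    def __init__(self, filename):
        self.filename = filename

    def __call__(self, msg):
        payload = json.loads(msg.payload)
        with open(self.filename, 'a', newline='') as f:
            writer = csv.writer(f)
            for change in payload.get('change', []):
                # one row per change: kind, schema, table, raw change record
                writer.writerow([change['kind'], change['schema'],
                                 change['table'], json.dumps(change)])
        msg.cursor.send_feedback(flush_lsn=msg.data_start)

# then pass it to consume_stream instead of DemoConsumer
# cur.consume_stream(CsvConsumer('/tmp/book_changes.csv'))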

Installing the pg_partman extension

pg_partman is an extension that simplifies the process of partition management in postgres.

Below are the steps that I followed to install the pg_partman extension with postgres 12.8.

Change your current working directory to the directory where you unzipped the postgres 12.8 source code.

cd <SomeStaticPath>/postgresql-12.8/contrib

git clone https://github.com/pgpartman/pg_partman.git

cd pg_partman

make PG_CONFIG=<pghome>/bin/pg_config NO_BGW=1

make install

Then edit your postgresql.conf file and add pg_partman_bgw to the shared_preload_libraries parameter (the background worker library is only built if you omit NO_BGW=1 in the make step above).

Now restart your postgres instance

At this point you are ready to create the extension in postgres and use it.

Use psql to log in to your database.

CREATE SCHEMA partman;

CREATE EXTENSION pg_partman SCHEMA partman;

\dx (To list the extensions and the version installed in your database)
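
To verify that the extension works, you can create a partitioned table and register it with pg_partman. The sketch below uses psycopg2 and the pg_partman 4.x create_parent() arguments (p_parent_table, p_control, p_type, p_interval); the table, connection details and argument values are hypothetical, and the syntax differs in other pg_partman versions, so treat it as an illustration.

import psycopg2

conn = psycopg2.connect('host=localhost user=postgres port=5432 dbname=postgres')
conn.autocommit = True
with conn.cursor() as cur:
    # a hypothetical partitioned table to be managed by pg_partman
    cur.execute("""
        create table public.events (
            event_id   bigint,
            created_at timestamptz not null
        ) partition by range (created_at)
    """)
    # register the table with pg_partman (4.x style arguments)
    cur.execute("""
        select partman.create_parent(
            p_parent_table => 'public.events',
            p_control      => 'created_at',
            p_type         => 'native',
            p_interval     => 'daily')
    """)
conn.close()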

Loading IMDB data into postgresql

IMDB (Internet Movie Database) makes the movie dataset available for free download at https://datasets.imdbws.com/. The documentation for this dataset can be found at https://www.imdb.com/interfaces/.

In this blog post, I show you how to load this data into a PostgreSql database. The steps are executed from an Ubuntu Linux workstation. I assume that you already have a postgresql database with about 50Gb of free space to load this data into, and that you know the connection information.

We’ll use the s32cinemagoer.py script that can be downloaded from github.

From the ubuntu workstation, install the following packages

sudo apt install python3-pip
sudo apt-get install libpq-dev

Now install the following python modules

pip3 install Psycopg2
pip3 install imdbpy

Create a new directory to store the downloaded datafiles, and download the datafiles from https://datasets.imdbws.com into this directory

mkdir dat
cd dat
wget https://datasets.imdbws.com/name.basics.tsv.gz
wget https://datasets.imdbws.com/title.akas.tsv.gz
wget https://datasets.imdbws.com/title.basics.tsv.gz
wget https://datasets.imdbws.com/title.crew.tsv.gz
wget https://datasets.imdbws.com/title.episode.tsv.gz
wget https://datasets.imdbws.com/title.principals.tsv.gz
wget https://datasets.imdbws.com/title.ratings.tsv.gz

Log in to your postgresql database and create a new schema to hold the imdb tables (this step is optional. If you do not create this schema, then the tables and the corresponding data get loaded into the public schema).

create schema imdb;

Before you run the script, let's edit the script and make a couple of changes, so that it works with newer versions of the sqlalchemy library. The first change is on line 183 in the file.

Change
engine = sqlalchemy.create_engine(db_uri, encoding='utf-8', echo=False)
To
engine = sqlalchemy.create_engine(db_uri, echo=False)

Change 
metadata.create_all(tables=[table])
To
metadata.create_all(bind=metadata.bind,tables=[table])

Then execute the script as shown below

python3 s32cinemagoer.py /home/ubuntu/dat postgresql://username:password@dbhostname/dbname

where /home/ubuntu/dat is the directory into which the imdb files were downloaded.

This will take some time to load (Close to an hour on a reasonably sized ubuntu workstation) and will consume about 15Gb of space in your postgresql database.
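
Once the load completes, you can spot check the data from Python. The sketch below assumes the loader created a table named title_basics in the imdb schema (the actual table names come from the s32cinemagoer.py script, so adjust if yours differ) and uses the same connection string as the load command.

import psycopg2

conn = psycopg2.connect('postgresql://username:password@dbhostname/dbname')
with conn.cursor() as cur:
    cur.execute("set search_path to imdb")
    # title_basics is the assumed name of the table loaded from title.basics.tsv.gz
    cur.execute("select count(*) from title_basics")
    print("titles loaded:", cur.fetchone()[0])
conn.close()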

AWR – Profiling Database I/O

Oracle AWR (Automatic Workload Repository) captures and stores fine grained information about the file reads and writes (aka i/o) that the database performed during the execution of application generated database workloads. When analyzing the read and write patterns of the database, it helps a lot to understand what type of activity is generating the reads and writes. With this stored information we can get an in-depth understanding of the distribution of random and sequential reads and writes.

I use this information for getting a better understanding of the I/O profile, for my Exadata sizing exercises.

This information can be used to understand clearly how much of the i/o is from Temp activity, Datafile reads and writes, Archivelog writes, log writes, and whether these are small or large reads and writes.

To the best of my understanding the small reads and writes are those < 128k and the large reads and writes are those > 128k.

This information is contained mainly in two AWR views.

Dba_Hist_Iostat_FileType
Dba_Hist_Iostat_Function

Dba_Hist_Iostat_FileType

This view displays the historical i/o statistics by file type. The main filetypes are the following

Archive Log
Archive Log Backup
Control File
Data File
Data File Backup
Data File Copy
Data File Incremental Backup
Data Pump Dump File
Flashback Log
Log File
Other
Temp File

Dba_Hist_Iostat_Function

This view displays the historical i/o statistics by i/o function. The main i/o functions are the following

Recovery
Buffer Cache Reads
Others
RMAN
Streams AQ
Smart Scan
Data Pump
XDB
Direct Writes
DBWR
LGWR
Direct Reads
Archive Manager
ARCH

From everything I have seen so far, these reads and writes can be directly correlated to the “Physical read total IO requests” and “Physical write total IO requests” system level statistics.
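
As an illustration, the per filetype request counts between two snapshots can be pulled straight from Dba_Hist_Iostat_FileType. Here is a small Python sketch using the python-oracledb driver; the connection details, snapshot ids and dbid are placeholders, and the column names (FILETYPE_NAME, SMALL_READ_REQS, LARGE_READ_REQS and so on) are taken from the documented view definition, so verify them against your database version.

import oracledb

# Hypothetical connection details and snapshot range
conn = oracledb.connect(user="system", password="manager", dsn="dbhost/orclpdb")

sql = """
select e.filetype_name,
       e.small_read_reqs  - b.small_read_reqs  small_reads,
       e.large_read_reqs  - b.large_read_reqs  large_reads,
       e.small_write_reqs - b.small_write_reqs small_writes,
       e.large_write_reqs - b.large_write_reqs large_writes
from   dba_hist_iostat_filetype b,
       dba_hist_iostat_filetype e
where  b.snap_id = :begin_snap
and    e.snap_id = :end_snap
and    b.dbid = :dbid
and    e.dbid = :dbid
and    b.instance_number = e.instance_number
and    b.filetype_id = e.filetype_id
order  by e.filetype_name
"""

with conn.cursor() as cur:
    cur.execute(sql, begin_snap=100, end_snap=110, dbid=1234567890)
    for row in cur:
        print(row)
conn.close()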

I have written a script that displays information from the above mentioned views and gives a detailed breakdown of i/o generated by different aspects of the database activities.
In order to fit in the computer screen real estate, I have limited the columns the script displays (so it displays only the file types I am frequently interested in). Please feel free to take the script and modify it to add columns that you want to display.

The full version of the script  awrioftallpct-pub.sql can be found here.

The script accepts the following inputs
– A begin snap id for the snapid range you want to report on
– An end snap id for the snapid range you want to report on
– The dbid for the database
– The snap interval in seconds (if you have a 30 minute interval, input 1800 seconds)

A description of all the column names in the output, broken down by section, is provided in the header section of the script.

There are 6 sections to this script

1) Total Reads + Writes
2) Total Reads
3) Total Writes
4) Read write breakdown for datafiles
5) Data File – Direct Path v/s Buffered Read Write breakdown
6) Read write breakdown for tempfiles

1) Total Reads + Writes

This section displays the number of reads+writes by filetype, and a percentage of reads+writes for each file type, as a percentage of total reads+writes. The last column displays the total reads+writes for all file types. The column DTDP shows the i/o that bypasses flash cache by default and goes directly to spinning disk on Exadata (Temp+Archivelogs+Flashback Logs).

[Image: io1-rw]

2) Total Reads

This section displays the number of reads by filetype, and a percentage of reads for each file type, as a percentage of total reads. The last column displays the total reads for all file types.

[Image: io1-r]

3) Total Writes

This section displays the number of writes by filetype, and a percentage of writes for each file type, as a percentage of total writes. The last column displays the total writes for all file types.

[Image: io1-w]

4) Read write breakdown for datafiles

This section displays the I/O information only pertaining to datafile i/o. It displays the small and large reads and writes and a percentage they constitute of the total reads+writes to datafiles, and a percentage they constitute of the total reads or writes to datafiles. It also displays the total small and large reads and writes and a percentage they constitute of the total reads+writes to datafiles.

[Image: io1-dfrw]

5) Data File – Direct Path v/s Buffered Read Write breakdown

This section provides a breakdown of I/O by function (As opposed to i/o by filetype in the previous sections). The output shows columns that display the direct path small and large reads and writes, buffered small reads and writes, smart scan small and large reads and other small and large reads and writes.

[Image: io3-bf]

6) Read write breakdown for tempfiles

This section displays the I/O information only pertaining to tempfile i/o. It displays the small and large reads and writes and a percentage they constitute of the total reads+writes to tempfiles, and a percentage they constitute of the total reads or writes to tempfiles. It also displays the total small and large reads and writes and a percentage they constitute of the total reads+writes to tempfiles.

[Image: io3-tf]

The full version of the script  awrioftallpct-pub.sql can be found here.

Oracle Active DataGuard – Considerations for the Wide area Network

Oracle customers use Oracle Active Dataguard to create and maintain one or many standby databases that protect their mission critical primary databases from disaster. Typically, in such deployments, the primary and standby databases are in geographically separate locations connected via a WAN (Wide Area Network). Log Transport Services transfers large volumes of redo logs from the primary location to the standby, using Sql*Net.

We have to ensure that all the components from the source to the target are set up correctly, so that the data transfer can be done with the best throughput possible. If sufficient network bandwidth is not available with reasonable latencies, then we will start seeing the log transfer and apply lagging on the standby site (which is oracle speak for: your primary and standby databases are now out of sync from a data perspective).

One key point to keep in mind is that the lower the network round trip time (aka latency), the higher your data transfer throughput, and the higher the round trip time, the lower your throughput. So it is very important to maintain low round trip times on your Wide Area Network.

To understand network data transfer throughput, it is important to understand the concepts of Tcp Window Size and Bandwidth Delay Product (aka BDP).

Tcp Window size is the number of bytes that can be transmitted without receiving an acknowledgement from the other side. Once Tcp Window size bytes have been sent, the sender stops sending any more bytes and waits for an acknowledgement from the receiver.

Bandwidth delay product is calculated as the product of the network bandwidth and the network round trip time: bdp = network bandwidth * round trip time. This is the amount of data that has left the sender before the first acknowledgement is received by the sender. If the sender's output bandwidth is stable, and the bandwidth is fully used, then the BDP gives the amount of data in transit. If we set the Tcp Window size equal to the bandwidth delay product, then in theory we should be able to fully utilize the available bandwidth.

Setup the network

We have to start by setting up the networking components to support the desired/stated bandwidth. So if you have a WAN that is a 10GigE network, all the NICs (Network Interface Cards), ports and switches in the configuration should be configured to support 10GigE full duplex. After setup, run the configuration display utilities and ensure that, at all these component levels, the transfer speeds are set to 10GigE. Customers often run into trouble when Auto Negotiation causes some NICs to set the transfer speed to 1GigE because of configuration mismatches.

Use tools like Iperf to test the transfer speeds that your network is capable of achieving.

One important aspect to keep in mind is that it is probable that the WAN is shared by other traffic (e-mail, data replication, san replication). This has two important implications that we should consider.

  • If there is a lot of bandwidth consumption by some of this miscellaneous traffic, round trip times could be degraded periodically on the network.
  • We should be careful in our calculations that we do not completely consume the entire bandwidth for redo transport. (This could impact other processes)
    • So it is important to figure out (Working with the network admins) what the bandwidth entitlements are for redo transport and base our calculations on those numbers.

Calculate our BDP

Use the following formula to calculate our Bandwidth delay product (BDP)

(bandwidth/8)*round trip time in seconds.

The network bandwidth is expressed in bits per second, so we divide by 8 to convert to bytes.
Round trip time is usually expressed in milliseconds, so we divide by 1000 to convert to seconds.

So for example, if we have a 10Gbit network bandwidth and a 40ms round trip time

BDP=(10000000000/8)*(40/1000) = 50,000,000 bytes.

Setup Sql*Net Parameters

The current recommendations for Dataguard Redo transport are as follows.

Set the SDU size to 65535

  • We can set SDU on a per connection basis using the SDU parameter in the local naming configuration file (TNSNAMES.ORA) and the listener configuration file (LISTENER.ORA)
  • We can set the SDU for all Oracle Net connections with the profile parameter DEFAULT_SDU_SIZE in the SQLNET.ORA file.

Set TCP.NODELAY to YES

To preempt delays in buffer flushing in the TCP protocol stack, disable the TCP Nagle algorithm by setting TCP.NODELAY to YES in the SQLNET.ORA file on both the primary and standby systems.

Setup RECV_BUF_SIZE and SEND_BUF_SIZE

The current recommendation is to set the SEND_BUF_SIZE and RECV_BUF_SIZE parameters (which are the send and receive socket buffer sizes for SQL*Net) to 3 times the BDP.

As per the above example we would set them to 50,000,000*3 = 150,000,000
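
The arithmetic is easy to script. Here is a small Python sketch of the calculation described above.

def bdp_bytes(bandwidth_bits_per_sec, rtt_ms):
    # Bandwidth delay product in bytes: (bandwidth/8) * round trip time in seconds
    return (bandwidth_bits_per_sec / 8) * (rtt_ms / 1000)

# 10Gbit network with a 40ms round trip time
bdp = bdp_bytes(10_000_000_000, 40)
print("BDP                :", int(bdp))        # 50,000,000 bytes
print("SEND/RECV_BUF_SIZE :", int(bdp * 3))    # 150,000,000 bytes (3 x BDP)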

Setup Operating system Kernel Parameters

If you are using the Linux operating system make sure that the values for the following kernel parameters are setup to be higher than the values set for RECV_BUF_SIZE and SEND_BUF_SIZE.

net.core.rmem_max
net.core.wmem_max

Once we have configured the network, the operating system and sql*net, and redo transport is in place, we can perform further network monitoring to see how the network bandwidth is being utilized, and make appropriate adjustments.

Links to helpful Documents

Iperf

How to calculate Tcp throughput for long distance links (blog)

Oracle Net Performance Tuning (Mos)

Setting Send and Receive Buffer Sizes (Mos)

Tuning Sql*Net performance (Oracle Docs)

Configuring Oracle Dataguard (Oracle Docs)

Best Practices for Sync Data Transport (White Paper)

Script to compare tkprof output files

I often use the oracle 10046 event tracing mechanism to capture sql's from a session, to identify why certain transactions are running slower in different environments or at different points in time. Oracle does have a mechanism where you can save the trace information in database tables. One can use the INSERT parameter in tkprof to store the trace information into a database table. Once in the table, you can write sql's that compare multiple runs or multiple statements.

I wrote a python program that compares two different tkprof output files. The files are compared, and the following aspects of each of the sqlid's in the tkprof output files are printed side by side. The output is sorted by the difference in elapsed time, in descending order.

  • Sql text
  • Plan Hash Value
  • Total Elapsed time
  • Total Logical Reads
  • Total Rows processed
  • Difference in Elapsed Time
  • Difference in Number of Rows processed
  • Difference in Logical reads.

Other columns can be added to this, if you desire.
I use this script output as a quick way to see which sql’s are running slower and are probably candidates for further analysis/tuning.

The sqlid’s from the file provided as the first argument to the script (referred to as the left file) are compared to the same sqlid’s in the file provided as the second argument to the script (referred to as the right file). The following columns are displayed.

sqlid        sqlid being compared
text         First 20 chars of the sql text
lplan        Plan hash value from the left file
rplan        Plan hash value from the right file
lela         Total elapsed time from the left file
rela         Total elapsed time from the right file
llreads      Total logical reads (query+current) from the left file
rlreads      Total logical reads (query+current) from the right file
lrows        Total rows processed from the left file
rrows        Total rows processed from the right file
eladiff      lela - rela
rowsdiff     lrows - rrows
lreadsdiff   llreads - rlreads

Here is a sample syntax for running the script. (You need the python pandas package to be installed for this to execute successfully)

python ./difftk.py /u01/tkprofout/Newplans.prf /u01/tkprofout/Stage.prf

Here is a sample output

[Image: difftk]

The full script is below

#Python script to list differences between sql executions in two tkprof output files
#useful if comparing tkprof from prod and dev for example
#Author : rajeev.ramdas

import sys
import os
import pandas as pd
from pandas import DataFrame

# Define a class to hold the counters for each sqlid
class sqliddet:
     def __init__(self):
            self.sqlid=''
            self.text=''
            self.plan_hash=''
            self.tcount=0
            self.tcpu=0
            self.tela=0
            self.tdisk=0
            self.tquery=0
            self.tcurr=0
            self.trows=0

# Define 2 empty dictionaries to store info about each input file
leftsqliddict={}
rightsqliddict={}

# Process each file and add one row per sqlid to the dictionary
# We want to add the row to the dictionary only after the SQLID row and the total row has been read
# So the firstsqlid flag is used to make sure that we do not insert before total is read for the first row.

def processfile(p_file,p_sqliddict):

    myfile=open(p_file,"r")
    line=myfile.readline()
    firstsqlid=True
    linespastsqlid=99
    while line:
        linespastsqlid+=1
        line=myfile.readline()
        if line.startswith('SQL ID'):
            linespastsqlid=0
            if firstsqlid==True:
                firstsqlid=False
            else:
                p_sqliddict[currsqlid.sqlid]=[currsqlid.plan_hash,currsqlid.tcount,currsqlid.tcpu,currsqlid.tela,currsqlid.tdisk,currsqlid.tquery
                            ,currsqlid.tcurr,currsqlid.trows,currsqlid.text]
            currsqlid=sqliddet()
            currsqlid.sqlid=line.split()[2]
            currsqlid.plan_hash=line.split()[5]
        if linespastsqlid==2:
            currsqlid.text=line[0:20]
        if line.startswith('total'):
            a,currsqlid.tcount,currsqlid.tcpu,currsqlid.tela,currsqlid.tdisk,currsqlid.tquery,currsqlid.tcurr,currsqlid.trows=line.split()
        if line.startswith('OVERALL'):
            p_sqliddict[currsqlid.sqlid]=[currsqlid.plan_hash,currsqlid.tcount,currsqlid.tcpu,currsqlid.tela,currsqlid.tdisk,currsqlid.tquery
                       ,currsqlid.tcurr,currsqlid.trows,currsqlid.text]
        continue
    myfile.close()

# Main portion of script
if len(sys.argv) != 3:
   print('Syntax : python ./difftk.py tkprof1.out tkprof2.out')
   sys.exit()

if not os.path.isfile(sys.argv[1]) or not os.path.isfile(sys.argv[2]):
   print ("File Does not Exist")
   sys.exit()

processfile(sys.argv[1],leftsqliddict)
processfile(sys.argv[2],rightsqliddict)

t_difftk_lst=[]

# Match the sqlid's from the file on the left to the file on the right
# Gather up the statistics form both sides, insert into a list
# Transfer the list to a pandas dataframe, add some computed columns

for sqlid,stats in leftsqliddict.items():
    l_totlogical=int(stats[5])+int(stats[6])
    if sqlid in rightsqliddict:
       t_difftk_lst.append([sqlid,stats[8].rstrip(),stats[0],rightsqliddict[sqlid][0],float(stats[3])
                            ,float(rightsqliddict[sqlid][3]),float(l_totlogical),float(rightsqliddict[sqlid][5])+float(rightsqliddict[sqlid][6])
                            ,float(stats[7]),float(rightsqliddict[sqlid][7])
                           ])
    else:
       t_difftk_lst.append([sqlid,stats[8].rstrip(),stats[0],0,float(stats[3]),0
                            ,float(l_totlogical),0,float(stats[7]),0
                           ])

difftk_df=DataFrame(t_difftk_lst,columns=['sqlid','sqltext','lplan','rplan','lela','rela','llreads','rlreads','lrows','rrows'])
difftk_df['eladiff']=difftk_df['lela']-difftk_df['rela']
difftk_df['rowsdiff']=difftk_df['lrows']-difftk_df['rrows']
difftk_df['lreadsdiff']=difftk_df['llreads']-difftk_df['rlreads']

pd.set_option('display.width',1000)
print (difftk_df.sort_values(by='eladiff',ascending=False))

Oracle Database In-Memory an introduction Part 2 – What do I need to do to use the Oracle In-Memory Database?

Step 1) Define the INMEMORY_SIZE

Customers have to set up the correct value for the database initialization parameter INMEMORY_SIZE. This parameter specifies the amount of memory, from the SGA, that is to be used for the In-Memory column store. This is a static pool (ie Automatic Memory Management cannot extend or shrink this area), which means that you have to restart the database for any changes to this parameter to take effect. The In-Memory area is sub-divided into two pools: a 1MB pool used to store the actual column formatted data populated into memory, and a 64K pool used to store metadata about the objects that are populated into the IM column store.


sho parameter inmemory_size

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
inmemory_size			     big integer 500M

select pool,alloc_bytes,used_bytes,populate_status from v$inmemory_area;

POOL			   ALLOC_BYTES USED_BYTES POPULATE_STATUS
-------------------------- ----------- ---------- --------------------------
1MB POOL		     418381824		0 DONE
64KB POOL		     100663296		0 DONE

Step 2) Mark the performance critical objects in your database with the attribute INMEMORY

select partition_name,bytes/(1024),inmemory,inmemory_compression from dba_segments where
owner = 'SH' and segment_name = 'SALES'

PARTITION_NAME		       BYTES/(1024) INMEMORY INMEMORY_COMPRESS
------------------------------ ------------ -------- -----------------
SALES_Q1_1998			       8192 DISABLED

ALTER TABLE SH.SALES MODIFY PARTITION SALES_Q1_1998 INMEMORY PRIORITY CRITICAL MEMCOMPRESS FOR QUERY HIGH;

Table altered.

select partition_name,bytes/(1024),inmemory,inmemory_compression from dba_segments where
owner = 'SH' and segment_name = 'SALES'  2  ;

PARTITION_NAME		       BYTES/(1024) INMEMORY INMEMORY_COMPRESS
------------------------------ ------------ -------- -----------------
SALES_Q1_1998			       8192 ENABLED  FOR QUERY HIGH


Step 3) Populate the In-Memory datastore

Objects are populated into the In-Memory datastore, in a prioritized list, immediately after the database is opened or after they are scanned for the first time. There are five levels for the keyword PRIORITY (CRITICAL, HIGH, MEDIUM, LOW, NONE).

The IM column store is populated by a set of background processes referred to as worker processes (ora_w001_orcl). The database is fully active/accessible while this occurs. Each worker process is given a subset of database blocks from the object to populate into the IM column store. Population is a streaming mechanism, simultaneously columnizing and compressing the data. There is a new IMCO (In-Memory coordinator) background process which wakes up every 2 minutes and checks to see if there are any population tasks that need to be completed. Eg: a new object has been marked as INMEMORY with a PRIORITY other than NONE.

select v.owner,v.segment_name,v.partition_name,v.bytes orig_size,v.inmemory_size in_mem_size from v$im_segments v;

OWNER	   SEGMENT_NA PARTITION_NAME		      ORIG_SIZE IN_MEM_SIZE
---------- ---------- ------------------------------ ---------- -----------
SH	   SALES      SALES_Q1_1998			8388608     1179648

select * from 
(
select /*+ full(sales) */ channel_id,count(*)
from sh.sales partition (sales_q1_1998)
group by channel_id
order by count(*) desc
)
where rownum < 6

CHANNEL_ID   COUNT(*)
---------- ----------
	 3	32796
	 2	 6602
	 4	 3926
	 9	  363

Elapsed: 00:00:00.09

select * from table (dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------
SQL_ID	40pjk921r3jrc, child number 0
-------------------------------------
select * from ( select /*+ full(sales) */ channel_id,count(*) from
sh.sales partition (sales_q1_1998) group by channel_id order by
count(*) desc ) where rownum < 6

Plan hash value: 2962696457

---------------------------------------------------------------------------------------------------------
| Id  | Operation			| Name	| Rows	| Bytes | Cost (%CPU)| Time	| Pstart| Pstop |
---------------------------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT		|	|	|	|    12 (100)|		|	|	|
|*  1 |  COUNT STOPKEY			|	|	|	|	     |		|	|	|
|   2 |   VIEW				|	|     4 |   104 |    12  (34)| 00:00:01 |	|	|
|*  3 |    SORT ORDER BY STOPKEY	|	|     4 |    12 |    12  (34)| 00:00:01 |	|	|
|   4 |     HASH GROUP BY		|	|     4 |    12 |    12  (34)| 00:00:01 |	|	|
|   5 |      PARTITION RANGE SINGLE	|	| 43687 |   127K|     9  (12)| 00:00:01 |     5 |     5 |
|   6 |       TABLE ACCESS INMEMORY FULL| SALES | 43687 |   127K|     9  (12)| 00:00:01 |     5 |     5 |
---------------------------------------------------------------------------------------------------------

For much more in-depth technical details of the Oracle Database In-Memory, please see this whitepaper.