Malayalam typing on a Amazon fire tablet

August 2, 2024August 2, 2024 ~ rajeevramdas@aol.com ~ Leave a comment

I did the following steps to enable typing in English and transliterating the typed words to Malayalam on a Fire tablet. I have tried these steps on a Fire 10 2023 tablet, which runs a modified version of the Android operating system. I also used an external Bluetooth connected Arteck keyboard with the fire tablet which makes typing long sentences / paragraphs easier on a tablet.

First you would want to enable the Google play store on your fire tablet (By default only the Amazon app store is enabled, which is limited as to the apps you can download and install). The instructions can be found at https://www.androidpolice.com/install-play-store-amazon-fire-tablet/

From the Google play app store, search for the app named “Gboard” and install it.

In the settings for Gboard enable the Malayalam keyboard.

Now you are ready to type in Malayalam. When you are in an application like Whatsapp, and you want to type in Malayalam, you can click on the keyboard button and choose the keyboard you want to use . Make sure you choose the Gboard keyboard.

On the keyboard you will see a globe image, click on that and you will see the space bar indicate MLEN.

Then whatever you type in English will be converted to Malayalam. ഇത് പോലെ….

Installing postgres 15 client on Amazon Linux 2023

February 15, 2024February 15, 2024 ~ rajeevramdas@aol.com ~ Leave a comment

In order to install the postgres 15 client on Amazon Linux 2023, you can do the following.

First check the available packages

sudo dnf search postgres

You will see in the output, a list of postgres packages, that are available to install.. In order to install the client tools, install the postgresql15 package

sudo dnf install postgresql15

Once the package is installed you have access to psql, pg_dump etc.

Installing and using the postgres docker image

April 18, 2023April 18, 2023 ~ rajeevramdas@aol.com ~ Leave a comment

Using the postgres docker image, is the quickest way to get started with any of your postgres development environments, or if you need to test any features of the postgres database. Official postgres database images are available for download and use from docker hub. You can get images of postgres 11,12,13,14 and 15 from docker hub. In this blog post we will see how to get started with downloading and running this image on Ubuntu Linux.

The steps outlined below assume that you have one of the newer versions of ubuntu installed on a host and that you are using the linux user named “ubuntu” to run docker.

As root execute the following commands on the ubuntu host to download docker. These steps also setup the ubuntu user to run docker containers.

            mkdir -m 0755 -p /etc/apt/keyrings
            curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
            echo \
            "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
            $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
            apt-get update -y
            apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin -y
            systemctl disable --now docker.service docker.socket
            su - ubuntu -c "dockerd-rootless-setuptool.sh install"

Now you are ready to download and run the postgres container.

Login as the “ubuntu” user and Create a text file named pg.yaml file, that has the following instructions. The following instructions assume that you have a directory /u01/pg/data which you want to use to store your postgres datafiles.

services:
  db:
    image: postgres:latest
    restart: always
    environment:
      POSTGRES_PASSWORD: MyPassword
    ports:
      - "5432:5432"
    volumes:
      - /u01/pg/data:/var/lib/postgresql/data

Now we can run docker to download and run this image using the following command

docker compose -f /home/ubuntu/tmp/pg.yaml up -d db

This command downloads and installs the latest postgres image (Which at the time of writing is 15.2) You can now connect to the postgres database from the ubuntu host as shown below.

psql -h localhost -p 5432 -d postgres -U postgres

You can login as root to your postgres container using the command below

docker exec -it cc18a27f8850 bash

where cc18a27f8850 is the container id (You can get your container id by running the command, docker container ls -a.

Once you are logged in as root to the postgres container, you can su – postgres and then run commands to stop and start postgres. pg_ctl can be found in /usr/lib/postgresql/15/bin.

Adding documents and querying Elasticsearch

November 10, 2022November 10, 2022 ~ rajeevramdas@aol.com ~ Leave a comment

Elasticsearch is an opensource text search engine. It is accessible using RESTFul API’s and uses JSON documents to store data. It allows users to search very large amounts of data at a very high speed. Since it is written using the Java programming language it can run on many platforms. See this blog post on how to install, configure and start Elasticsearch 8.5.0.

In this blog post I will show you how you can create an index, insert data into the index and query the data in Elasticsearch. In the rest of this post, I assume that you are executing the curl commands from the host that is running the Elasticsearch server. Hence all the curl queries are directed to localhost. If you are running the curl commands from a client machine, replace localhost with your actual hostname (or ip address) for the server running Elasticsearch.

Inserting a single document into an Index

curl -k -POST https://localhost:9200/messages/_doc -H "Content-Type: application/json" -d'
{
  "msg_id" : 1,
  "msg_case_id" : 55,
  "msg_message" : "Hi This is xyz from def media company"
}' --user "elastic:Password1"

In the command shown above, “messages” is the Index that we are inserting the document into. We are also providing the Elasticsearch username and password to set the POST requirement to Elasticsearch. The -k is used to tell curl to accept the selft signed certificated from Elasticsearch.

Bulk inserting documents into an Index

First create a file named msg.json with the following lines

{"index":{"_id":"1"}}
{ "msg_id" : 1, "msg_case_id" : 55, "msg_message" : "Hi This is xyz from def media company" }
{"index":{"_id":"2"}} 
{ "msg_id" : 2, "msg_case_id" : 55, "msg_message" : "We provide targeted advertising to different platforms" }
{"index":{"_id":"3"}}
{ "msg_id" : 3, "msg_case_id" : 55, "msg_message" : "Includes TV, Radio, Online, Social Media etc" }
{"index":{"_id":"4"}}
{ "msg_id" : 4, "msg_case_id" : 55, "msg_message" : "Our conversion ratios are very high" }
{"index":{"_id":"5"}}
{ "msg_id" : 5, "msg_case_id" : 55, "msg_message" : "provides search engine optimization" }

You can batch insert all the documents in the file above to the Elasticsearch index named “messages” using the command below

curl -k -X POST https://localhost:9200/messages/_bulk?pretty -H "Content-Type: application/x-ndjson" --user "elastic:Password1" --data-binary @msg.json

Note that we are using the content-type x-ndjson here

You can delete all the documents in the “messages” index, using the command below

curl -k -XPOST 'https://localhost:9200/messages/_delete_by_query?conflicts=proceed&pretty' -H 'Content-Type: application/json' --user "elastic:Password1" -d'
{
    "query": {
        "match_all": {}
    }
}'

Querying documents

You can query all the documents from the “messages” index using the command below

curl -k -X GET 'https://localhost:9200/messages/_search' --user "elastic:Password1"

You can query specific documents that match specific criteria using the command below

curl -k -X POST "https://localhost:9200/messages/_search?pretty" -H 'Content-Type: application/json' --user "elastic:Password1" -d'
{
  "query": {
    "bool": {
      "filter": [
        {
        "term": {
          "msg_message": "ratio"
        }
        },
        {
        "term": {
          "msg_id": "4"
        }
        }
      ]
    }
  }
}
'

The above query will display the documents where the word ratio occurs in the msg_message property and when the msg_id property is 4.

In this blog post I have shown you how to insert documents into Elasticsearch indexes and query them.

Subscribing to PostgreSql logical replication using python and psycopg2

July 12, 2022July 12, 2022 ~ rajeevramdas@aol.com ~ Leave a comment

When postgresql is used as a transactional database, there are use cases where the data changes from the transactional database are captured and send to other databases like your datamart or datawarehouse. You could use cloud services like Aws Dms or replication software like Debazium to do this. In this blog post I will show you how to use python to read changes (cdc, change data capture) from a postgresql database using the wal2json output plugin and psycopg2.

When you are compiling postgresql from source code, you can enable the output plugins test_decoder or wal2json as shown below.

            cd /home/postgres/tmp/postgresql-14.4/contrib/test_decoding
            make PG_CONFIG=/u01/pg/14/bin/pg_config
            make PG_CONFIG=/u01/pg/14/bin/pg_config install
            cd /home/postgres/tmp/postgresql-14.4/
            wget https://github.com/eulerto/wal2json/archive/refs/tags/wal2json_2_4.tar.gz
            tar -xzvf wal2json_2_4.tar.gz
            cd wal2json-wal2json_2_4/ 
            make PG_CONFIG=/u01/pg/14/bin/pg_config
            make PG_CONFIG=/u01/pg/14/bin/pg_config install

where /home/postgres/tmp/postgresql-14.4 is the directory where you untarred your postgres source code into, before compiling and installing postgres to /u01/pg/14

In order to proceed you need to have installed the library psycopg2 with python3 (Eg: pip install psycopg2)

You also need to make sure that the parameter wal_level is set to ‘logical’ in the postgresql.conf file of your postgres database.

create a table named books in your postgresql database.

create table books (bookid bigint primary key,bookname varchar(100));

Insert a few rows into the table.

insert into books values (1,'First Book');
insert into books values (2,'Second Book');
insert into books values (3,'Third Book');

The python library psycopg2 has a module named extras which provides helpers to read from postgres logical replication publishers. We will be using the functions from this module, namely create_replication_slot , start_replication and consume_stream to create the publisher and subscriber for logical replication.

Here is the code sample for pglogical.py

from __future__ import print_function
import sys
import psycopg2
import psycopg2.extras

conn = psycopg2.connect(
    'host=localhost user=postgres port=5432',
    connection_factory=psycopg2.extras.LogicalReplicationConnection)
cur = conn.cursor()
replication_options = {
'include-xids':'1',
'include-timestamp':'1',
'pretty-print':'1'
}
try:
    cur.start_replication(
        slot_name='pytest', decode=True,
        options=replication_options)
except psycopg2.ProgrammingError:
    cur.create_replication_slot('pytest', output_plugin='wal2json')
    cur.start_replication(
        slot_name='pytest', decode=True,
        options=replication_options)


class DemoConsumer(object):
    def __call__(self, msg):
        print(msg.payload)
        msg.cursor.send_feedback(flush_lsn=msg.data_start)

democonsumer = DemoConsumer()

print("Starting streaming, press Control-C to end...", file=sys.stderr)
try:
   cur.consume_stream(democonsumer)
except KeyboardInterrupt:
   cur.close()
   conn.close()
   print("The slot 'pytest' still exists. Drop it with "
         "SELECT pg_drop_replication_slot('pytest'); if no longer needed.",
         file=sys.stderr)
   print("WARNING: Transaction logs will accumulate in pg_xlog "
         "until the slot is dropped.", file=sys.stderr)

The code above is a modified version of the code, published by Marco Nenciarini here.

The code above uses the wal2json output plugin, you can change it to use the test_decoding output plugin, but you will also have to change the replication_options variable to those supported by test_decoding.

You can run the following command to create the publication and the subscriber.

python3 pglogical.py

You will see a prompt saying “Starting streaming, press Control-C to end…”

Let us now make a few changes to the books table.

insert into books values (5,'Fifth Book');

do $$
<<first_block>>
begin
    insert into books values (6,'Sixth Book');
    delete from books where bookid = 3;
end first_block $$;

If you now go back to the screen where you ran your python program, you can see the following messages on screen.

{
        "xid": 741,
        "timestamp": "2022-07-11 21:29:37.301299+00",
        "change": [
                {
                        "kind": "insert",
                        "schema": "public",
                        "table": "books",
                        "columnnames": ["bookid", "bookname"],
                        "columntypes": ["bigint", "character varying(100)"],
                        "columnvalues": [5, "Fifth Book"]
                }
        ]
}
{
        "xid": 742,
        "timestamp": "2022-07-11 21:33:07.913942+00",
        "change": [
                {
                        "kind": "insert",
                        "schema": "public",
                        "table": "books",
                        "columnnames": ["bookid", "bookname"],
                        "columntypes": ["bigint", "character varying(100)"],
                        "columnvalues": [6, "Sixth Book"]
                }
                ,{
                        "kind": "delete",
                        "schema": "public",
                        "table": "books",
                        "oldkeys": {
                                "keynames": ["bookid"],
                                "keytypes": ["bigint"],
                                "keyvalues": [3]
                        }
                }
        ]
}

These are the changes the python subscriber program is reading from postgres the logical replication publisher.

You can then write these changes either to a csv file or to another database as you choose.

Installing the pg_partman extension

November 29, 2021November 29, 2021 ~ rajeevramdas@aol.com ~ Leave a comment

pg_partman is an extension that simplifies the process of partition management in postgres.

Below are the steps that I followed to install the pg_partman extension with postgres 12.8.

Change your current working directory to the directory where you unzipped the postgres 12.8 source code.

cd <SomeStaticPath>/postgresql-12.8/contrib

git clone https://github.com/pgpartman/pg_partman.git

cd pg_partman

make PG_CONFIG=<pghome>/bin/pg_config NO_BGW=1

make install

Then Edit your postgresql.conf file and add pg_partman_bgw to the parameter shared_preload_libraries

Now restart your postgres instance

At this time you are ready to create the extension from postgres and use it.

Use psql to login to your database

CREATE SCHEMA partman;

CREATE EXTENSION pg_partman SCHEMA partman;

\dx (To list the extensions and the version installed in your database)

Loading IMDB data into postgresql

May 3, 2021March 18, 2023 ~ rajeevramdas@aol.com ~ Leave a comment

IMDB (Internet Movie Database) makes the movie dataset available for free download at https://datasets.imdbws.com/. The documentation for this dataset can be found at https://www.imdb.com/interfaces/.

In this blog post, I show you how to load this data into a PostgreSql database. The steps are executed from an Ubuntu Linux workstation. I assume that you already have a postgresql database with about 50Gb of free space to upload this data into and you know the connection information.

We’ll use the s32cinemagoer.py script that can be downloaded from github .

From the ubuntu workstation, install the following packages


sudo apt install python3-pip 

sudo apt-get install libpq-dev

Now install the following python modules


pip3 install Psycopg2 

pip3 install imdbpy

Create a new directory, to store the downloaded datafiles and download the datafiles from https://datasets.imdbws.com into this directory


mkdir dat 

cd dat 

wget https://datasets.imdbws.com/name.basics.tsv.gz 

wget https://datasets.imdbws.com/title.akas.tsv.gz 

wget https://datasets.imdbws.com/title.basics.tsv.gz 

wget https://datasets.imdbws.com/title.crew.tsv.gz 

wget https://datasets.imdbws.com/title.episode.tsv.gz 

wget https://datasets.imdbws.com/title.principals.tsv.gz 

wget https://datasets.imdbws.com/title.ratings.tsv.gz

Login to your postgresql database and create a new schema to hold the imdb tables (This step is optional. If you do not create this schema, then the tables and the corresponding data gets loaded into the public schema).


    create schema imdb;

Before you run the script, lets edit the script and make one change, which will enable the script to load the data into the newly created imdb schema. This change will be made in line 183 in the file.


 Change 


engine = sqlalchemy.create_engine(db_uri, encoding='utf-8', echo=False) 


 To  


engine = sqlalchemy.create_engine(db_uri, echo=False)

Change 
metadata.create_all(tables=[table])
To
metadata.create_all(bind=metadata.bind,tables=[table])

Then Execute the script as shown below



python3 s32cinemagoer.py /home/ubuntu/dat postgresql://username:password@dbhostname/dbname

Where /home/ubuntu/dat is the directory where the imdb files are downloaded into.

This will take some time to load (Close to an hour on a reasonably sized ubuntu workstation) and will consume about 15Gb of space in your postgresql database.

AWR – Profiling Database I/O

June 2, 2015December 29, 2019 ~ rajeevramdas@aol.com ~ Leave a comment

Oracle Awr (Automatic Workload Repository) statistics, captures and stores fine grained information about file reads and writes (aka i/o), that the database performed, during the course of execution of, application generated database workloads. When analyzing the read and write patterns of the database, it helps a lot to understand what type of activity is generating the reads and writes. With this stored information we can get an indepth understanding of the distribution of random and sequential reads and writes.

I use this information for getting a better understanding of the I/O profile, for my Exadata sizing exercises.

This information can be used to understand clearly how much of the i/o is from Temp activity, Datafile reads and writes, Archivelog writes, log writes, and whether these are small or large reads and writes.

To the best of my understanding the small reads and writes are those < 128k and the large reads and writes are those > 128k.

This information is contained mainly in two awr Views.

Dba_Hist_Iostat_FileType
Dba_Hist_Iostat_Function

Dba_Hist_Iostat_FileType

This view displays the historical i/o statistics by file type. The main filetypes are the following

Archive Log
Archive Log Backup
Control File
Data File
Data File Backup
Data File Copy
Data File Incremental Backup
Data Pump Dump File
Flashback Log
Log File
Other
Temp File

Dba_Hist_Iostat_Function

This view displays the historical i/o statistics by i/o function. The main i/o functions are the following

Recovery
Buffer Cache Reads
Others
RMAN
Streams AQ
Smart Scan
Data Pump
XDB
Direct Writes
DBWR
LGWR
Direct Reads
Archive Manager
ARCH

From everything i have seen sofar, these reads and writes can be directly co-related to the “Physical read total IO requests” and “Physical write total IO requests” system level statistics.

I have written a script that displays information from the above mentioned views and gives a detailed breakdown of i/o gendrated from different aspects of the database activities.
In order to fit in the computer screen real estate, i have actually limited the columns the script displays (So it displays only the file types i am frequently interested in). Please feel free to take the script and modify it to add columns that you want to display.

The full version of the script awrioftallpct-pub.sql can be found here.

The script accepts the following inputs
– A begin snap id for a snapid range you want to report for
– A End snap id for a snapid range you want to report for
– A Dbid for the database
– The snap interval in seconds (If you have a 30 minute interval input 1800 seconds)

A description of all the column names in the output, broken down by section, is provided in the header section of the script.

There are 6 sections to this script

1) Total Reads + Writes
2) Total Reads
3) Total Writes
4) Read write breakdown for datafiles
5) Data File – Direct Path v/s Buffered Read Write breakdown
6) Read write breakdown for tempfiles

1) Total Reads + Writes

This section displays the number of reads+writes by filetype, and a percentage of reads+writes for each file type, as a percentage of total reads+writes. The last column displays the total reads+writes for all file types. The column DTDP shows the i/o that bypasses flash cache by default and goes directly to spinning disk on Exadata (Temp+Archivelogs+Flashback Logs).