Popular Posts

Monday, January 28, 2013

Postgres XC - explained

Postgres XC explained 

Users demand faster processing and as the processing and reports get faster so do the expectation for even faster performance and scalability. Now days its unthinkable not to deliver the performance  and availability.

All of those challenges require a constant search/development of new and improves solutions. Enter Postgres XC just when you were thinking that RDBMS is dead it comes back with the vengeance. So what is exactly Postgres XC and why should anyone care? 

Postgres XC shares a common name with the Postgres RDBMS but this is where the similarities stop, Postgres is a standalone database, Postgres XC on the other hand is a distributed RDBMS in a "share nothing" configuration. With this publication I'll make an attempt to explain Postgres XC in plain English, the technology is very promising and the complexity combined with the lack of proper documentation usually are the very strong barriers to any technology adoption. Just look at MySQL NDB cluster - great technology that was kept from the mainstream adoption.

With all the promises of NoSQL databases - never mind how you twist the name: Not only SQL or No to SQL, the reality remains - RDBMS is far more superior and simpler to extract and analyse your data.   SQL language was brilliant invention - very easy to use and learn, the only barriers that kept RDBMS from proliferating are the performance and scalability, Postgres XC seems to be closing this gap while delivering the familiarity of SQL language.

Distributed Database and "Share Nothing" cluster 

Postgres XC is a "distributed" database, so what does it mean. Well in a simple terms that means that various databases components are scattered about several physical severs. What are the typical RDBMS database components:

1. SQL Processing - or SQL interactive layer 
2. Data Storage - i.e. the data files 
3. Some sort of transactions coordinator 

In typical single server DBMS all of the above components are residing together on the same server, "distributed databases" on the other hand scatter its components on the several nodes, for example the SQL processing engine could be sitting on a separate server and in turn communicating with "data storage" that are sitting on the other server.  Now why bother scattering things up if you could neatly place then on the same box? Scalability - as workload increases and more users are pounding your site with poorly written SQL eventually you ran out of even the biggest box capacity and the need to "distribute" the processing will arise.

There is another big bonus to "scattering"  the components - you effectively eliminate the SPOF  aka single point of failure, when some of the components fail the others are continuing to function thus eliminating the possibility of your site being down.

Here is the example of typical RDBMS - all components reside together on the same box:

In contrast "distributed" database has its components on various servers - SQL processing and Data nodes for example:

This is somewhat simplistic representation of the "distributed" databases but it does serve the purpose to communicate the logical design.

Distributed databases are nothing new and some of them been around for a number of years for example, Oracle RAC, MySQL NDB.

Share Nothing and Distributed database

Postgres XC is a share-nothing and distributed database, we already covered the subject of the "distributed database" lets continue and decipher  the "share nothing". In the fust part we discussed the database components the SQL Engine, Data Storage etc... "Share Nothing" database design means that none of the databases components are shared among the servers. Postgres XC not only can scatter the database components across the nodes it can also have multiple node housing multiple components:

Here we have a "distributed" and "share nothing" database above - the database components are distributed among the servers and the components are in turn not shared. Here you have it - distributed and share nothing database. You probably noticed in distributed share nothing database all the data is accessible from any server since the data is usually synchronously replicated among the nodes.  

One of the great example of distributed and share nothing database is MySQL NDB cluster it looks very similar to the image above and is synchronously replication the data between the nodes. 

Postgres XC Share Nothing Distributed Database 

Postgres XC - developed and maintained by Postgres XC Development Group            is a distributed share nothing database that consists of the following components:

1. Coordinator 
2. Data storage
3. DTM - Distributed  Transaction coordinator


Coordinator is essentially an SQL interface to your data i.e. this is the components you use to access the data in Postgres XC database, to use it you simply invoke psql utility and communicate with the database interactively. Coordinator does not store any data it simply stored the information as to where the data is located and directs you to that data. The actual data is stored on the data node component.

Data Node 

Data node is where the actual data is stored i.e. the data segment files, the write ahead log etc...Inorder to retrieve the data you interact with the coordinator component. The data is also replicated between the data nodes so you van see the same data no matter which coordinator you access.

GTM - Global Transaction Manager

With all those components scattered across multiple physical servers - there must be a mechanism in place that assured the consistent view of the data across the components, that is where the GTM comes in. 

Postgres XC Diagram 

Here above we have a Postgres XC configuration - SQL Processing and Data Storage components are installed on the same servers and we have a dedicated GTM.

Interestingly Postgres XC is termed as a "shared nothing" cluster but it does however share one component GTM.

Postgres XC Installation 

Having covered the basic architecture we can nor focus on Postgres XC installation and configuration that is not extremely trivial. 

Understanding the install process 

If you spend some time understanding the install process the actual installation is going to be much easier. Postgres XC comes as a source distribution and you wil need GNU make and GCC compiler to compile the source.On the positive side you do not need to compile all the separately - they all compiled in one shot and you'll end up with bunch of executables ready to use.

Downloading the software

Download the distribution from Postgres XC site I was using the release pgxc-v1.0.1. I'm also using amazon ec2 CentoOS 6 for the ami's.

Preparations for the install 

Since Postgres XC comes as a source distribution we will have to perform a lot of pre requisite steps prior to the installation.

Provision the partition and the file system for the future database 

Now days I almost exclusively use XFC filesystem for the databases - the reasons for which I have addressed in my other post Using amazon aws I simply provisioned the ebs volume and formatted it using the xfs filesystem, here is my ebs volume mounted and configured:

[root@ip-10-197-1-119 pgxc]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/xvde1            5.0G  1.5G  3.3G  31% /
tmpfs                 1.8G     0  1.8G   0% /dev/shm
/dev/xvdab            1.0T  376M  1.0T   1% /var/lib/pgsql

I usually use /var/lib/pgsql destination for all Postgres files this way if I need to increase the capacity of my ec2 all I have to do is take a snapshot of my ebs volume and mount it to a larger ec2 instance. 

Create postgres use and postgres group 

You can use varios tools to add users and group to the system, personally I just edit the configuration files:


Here is my product of editing the files:




ec2-user:x:222:500:EC2 Default User:/home/ec2-user:/bin/bash

Change the permissions of your Postgres volume to postgres user and postgres group

chown -R postgres:postgres /var/lib/pgsql

Now you are all set with the user and group and postgres home directory is /var/lib/mysql. I selected 666 useid and 666 group id for no particular reason. But I usually geep the gid and userid consistent.

Install the tools to compile the source using yum

yum install readline-devel  openssl-devel gcc xfsprogs  screen 

Based on my ami selection those are the only tools I needed to install, if your ami is missing any other tolls you cam easily detect it during the configure and make process. The installation proces here has to be repeated on all the nodes.

Open Ports required by Postgres XC

Postgres XC is using a bit more ports then standard Postgres that is 5432. For postgres XC you will need several ports:

5432 for Coordinator 
5433 for data node components 
6666 GTM port 

Technically you can select any unoccupied ports for those but I prefer to be consistent and use 5432 for the coordinator since this is the component you will be interacting with so it pay to have a familiar port. I then simply selected 5433 - the next available port for the data node. You need two different ports 5432 and 5433 since I'm planning to use the same server for the coordinator and data nodes. If you use the separate servers you can get away with 5432 for coordinator and data nodes. 

If it seems a bit confusing at this point - no worries it will get easier as we progress.

There is one more port that is needed the GTM port, i.e the port that Global Transaction Manager will be listening on. It is recommended that you do use a separate box for GTM so you still have it in case you loose one of the coordinator/data nodes. I selected 6666 port for the GTM

On amazon ec2 be sure to open those port in a security group that you assigned to your Postgres XC instances. Also I notices that on some ami's iptables are enable by default and even though you opened the ports using security group be sure to run /etc/init.d/iptables stop to stop the internal firewall.

Compile the source

Switch to the source directory of your installation and issue the compile commands:

cd /var/lib/pgsql/pgxe_install/pgxc   

Next run configure command


Assuming that all the checks complete next you run the gmake:

./gmake install

That is all, the files will be compiled and installed in /usr/local/pgsql - that is the default directory where the executables will be generated. To make things easier I usually copy all the executables to /var/lib/pgsql directory to keep all the files on my ebs volume, this is the final product of the directory structure assuming the compile completed:

bin  include  lib  share

bin directory will contain all of the executables.

On my particular ami the entire installation compiled flawlessly, this is the ami I used ami-2893b36d. 


Assuming you managed to resolve the dependencies and compiled the executables we can proceed to the 

Important note

Throughout the installation documentation Postgres XC is referring to the "node name" - its important to understand that the "nodename" is not a hostname of your boxes but instead its a logical node name convention.  Each component in Postgres XC get its own logical nodename. The "node" in Postgres XC terms is actually an instance of Postgres with its own data directory. So if you for example using the same box for the coordinator and the data nodes you will end up with two instances of Postgres on the same box, hence the separate data directory destination for each and separate port numbers. 

For example I called my three nodes - alpha, beta, gamma and delta. Since I have there physical boxes with two components on each physical server - coordinator and the data node I need separate node names for each component per box, I usually prefix "c" to the coordinator node names and leave the data nodes with actual name:

On first physical box I have:
calpha - for the nodename of coordinator
alpha   - for the node name of data node 

On the second 

cbeta - for the nodename of coordinator notice the "c" prefix to identify the coordinator 
beta   - for the node name of data node 

And on the third box I have:


GTM gets the node name delta and there is no prefixes since I have one coordinator per box.

All and all two logical names per box, in summary you will need six logical node names for three physical boxes:


Coordinator Configuration

As you remember coordinator is the SQL interface to your Postgres XC, coordinator does not actually store 
any of your data but it does need a dedicated data directory the sore the catalog. My standard installation 
directory is 


I then selected cdata directory for the to store the coordinator data:


I used cdata to distinguish it from the data node directory - data, "c" prefix help to identify the coordinator 

To initialise the coordinator data directory standard postgres commands will be used 
initdb -D /usr/local/pgsql/cdata --nodename

Once this is complete and the directory /var/lib/pgsql/cdata is created we can configure the parameters. Please node that I'm using "cdata" for the directory name not to mix it with data node directory data:

cdata  data  

Inside the cdata directory that was created there is standard postgresql.conf file with Postgres XC special parameters. There are myriad parameters to Postgres so we will focus only on those to get the database up and running. 

When running the initial configuration initdb -D /usr/local/pgsql/cdata --nodename you can select anything for the node name since we will modify it later using the configuration files. There is a GTM section in the parameter file and that is the section we need:




gtm_host = ''                       # Host name or address of GTM


gtm_port = 6666                                     # Port of GTM


pgxc_node_name = 'calpha'                    # Coordinator or Datanode name

From here the configuration is simple - every coordinator need its own logical node name "calpha" and most importantly the ip address and the port number of GTM. 

That is it with the coordinator node.

Data Node configuration

Data node configuration is very similar to coordinator, with the difference that the data node will actually store your data and in turn the data node will replicate that data between the data nodes.   The first step is to actually initialize the data node directory just like we did with the coordinator nodes.
To initialize the data node directory use the same command used to initialize the coordinator node but be sure not to use the same directory name:

initdb -D /usr/local/pgsql/data --nodename

If you noticed I "data" to name my data node directory that is different from "cdata" name that I used for the coordinator node. 

When it all completed you should have two directories with data and configuration files 

/var/lib/pgsql/cdata   - coordinator directory
/var/lib/pgsql/data    -  data node directory

Now since /var/lib/pgsql/data  is created it will contain its own configuration file - postgresql.conf.  Just like the coordinator the data node need to know the IP address and ort of the GTM server:


gtm_host = ''                       # Host name or address of GTM
gtm_port = 6666                                     # Port of GTM
pgxc_node_name = 'alpha'                    # Coordinator or Datanode name

The only difference from the coordinator node here is the node name "alpha" since I used "calpha" to designate the coordinator node.

In addition you will need to edit pg_hba.conf file for the data node. Postgres XC is replicating the data between the data nodes - it will ned  the capability to connect to each data node.

host    all              postgres           trust
host    all              postgres            trust
host    all              postgres           trust
host    all              postgres            trust

I added those to each pg_hba.conf  so there are no connectivity issues between the nodes. 

GTM Configuration 

GTM configuration is the easiest part since it resides on its own server. You will need to initialize the GTM directory just like with the coordinator and the data nodes:

initgtm -Z gtm -D /var/lib/pgsql/data_gtm

Except this time we use a special GTM utility initgtm. Once initialized open gtm.conf file and edit the parameters:

nodename = 'delta'                              # Specifies the node name.
listen_addresses = '*'                  # Listen addresses of this GTM.
port = 6666                             # Port number of this GTM.

As you can see - very simple, the node name interface to listen to "*" (all) and the port number. Simple as that. 

After this we can start the GTM node. The GTM node must be started prior to any other nodes since all of them need access to GTM.

To start the GTM node use the following command:

gtm -D /var/lib/pgsql/data_gtm

Simple as that.  Once the GTM is started you can view the gtm.log file:

1:140259528533760:2013-01-26 00:14:10.551 UTC -LOG:  Assigning new transaction ID = 49644
LOCATION:  GTM_GetGlobalTransactionIdMulti, gtm_txn.c:581
1:140259528533760:2013-01-26 00:14:10.551 UTC -LOG:  Sending transaction id 49644
LOCATION:  ProcessBeginTransactionGetGXIDCommand, gtm_txn.c:1172
1:140259528533760:2013-01-26 00:14:10.552 UTC -LOG:  Received transaction ID 49644 for snapshot obtention
LOCATION:  ProcessGetSnapshotCommand, gtm_snap.c:307
1:140259528533760:2013-01-26 00:14:10.555 UTC -LOG:  Committing transaction id 49644
LOCATION:  ProcessCommitTransactionCommand, gtm_txn.c:1591
1:140259528533760:2013-01-26 00:14:10.556 UTC -LOG:  Cleaning up thread state
LOCATION:  GTM_ThreadCleanup, gtm_thread.c:265

Starting the Postgres XC 

Now since the GTM node running we can start the rest of our cluster, remember that I have three nodes. That means I need to start all the components on each node - coordinator and data node.

Starting the data node 

To start  the data node use the following command:

postgres -X -D /var/lib/pgsql/data

-X mean the data node and -D means the directory path, be sure to point the data node to the right directory as you remember we have two of them "cdata" for coordinator and "data"  for data node. 

Starting the coordinator node 

To start the coordinator node use:

postgres -C -D /var/lib/pgsql/cdata

-C mean the coordinator and -D is the directory we use "cdata" for the coordinator directory. 

At this point you should have toe instances of potgres running on your box:

Coordinator out of /var/lib/pgsql/cdata directory and listening on 5432 port and data node out of /var/lib/pgsql/data and listening on 5433 port.

postgres  1322  1298  0 Jan22 pts/1    00:00:03 postgres -X -D /var/lib/pgsql/data
postgres  1324  1322  0 Jan22 ?        00:00:27 postgres: writer process          
postgres  1325  1322  0 Jan22 ?        00:00:24 postgres: wal writer process      
postgres  1326  1322  0 Jan22 ?        00:00:06 postgres: autovacuum launcher process   

postgres  1363  1340  0 Jan22 pts/2    00:00:18 postgres -C -D /var/lib/pgsql/cdata
postgres  1365  1363  0 Jan22 ?        00:00:00 postgres: pooler process           
postgres  1366  1363  0 Jan22 ?        00:00:27 postgres: writer process           
postgres  1367  1363  0 Jan22 ?        00:00:23 postgres: wal writer process

Now we need to repeat the process all nodes that house the coordinator and data nodes, as you recall we have three nodes so we will need to start the coordinator and data node on each box.

When all the nodes started correctly you will see the register.node file on the GTM box, in /var/lib/pgsql/gtm_data directory. 


Are we there yet ?

I know what you thinking there are few more steps and we have a working Postgres XC three node database.

Registering the nodes

Each node - if you recall the "node" means a logical name of the Postgres XC components will need to be registered so they know of each others existence. 

CREATE NODE cbeta  WITH (TYPE = 'coordinator', HOST = '', PORT = 5432);
CREATE NODE beta  WITH (TYPE = 'datanode', HOST = '', PORT = 5433);

Those command need to be executed on each physical server for each data note - that means if you have three servers you will need to do it three times on each box.

Testing the install

Once all the nodes registered we can go ahead and create the database to test that the new installation is working login to any coordinator and issue create database command:

psql# create database test

Once the database is created you should be able to access it from any coordinator node.


Postgres XC looks like a very promising project, in my next post will throw some performance tests at my new install and compare it to the standalone Postgres.


geetha said...

wonderful information, I had come to know about your blog from my friend nandu , hyderabad,i have read atleast 7 posts of yours by now, and let me tell you, your website gives the best and the most interesting information. This is just the kind of information that i had been looking for, i'm already your rss reader now and i would regularly watch out for the new posts, once again hats off to you! Thanks a ton once again, Regards, MySql online trainingamong the MySql in Hyderabad. Classroom Training in Hyderabad India

Harsimranjit Singh Kler said...

GTM - Global Transaction Manager....One Node is again point of failure thats what i understand?