Wednesday, April 10, 2013

MariaDB Dynamic Columns



A very interesting new feature is included in MariaDB 5.3 onwards, and I hope it gets enough attention for continued development and improvement - enter dynamic columns. What does "dynamic column" mean? In simple terms, this feature allows you to create a column for any individual row in a table. The actual data is stored in a blob, and the feature comes with a nice set of functions to manipulate that data. If this does not make much sense yet - stick around, the practical example below will make it easy to grasp.

Take a typical analytics scenario - a single large fact table (fact or measure tables are usually the large ones) and a collection of smaller dimension tables (aka attributes); all in all, a classic star schema.

Say you are a marketing company serving the mobile phone industry and you have an app that captures data in the following tables:

call log table -       the fact table that stores all calls
device table -         dimension table that stores all smartphone devices (HTC, iPhone etc...)
region table -         dimension table that stores the region, country etc...
carrier table -        dimension table that stores the carrier (AT&T, Verizon etc....)
os table       -       dimension table that stores all operating systems (iOS, Android etc...)


All in all, five tables to get the data in and five tables to join when you want to report on it. Not too bad, you'd say, and I would normally agree - but what if those attributes (or dimensions) could be stored in a single table along with the facts? Yes, you could try to denormalize the data by cramming the attributes into the fact table, but there is a better way.

Enter dynamic columns in MariaDB - now you can store those attributes in the same table, thus avoiding the joins. Those in the know might bring up the enum and set data types, which serve a similar function, but enum and set are limited in how much they can hold - an enum can have at most 65,535 elements and a set at most 64 members.

Dynamic columns are free from the set and enum limitations - one can store up to 1 GB in a single dynamic columns blob. Pretty cool, huh?

Now to the practical usage - the design is simple and elegant.

Creating tables with dynamic columns

Creating tables with dynamic columns is easy:



CREATE TABLE call_log (
  call_id int primary key,
  incoming_number varchar(11) DEFAULT NULL,
  outgoing_number varchar(11) DEFAULT NULL,
  dynamic_cols blob
) ENGINE=InnoDB;


The above statement creates a table with a dynamic column. Now we can store data in it.


insert into call_log values (1,'8888888888', '9999999999', COLUMN_CREATE('carrier', 'AT&T'));
insert into call_log values (2,'8888888888', '9999999999', COLUMN_CREATE('carrier', 'Verizon'));
insert into call_log values (3,'8888888888', '9999999999', COLUMN_CREATE('carrier', 'Tmobile'));
   


Now we will insert the device values:


insert into call_log values (4,'8888888888', '9999999999', COLUMN_CREATE('device', 'iPhone'));
insert into call_log values (5,'8458888888', '9999779999', COLUMN_CREATE('device', 'Blackberry'));


Now let's insert some operating system values into our call log table:


insert into call_log values (6,'8458888888', '9999779999', COLUMN_CREATE('os', 'iOS'));

insert into call_log values (7,'8458888888', '9999779999', COLUMN_CREATE('os', 'Android'));



Now let's insert region data:


insert into call_log values (8,'8458888888', '9999779999', COLUMN_CREATE('region', 'North West'));


insert into call_log values (9,'8458888888', '9999779999', COLUMN_CREATE('region', 'South East'));


Now we can insert a series of values into the dynamic column:

insert into call_log values (10,'8458888888', '9999779999', COLUMN_CREATE('1', 'Android', '2', 'iPhone', '3', 'Android', '4', 'South East'));

Cool huh?

So how do we query the data? Very easy, using the dynamic column functions:



select call_id, COLUMN_GET(dynamic_cols, '3' as char) as os from call_log;
+---------+---------+
| call_id | os      |
+---------+---------+
|       1 | Android |
|       2 | iOS     |
|       3 | Symbian |
|       4 | iOS     |
|       5 | NULL    |
|       6 | iOS     |
|       7 | Android |
|       8 | NULL    |
|       9 | NULL    |
|      10 | Android |
+---------+---------+
10 rows in set (0.00 sec)

The above selects the call_id and the operating system (dynamic column '3') for every row.
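
Beyond COLUMN_CREATE and COLUMN_GET there are a few more dynamic column functions worth knowing - COLUMN_ADD, COLUMN_EXISTS and COLUMN_LIST. A quick sketch against the table above (using named columns, which per the note below requires MariaDB 10.x; on 5.3/5.5 use numbers instead):

-- add another attribute to an existing row, keeping whatever is already in the blob
update call_log
   set dynamic_cols = COLUMN_ADD(dynamic_cols, 'region', 'North East')
 where call_id = 1;

-- check whether a row carries a given attribute
select call_id, COLUMN_EXISTS(dynamic_cols, 'carrier') as has_carrier from call_log;

-- list the attribute names stored in each row
select call_id, COLUMN_LIST(dynamic_cols) as attributes from call_log;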

Important Note: MariaDB 5.3 and MariaDB 5.5 only support numbers as dynamic column names; MariaDB 10.x also supports character strings as column names.

Limitations.

Presently there is no support for indexing dynamic columns, but as usage increases, indexing support will likely follow.

Hope you found it interesting; for more, visit MariaDB.


Be sure to visit accelerationdb.com





Thursday, March 28, 2013

MySQL slave delay how to

The concept of a delayed replicated slave is nothing new - Oracle (TM), for example, has had it for years with Data Guard. So what is the point of delaying a replicated database slave, how do you use it, and how do you implement it for MySQL? We will discuss those topics in this posting.

Most replicated MySQL configurations are one master and several replicated slaves; for the sake of simplicity I'm not going to get into the discussion on shards etc... In most cases the struggle is to keep replication up to date, but there are scenarios when deliberately delaying a replicated slave is actually beneficial.

A delayed MySQL slave is a very valuable "undo" option for your databases. For example, if someone accidentally drops one of the MySQL databases or tables, it can easily be recovered from the delayed slave because the drop statement has not yet had a chance to propagate there. It is also a valuable security option in case someone hacks into the master MySQL database and drops database objects.

Having a MySQL replicated configuration and not using the delayed slave is simply hard to justify.

So how do we go about delaying the replicated slave ?

There are two options:

1. MySQL 5.6 (the one from Oracle TM) - that is, if you are using MySQL 5.6

MySQL release 5.6 from Oracle TM actually has a new command that will force the slave to delay:


CHANGE MASTER TO MASTER_DELAY = N;


The "N" stands for the number of seconds.
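
For example, to keep a slave one hour behind its master on stock MySQL 5.6 (the delay value here is just an illustration - pick whatever fits your recovery window):

STOP SLAVE;
CHANGE MASTER TO MASTER_DELAY = 3600;   -- delay in seconds, one hour
START SLAVE;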

2. Percona (TM) toolkit

Percona (TM) developed a toolkit that can be downloaded from the Percona site http://www.percona.com/software/downloads/. There are several tools in the download package, but we will focus on pt-slave-delay.

pt-slave-delay is essentially a command line tool and it is also very simple to use:

pt-slave-delay --delay=120m h=hostname,u=root,p=yourrootpassword

Simple as that. The command has a lot of options, but if you are running it on the same host as the database, all you need is the root user id and password.

Once you issue the command you'll see the following output:


2013-03-28T14:36:53 slave stopped at master position mysql-bin.002116/61079053
2013-03-28T14:37:53 slave stopped at master position mysql-bin.002116/70028546
2013-03-28T14:38:53 slave stopped at master position mysql-bin.002116/78913700
2013-03-28T14:39:53 slave stopped at master position mysql-bin.002116/88378080
2013-03-28T14:40:53 slave stopped at master position mysql-bin.002116/101285536
2013-03-28T14:41:53 slave stopped at master position mysql-bin.002116/110519229

What pt-slave-delay actually does is periodically stop the slave with a STOP SLAVE command and then start it again with a START SLAVE command once the delay window allows.

Simple as that.

I usually run pt-slave-delay in a background screen session so I can monitor the progress. Very simple and elegant.
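
For reference, here is roughly how that looks - the session name and credentials are placeholders, and pt-slave-delay also has --daemonize and --log options if you prefer not to use screen:

screen -dmS slave_delay pt-slave-delay --delay 2h --interval 1m h=localhost,u=root,p=yourrootpassword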

Now there is no reason for you not to delay. pt-slave-delay also works fine with most MySQL distributions and with MariaDB.



Monday, January 28, 2013

Postgres XC - explained


Users demand faster processing, and as processing and reports get faster, so do the expectations for even more performance and scalability. Nowadays it is unthinkable not to deliver on performance and availability.

All of those challenges require a constant search for, and development of, new and improved solutions. Enter Postgres XC - just when you were thinking the RDBMS was dead, it comes back with a vengeance. So what exactly is Postgres XC and why should anyone care?

Postgres XC shares a common name with the Postgres RDBMS, but that is where the similarities stop. Postgres is a standalone database; Postgres XC, on the other hand, is a distributed RDBMS in a "share nothing" configuration. With this publication I'll make an attempt to explain Postgres XC in plain English. The technology is very promising, and complexity combined with a lack of proper documentation is usually a strong barrier to any technology's adoption - just look at MySQL NDB cluster, a great technology that was kept from mainstream adoption.

With all the promises of NoSQL databases - never mind how you twist the name, Not Only SQL or No To SQL - the reality remains that an RDBMS is far superior and simpler for extracting and analysing your data. The SQL language was a brilliant invention - very easy to use and learn. The main barriers that kept the RDBMS from proliferating further are performance and scalability, and Postgres XC seems to be closing this gap while delivering the familiarity of SQL.

Distributed Database and "Share Nothing" cluster 

Postgres XC is a "distributed" database - so what does that mean? In simple terms, it means that the various database components are scattered across several physical servers. What are the typical RDBMS database components?

1. SQL Processing - or SQL interactive layer 
2. Data Storage - i.e. the data files 
3. Some sort of transaction coordinator 

In a typical single-server DBMS all of the above components reside together on the same server. "Distributed databases", on the other hand, scatter their components across several nodes; for example, the SQL processing engine could sit on one server and in turn communicate with the "data storage" sitting on another server. Now why bother scattering things about if you could neatly place them on the same box? Scalability - as the workload increases and more users pound your site with poorly written SQL, eventually you run out of even the biggest box's capacity and the need to "distribute" the processing arises.

There is another big bonus to "scattering" the components - you greatly reduce the SPOF, aka single point of failure: when some components fail, the others continue to function, reducing the chance of your site being down.

Here is an example of a typical RDBMS - all components reside together on the same box:


In contrast, a "distributed" database has its components on various servers - SQL processing and data nodes, for example:

This is a somewhat simplistic representation of "distributed" databases, but it serves the purpose of communicating the logical design.

Distributed databases are nothing new, and some of them have been around for a number of years - Oracle RAC and MySQL NDB, for example.

Share Nothing and Distributed database

Postgres XC is a share-nothing and distributed database. We already covered the subject of the "distributed database", so let's continue and decipher "share nothing". In the first part we discussed the database components - the SQL engine, data storage etc... A "share nothing" database design means that none of the database components are shared among the servers. Postgres XC can not only scatter the database components across the nodes, it can also have multiple nodes housing multiple components:


Above we have a "distributed" and "share nothing" database - the database components are distributed among the servers and the components are, in turn, not shared. There you have it - a distributed, share-nothing database. You probably noticed that in a distributed share-nothing database all the data is accessible from any server, since the data is usually synchronously replicated among the nodes.

One great example of a distributed, share-nothing database is MySQL NDB cluster - it looks very similar to the image above and synchronously replicates the data between the nodes.

Postgres XC Share Nothing Distributed Database 

Postgres XC - developed and maintained by the Postgres-XC Development Group (http://postgres-xc.sourceforge.net/) - is a distributed, share-nothing database that consists of the following components:

1. Coordinator 
2. Data storage
3. GTM - Global Transaction Manager

Coordinator

The coordinator is essentially the SQL interface to your data, i.e. this is the component you use to access the data in a Postgres XC database; to use it you simply invoke the psql utility and communicate with the database interactively. The coordinator does not store any of your data - it simply stores the information about where the data is located and directs you to it. The actual data is stored on the data node component.
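
In practice that simply means pointing psql at the coordinator's host and port, for example (the host name is a placeholder):

psql -h coordinator.host.com -p 5432 -U postgres postgres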

Data Node 

The data node is where the actual data is stored, i.e. the data segment files, the write-ahead log etc... In order to retrieve the data you interact with the coordinator component. The data is also replicated (or distributed, depending on the table definition) between the data nodes, so you can see the same data no matter which coordinator you access.
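
Worth mentioning: in Postgres XC the way a table's data is spread over the data nodes is controlled per table with the DISTRIBUTE BY clause - roughly like this (table and column names are made up for illustration):

-- small dimension table, fully copied to every data node
CREATE TABLE carrier (carrier_id int, name text) DISTRIBUTE BY REPLICATION;

-- large fact table, sharded across the data nodes by hash
CREATE TABLE call_fact (call_id bigint, carrier_id int) DISTRIBUTE BY HASH(call_id);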

GTM - Global Transaction Manager

With all those components scattered across multiple physical servers, there must be a mechanism in place that assures a consistent view of the data across the components - that is where the GTM comes in.


Postgres XC Diagram 


Above we have a Postgres XC configuration - the SQL processing and data storage components are installed on the same servers and we have a dedicated GTM.

Interestingly, Postgres XC is termed a "share nothing" cluster, but it does share one component - the GTM.

Postgres XC Installation 

Having covered the basic architecture, we can now focus on Postgres XC installation and configuration, which is not entirely trivial.

Understanding the install process 

If you spend some time understanding the install process, the actual installation is going to be much easier. Postgres XC comes as a source distribution and you will need GNU make and the GCC compiler to compile the source. On the positive side, you do not need to compile the components separately - they are all compiled in one shot and you'll end up with a bunch of executables ready to use.

Downloading the software

Download the distribution from the Postgres XC site http://postgres-xc.sourceforge.net/. I was using the release pgxc-v1.0.1. I'm also using Amazon EC2 CentOS 6 AMIs.

Preparations for the install 

Since Postgres XC comes as a source distribution, we will have to perform a number of prerequisite steps prior to the installation.

Provision the partition and the file system for the future database 

Nowadays I almost exclusively use the XFS filesystem for databases - the reasons for which I have addressed in my other post http://alexalexander.blogspot.com/2011/12/mariadb-or-mysql-ext3-ext4-jounaling.html. Using Amazon AWS I simply provisioned an EBS volume and formatted it with the XFS filesystem; here is my EBS volume mounted and configured:


[root@ip-10-197-1-119 pgxc]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/xvde1            5.0G  1.5G  3.3G  31% /
tmpfs                 1.8G     0  1.8G   0% /dev/shm
/dev/xvdab            1.0T  376M  1.0T   1% /var/lib/pgsql


I usually use /var/lib/pgsql as the destination for all Postgres files; this way, if I need to increase the capacity of my EC2 instance, all I have to do is take a snapshot of the EBS volume and mount it on a larger instance.
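
For reference, the provisioning itself boils down to something like this - the device name matches the df output above, adjust it to whatever your volume is attached as:

mkfs.xfs /dev/xvdab
mkdir -p /var/lib/pgsql
mount -t xfs /dev/xvdab /var/lib/pgsql
echo "/dev/xvdab /var/lib/pgsql xfs defaults,noatime 0 0" >> /etc/fstab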

Create the postgres user and postgres group 

You can use various tools to add users and groups to the system; personally, I just edit the configuration files:

/etc/group
/etc/passwd 

Here is the result of editing the files:

/etc/group 

tcpdump:x:72:
slocate:x:21:
ec2-user:x:500:
screen:x:84:
postgres:x:666:

/etc/passwd

tcpdump:x:72:72::/:/sbin/nologin
ec2-user:x:222:500:EC2 Default User:/home/ec2-user:/bin/bash
postgres:x:666:666:postgres:/var/lib/pgsql:/bin/bash

Change the ownership of your Postgres volume to the postgres user and group:

chown -R postgres:postgres /var/lib/pgsql

Now you are all set with the user and group, and the postgres home directory is /var/lib/pgsql. I selected 666 for the user id and 666 for the group id for no particular reason, but I usually keep the gid and uid consistent.
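
If you would rather not edit the files by hand, the standard tools should produce the same result (uid/gid 666 just to match the entries above):

groupadd -g 666 postgres
useradd -u 666 -g 666 -d /var/lib/pgsql -s /bin/bash postgres
chown -R postgres:postgres /var/lib/pgsql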

Install the tools to compile the source using yum


yum install readline-devel  openssl-devel gcc xfsprogs  screen 

Based on my AMI selection those are the only tools I needed to install; if your AMI is missing any other tools you can easily detect it during the configure and make process. The installation process here has to be repeated on all the nodes.

Open Ports required by Postgres XC

Postgres XC uses a few more ports than the single standard Postgres port 5432. For Postgres XC you will need several ports:

5432 for the coordinator
5433 for the data node
6666 for the GTM

Technically you can select any unoccupied ports for those, but I prefer to be consistent and use 5432 for the coordinator, since this is the component you will be interacting with and it pays to have a familiar port. I then simply selected 5433 - the next available port - for the data node. You need two different ports, 5432 and 5433, because I'm planning to use the same server for the coordinator and the data node. If you use separate servers you can get away with 5432 for both the coordinator and the data nodes.

If it seems a bit confusing at this point - no worries it will get easier as we progress.

There is one more port that is needed - the GTM port, i.e. the port the Global Transaction Manager will be listening on. It is recommended that you use a separate box for the GTM so you still have it in case you lose one of the coordinator/data nodes. I selected port 6666 for the GTM.

On Amazon EC2, be sure to open those ports in the security group that you assigned to your Postgres XC instances. Also, I noticed that on some AMIs iptables is enabled by default, so even though you opened the ports in the security group, be sure to run /etc/init.d/iptables stop to stop the internal firewall.
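
A quick way to make sure the firewall is out of the way and, once the services are up, that the ports are actually listening (assuming the stock CentOS tools):

/etc/init.d/iptables stop
chkconfig iptables off
netstat -ltnp | egrep '5432|5433|6666'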

Compile the source

Switch to the source directory of your installation and issue the compile commands:

cd /var/lib/pgsql/pgxe_install/pgxc   

Next run configure command

./configure 

Assuming all the checks complete, next run gmake:

gmake install

That is all - the files will be compiled and installed in /usr/local/pgsql, which is the default directory for the generated executables. To make things easier, I usually copy everything to the /var/lib/pgsql directory to keep all the files on my EBS volume. This is the final directory structure, assuming the compile completed:

bin  include  lib  share

The bin directory will contain all of the executables.
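
The copy and a PATH entry for the postgres user look something like this (paths match the defaults described above):

cp -r /usr/local/pgsql/* /var/lib/pgsql/
chown -R postgres:postgres /var/lib/pgsql
echo 'export PATH=/var/lib/pgsql/bin:$PATH' >> /var/lib/pgsql/.bash_profile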

On my particular AMI the entire installation compiled flawlessly; this is the AMI I used: ami-2893b36d.

Configuration

Assuming you managed to resolve the dependencies and compile the executables, we can proceed to the configuration.

Important note

Throughout the installation documentation Postgres XC refers to the "node name" - it is important to understand that the "nodename" is not the hostname of your boxes; instead, it is a logical node naming convention. Each component in Postgres XC gets its own logical nodename. A "node" in Postgres XC terms is actually an instance of Postgres with its own data directory. So if you are, for example, using the same box for the coordinator and the data node, you will end up with two instances of Postgres on the same box - hence the separate data directory destination for each and the separate port numbers.

For example, I called my nodes alpha, beta, gamma and delta. Since I have three physical boxes with two components on each physical server - a coordinator and a data node - I need separate node names for each component per box. I usually prefix "c" to the coordinator node names and leave the data nodes with the plain name:

On first physical box I have:
calpha - for the nodename of coordinator
alpha   - for the node name of data node 

On the second 

cbeta - for the nodename of the coordinator; notice the "c" prefix identifying the coordinator
beta   - for the node name of data node 

And on the third box I have:

cgamma
gamma

The GTM gets the node name delta, with no prefix, since there is only one GTM.

All in all, two logical names per box; in summary you will need six logical node names for three physical boxes (plus delta for the GTM):

calpha
alpha
cbeta
beta
cgamma
gamma

Coordinator Configuration

As you remember, the coordinator is the SQL interface to your Postgres XC; the coordinator does not actually store any of your data, but it does need a dedicated data directory to store the catalog. My standard installation directory is

/var/lib/pgsql

I then selected the cdata directory to store the coordinator data:

/var/lib/pgsql/cdata

I used cdata to distinguish it from the data node directory, data; the "c" prefix helps to identify the coordinator directory.

To initialise the coordinator data directory the standard postgres initdb command is used:

initdb -D /var/lib/pgsql/cdata --nodename calpha

Once this is complete and the directory /var/lib/pgsql/cdata is created, we can configure the parameters. Please note that I'm using "cdata" for the directory name so as not to mix it up with the data node directory, data:

cdata  data  

Inside the cdata directory that was created there is a standard postgresql.conf file with the Postgres XC specific parameters. There are myriad Postgres parameters, so we will focus only on those needed to get the database up and running.

When running the initial initdb -D /var/lib/pgsql/cdata --nodename command you can select anything for the node name, since we will set it later using the configuration files. There is a GTM section in the parameter file, and that is the section we need:


#------------------------------------------------------------------------------
# GTM CONNECTION
#------------------------------------------------------------------------------

gtm_host = '54.241.19.61'                       # Host name or address of GTM

gtm_port = 6666                                 # Port of GTM

pgxc_node_name = 'calpha'                       # Coordinator or Datanode name

From here the configuration is simple - every coordinator needs its own logical node name ("calpha") and, most importantly, the IP address and port number of the GTM.

That is it with the coordinator node.

Data Node configuration

Data node configuration is very similar to the coordinator's, with the difference that the data node will actually store your data and, in turn, replicate that data between the data nodes. The first step is to initialize the data node directory, just like we did with the coordinator node.
                                     
To initialize the data node directory use the same command used to initialize the coordinator node but be sure not to use the same directory name:

initdb -D /var/lib/pgsql/data --nodename alpha

If you noticed, I used "data" to name my data node directory, which is different from the "cdata" name I used for the coordinator node.

When it has all completed you should have two directories with data and configuration files:

/var/lib/pgsql/cdata   - coordinator directory
/var/lib/pgsql/data    -  data node directory

Now that /var/lib/pgsql/data is created, it will contain its own configuration file - postgresql.conf. Just like the coordinator, the data node needs to know the IP address and port of the GTM server:

#------------------------------------------------------------------------------
# GTM CONNECTION
#------------------------------------------------------------------------------

gtm_host = '54.241.19.61'                       # Host name or address of GTM
                                        
gtm_port = 6666                                     # Port of GTM
                                 
pgxc_node_name = 'alpha'                    # Coordinator or Datanode name

The only difference from the coordinator node here is the node name "alpha" since I used "calpha" to designate the coordinator node.

In addition, you will need to edit the pg_hba.conf file for the data node. Postgres XC replicates data between the data nodes, so it needs the ability to connect to each data node.

host    all              postgres        50.18.116.205/32           trust
host    all              postgres        54.241.19.58/32            trust
host    all              postgres        50.18.116.169/32           trust
host    all              postgres        54.241.19.61/32            trust

I added those to each pg_hba.conf  so there are no connectivity issues between the nodes. 

GTM Configuration 

GTM configuration is the easiest part since it resides on its own server. You will need to initialize the GTM directory just like with the coordinator and the data nodes:

initgtm -Z gtm -D /var/lib/pgsql/data_gtm

Except this time we use a special GTM utility, initgtm. Once initialized, open the gtm.conf file and edit the parameters:

#------------------------------------------------------------------------------
# GENERAL PARAMETERS
#------------------------------------------------------------------------------
nodename = 'delta'                              # Specifies the node name.
                                       
listen_addresses = '*'                  # Listen addresses of this GTM.
                                       
port = 6666                             # Port number of this GTM.

As you can see - very simple: the node name, the interface to listen on ("*" means all), and the port number. Simple as that.

After this we can start the GTM node. The GTM node must be started prior to any other nodes since all of them need access to GTM.


To start the GTM node use the following command:

gtm -D /var/lib/pgsql/data_gtm

Simple as that.  Once the GTM is started you can view the gtm.log file:

1:140259528533760:2013-01-26 00:14:10.551 UTC -LOG:  Assigning new transaction ID = 49644
LOCATION:  GTM_GetGlobalTransactionIdMulti, gtm_txn.c:581
1:140259528533760:2013-01-26 00:14:10.551 UTC -LOG:  Sending transaction id 49644
LOCATION:  ProcessBeginTransactionGetGXIDCommand, gtm_txn.c:1172
1:140259528533760:2013-01-26 00:14:10.552 UTC -LOG:  Received transaction ID 49644 for snapshot obtention
LOCATION:  ProcessGetSnapshotCommand, gtm_snap.c:307
1:140259528533760:2013-01-26 00:14:10.555 UTC -LOG:  Committing transaction id 49644
LOCATION:  ProcessCommitTransactionCommand, gtm_txn.c:1591
1:140259528533760:2013-01-26 00:14:10.556 UTC -LOG:  Cleaning up thread state
LOCATION:  GTM_ThreadCleanup, gtm_thread.c:265

Starting the Postgres XC 

Now that the GTM node is running, we can start the rest of our cluster; remember that I have three boxes. That means I need to start all the components on each box - coordinator and data node.

Starting the data node 

To start  the data node use the following command:

postgres -X -D /var/lib/pgsql/data

-X means the data node and -D specifies the directory path; be sure to point the data node to the right directory - as you remember we have two of them, "cdata" for the coordinator and "data" for the data node.

Starting the coordinator node 

To start the coordinator node use:

postgres -C -D /var/lib/pgsql/cdata

-C means the coordinator and -D is the directory; we use "cdata" for the coordinator directory.

At this point you should have two instances of postgres running on your box:

The coordinator runs out of the /var/lib/pgsql/cdata directory and listens on port 5432; the data node runs out of /var/lib/pgsql/data and listens on port 5433.

postgres  1322  1298  0 Jan22 pts/1    00:00:03 postgres -X -D /var/lib/pgsql/data
postgres  1324  1322  0 Jan22 ?        00:00:27 postgres: writer process          
postgres  1325  1322  0 Jan22 ?        00:00:24 postgres: wal writer process      
postgres  1326  1322  0 Jan22 ?        00:00:06 postgres: autovacuum launcher process   

postgres  1363  1340  0 Jan22 pts/2    00:00:18 postgres -C -D /var/lib/pgsql/cdata
postgres  1365  1363  0 Jan22 ?        00:00:00 postgres: pooler process           
postgres  1366  1363  0 Jan22 ?        00:00:27 postgres: writer process           
postgres  1367  1363  0 Jan22 ?        00:00:23 postgres: wal writer process

Now we need to repeat the process on all the boxes that house a coordinator and a data node; as you recall we have three of them, so we will need to start a coordinator and a data node on each box.
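
To avoid keeping a terminal open per process, I usually push both into the background on every coordinator/data node box - the same commands as above, just detached; the log file names are my own choice:

su - postgres
nohup postgres -X -D /var/lib/pgsql/data  > /var/lib/pgsql/datanode.log 2>&1 &
nohup postgres -C -D /var/lib/pgsql/cdata > /var/lib/pgsql/coordinator.log 2>&1 &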

When all the nodes have started correctly you will see the register.node file on the GTM box, in the /var/lib/pgsql/data_gtm directory.

????cbeta8*ip-10-197-0-136.us-west-1.compute.internal/var/lib/pgsql/cdata????????calpha8*ip-10-197-1-119.us-west-1.compute.internal/var/lib/pgsql/cdata????????cgamma8*ip-10-197-0-182.us-west-1.compute.internal/var/lib/pgsql/cdata????????alpha9*ip-10-197-1-119.us-west-1.compute.internal/var/lib/pgsql/data????????beta9*ip-10-197-0-136.us-west-1.compute.internal/var/lib/pgsql/data????????cgamma????????gamma9*ip-10-197-0-182.us-west-1.compute.internal/var/lib/pgsql/data????????cg

Are we there yet ?

I know what you are thinking - a few more steps and we have a working three-node Postgres XC database.

Registering the nodes

Each node - if you recall, a "node" means a logical name for a Postgres XC component - will need to be registered so they all know of each other's existence.


CREATE NODE cbeta  WITH (TYPE = 'coordinator', HOST = '54.241.19.58', PORT = 5432);
CREATE NODE beta  WITH (TYPE = 'datanode', HOST = '54.241.19.58', PORT = 5433);

Those commands need to be executed on each physical server, for each coordinator and data node - that means if you have three servers you will need to do it on each of the three boxes.
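
After registering you can sanity-check the node catalog and reload the connection pool from the same psql session - pgxc_node and pgxc_pool_reload() are, to the best of my knowledge, standard Postgres XC objects:

SELECT * FROM pgxc_node;
SELECT pgxc_pool_reload();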

Testing the install

Once all the nodes are registered, we can go ahead and create a database to test that the new installation is working: log in to any coordinator and issue the create database command:

psql# create database test;

Once the database is created you should be able to access it from any coordinator node.


Conclusion 

Postgres XC looks like a very promising project; in my next post I will throw some performance tests at my new install and compare it to standalone Postgres.


Wednesday, January 16, 2013

Postgres replication for MySQL DBA's


Working with Postgres and MySQL replication, I noticed that many MySQL DBA's have a hard time grasping Postgres replication and how it works. You may ask why you would need to know how to set up and use Postgres replication - but let's face it, Postgres is becoming ever more popular and many MySQL shops also have Postgres databases. Overall, a lot of startups are using a collection of databases that serve various purposes. I'll attempt to explain Postgres replication in plain English and use MySQL terms to help absorb the seemingly complex subject.

Postgres "streaming replication" was introduced in release 9.0 of Postgres and is a very useful addition that many of us had been waiting years for. Personally, I strongly believe that the lack of replication kept Postgres from wide adoption, unlike MySQL, which has had replication from its early releases. Replication is a great tool since it allows you to scale and shard your databases horizontally, i.e. distributed computing.

The Postgres documentation on replication is complex and not very intuitive; in addition, the terminology is confusing, adding to the difficulty of creating replicated slaves in Postgres. In reality, Postgres replication is actually very simple and flexible once you get past the terminology.

Types of Postgres replication

The trouble with Postgres replication is that there are just too many flavors of it: Slony, Bucardo, Londiste and, of course, the newly added "streaming replication". The first step is to choose the one that fits your particular use case. In this posting I will focus on Postgres' native, built-in "streaming replication".

Trigger Based replication

Slony
Londiste
Bucardo

Log Based Replication

Streaming Replication - this is the replication we are going to focus on; it has been built into Postgres since release 9.0 and closely resembles MySQL replication, which has been log based from the get-go. But that is where the similarities stop.

Postgres "streaming replication"

The first confusion comes from the term "streaming replication" - all replication is essentially "streaming", i.e. the master is "streaming" sets of records to the slave, so adding "streaming" to the replication term did not help the ease of adoption. To be fair, Postgres named its replication "streaming" to distance itself from the early attempts at replication, that is, before release 9.0. There are references in the Postgres documentation to "hot" and "warm" standby, the "hot" standby being the "streaming" one. All in all, "streaming replication" means that a Postgres slave database can now read directly from the stream of data coming from the master database, as compared to a "warm" standby, where the logs had to be transferred to the slave database, which was running in continuous recovery mode applying those logs.

Anyhow, "streaming replication" is the one that you want to use; with no further delay, let's proceed to how to set it up.

What do you need to set up Postgres replication 

1. Master database
2. Slave database
3. Archiving of the logs - this is the same as enabling binlogs on MySQL
4. Replication user - this is the same as replication user on MySQL
5. Open database ports 5432 between the master and the slave vs. 3306 on MySQL
6. Copy the Master database backup to the slave
7.  Start the replication and enjoy the benefits

Master Database 

In order for the master Postgres database to stream data to the slave, it must be configured, just like in MySQL - the bulk of the process involves enabling archiving (i.e. "binlogs") and permissions on the master database.

Set up the replication user and permissions:

Postgres security, unlike MySQL's, uses a combination of database permissions and configuration files to enable connectivity to the databases. The main file for setting up replication from the user perspective is /var/lib/pgsql/9.1/main/pg_hba.conf


Add replication user to pg_hba.conf file:

host replication postgres 172.17.28.57/32 trust

In the line above we are adding the replication user to the file and instructing the security layer to allow the user to connect from 172.17.28.57/32 - that is the replication slave.


Add replication user to the Postgres database:



CREATE USER replicator WITH SUPERUSER ENCRYPTED PASSWORD 'secret';

In summary - to create the replication user you need to perform two steps: 1. add the user to pg_hba.conf and 2. add the user to the database using database grants. This is a bit unlike MySQL, where all you have to do is grant "replication slave" privileges to the slave user. All in all, one extra step in Postgres.

Now that the user is added we can proceed to the rest of the configuration; we are working on the master database so far, and will get to the slave very soon.

The next step is to enable archiving, i.e. to enable binlogging in MySQL terms.


Enable Master database archiving


There are several parameters in the Postgres main configuration file postgresql.conf that control replication and archiving:


listen_addresses = '*'

The above parameter tells Postgres to listen on all available IP addresses.

wal_level = hot_standby

This parameter tells Postgres to write WAL suitable for replication; there are several options for this level in Postgres, but "hot_standby" is the one that enables streaming replication. This is the same as enabling binlogging on MySQL.

wal_keep_segments = 32 # the number of WAL segments to keep

This is the interesting one - since the slave, when using streaming replication, reads from the WAL segments and may fall behind, you need enough WAL segments kept on the master for the slave to catch up.

max_wal_senders = 2 # the max number of slaves that can connect

This is the maximum number of slaves that the master will support - if you need 4 slaves set this to 4, etc... A bit of a goofy parameter; in the MySQL realm this is not controlled, i.e. if you don't need any more slaves you just don't configure them.

archive_mode    = on

One more somewhat redundant parameter in Postgres, but what it does in conjunction with wal_level = hot_standby is enable archive mode.

archive_command = 'rsync -a %p postgres@slave.host.com:/var/lib/pgsql/9.1/main/wals/%f'

Now, the archive_command parameter is where things get interesting, drastically deviate from MySQL concepts, and where the bulk of the confusion comes from.

In MySQL, once you have created a master database and subscribed a slave database to it, the rest is performed automatically: the database will send/receive binlog data and create the relay log files for you.

In the Postgres realm things are a bit different, but no worries - as you go further you will appreciate the flexibility Postgres provides, at the expense of some complexity of course. In the Postgres realm you not only enable archive mode, you also need to instruct the Postgres database how and where to copy the archive logs. Now the question you might have is - why do I need the archive logs if my slave database is going to read the WAL segments?

You need the archive logs while you are copying the database from the master host to the slave host; remember we selected to keep 32 WAL segments - those will be long gone by the time we copy, say, a 1TB database to the slave.

Think of the archive logs in Postgres replication as sort of both the binlogs and the relay logs in the MySQL realm. Postgres does not generate relay logs, so the goal of the archive command is to copy the archive logs to the slave database, essentially creating relay logs on the slave. Let's look at the archive command again:

archive_command = 'rsync -a %p postgres@slave.host.com:/var/lib/pgsql/9.1/main/wals/%f'

What the archive command above does is copy the archive logs to the "wals" directory on the slave database host - or, simply put, we are creating the "relay" logs on the slave database.

Simple rule:
When setting up streaming replication the archive logs go to the slave database

Why? Because when the Postgres slave starts catching up with the master it will need those logs, and once it is up to date it will use the WAL segments. By the same principle, if you shut down the slave database it will use those archive logs to catch up. Otherwise, if you archive on the master, you will still need to transfer those logs to the slave.

Add the directory for the archive logs:


mkdir /var/lib/pgsql/9.1/main/wals

I use "wals" as the directory name - but you can call it anything you like, as long as it is reflected in the parameters above.

Be sure to create it on both the master and the slave in case you need to reverse the roles.

And the last thing on the master to enable the replication - SSH keys.

Remember the archive_command above that copies the archive files to the slave? We need to configure the postgres user not to use a password when connecting to the slave to copy the files over. Using public/private key pairs makes it a whole lot easier, so be sure to set up public/private key pairs.
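
A minimal sketch of the key setup, run as the postgres user on the master (the slave host name is a placeholder):

su - postgres
ssh-keygen -t rsa                        # accept the defaults, empty passphrase
ssh-copy-id postgres@slave.host.com
ssh postgres@slave.host.com hostname     # should now work without a password prompt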



Restart the master database for the changes to take effect.

After the primary master restart you should see the archive files in /var/lib/pgsql/9.1/main/wals directory on the slave.

[root@smyhost]# ls
000000010000086700000061.00000080.backup  000000010000087700000005  000000010000087700000008  00000001000008770000000B
000000010000086700000064.000013F0.backup  000000010000087700000006  000000010000087700000009  00000001000008770000000C
000000010000086E0000009E.00000020.backup  000000010000087700000007  00000001000008770000000A


Slave Database

Now that we have the master database working, it's time to configure the slave database. Slave database setup is very simple - all we have to do now is copy a backup of the primary database to the slave; it will catch up using the archive files and then start reading the WAL segments from the master. The archive logs serve as a sort of relay log in MySQL terms: if you stop the slave database it will fall behind, but when you start it again it will use the archive logs to catch up.

Creating a backup of the primary database and copying it to the slave 

This is where the Postgres database shines - you do not have to take it offline to copy the data. It has a great feature, pg_start_backup - it places the master database in backup mode so you can copy it to the slave while keeping the data consistent. On the master database perform the following:

psql -c "SELECT pg_start_backup('backup', true)"

Copy the database to the standby slave using rsync or scp:

rsync -av --exclude postmaster.pid --exclude pg_xlog /var/lib/pgsql/9.1/main/ root@slave.host.com:/var/lib/pgsql/9.1/main/

End backup mode on primary database

psql -c "SELECT pg_stop_backup()"

Set up the slave database parameters:

You will need to edit two files: postgresql.conf and recovery.conf

Edit postgresql.conf and add the following lines:


wal_level = hot_standby
hot_standby = on


Create the recovery.conf and add the following lines:


standby_mode          = 'on'
primary_conninfo      = 'host=172.17.28.56  port=5432 user=postgres'

Connect string to the primary master database

trigger_file = '/var/lib/pgsql/9.1/main/pgsql.trigger'
restore_command = 'cp -f /var/lib/pgsql/9.1/main/wals/%f %p'

This command reads the archived log files.

archive_cleanup_command = 'pg_archivecleanup /var/lib/pgsql/9.1/main/wals/ %r'

This one cleans up old archives

Now restart the slave database for the changes to take effect, and you have yourself a running replication setup.


Checking the replication:

ps -ef | grep receive  

You should see the following on the slave:

postgres 12050 11809  1 10:33 ?        00:02:30 postgres: wal receiver process   streaming 877/2A9F7300     

That means your database is reading the WAL segments from the master. If you don't see it right away - this is normal, since your database needs to catch up with the master by reading the archive logs first; once it catches up you will see the receiver process.
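
You can also compare positions from psql; on the 9.1 series the relevant functions are pg_current_xlog_location() on the master and pg_last_xlog_receive_location()/pg_last_xlog_replay_location() on the slave:

-- on the master
SELECT pg_current_xlog_location();

-- on the slave
SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();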

Easy huh :-)