
Creating a MySQL plugin to produce an integer timestamp

This article shows how to create a MySQL plugin that implements a user-defined function (UDF), which can in turn be used in stored procedures and triggers. The function produces an integer value representing the current time (to the nearest microsecond).

I’m working on an article about conflict detection/resolution when using MySQL Cluster asynchronous replication, which requires an integer column to store a timestamp for comparison purposes. In fact, it doesn’t actually need the timestamp to represent an absolute or even a relative point in time – all that matters is that if the function is called twice, even on 2 different hosts, the 2nd call always produces a larger number than the 1st. Obviously, in a production environment the clocks on the 2 hosts would need to be kept in sync.

The C code (inttime.c)

#include <mysql.h>
#include <sys/time.h>

my_bool inttime_init(UDF_INIT *initid,UDF_ARGS *args, char *message) {
  return 0;
}

void inttime_deinit(UDF_INIT *initid) {};

/* An INTEGER UDF is declared as returning long long */
long long inttime(UDF_INIT *initid, UDF_ARGS *args,
                  char *is_null, char *error)
{
  struct timeval tv;
  gettimeofday(&tv, NULL);
  /* Scale seconds to microseconds and add on the microsecond component */
  return ((long long)tv.tv_sec) * 1000000 + tv.tv_usec;
}

Compiling and deploying the shared library

[billy@ws1 timestamp]$ gcc -I/usr/local/mysql/include/ -fPIC -shared -o inttime.so inttime.c
[billy@ws1 timestamp]$ cp inttime.so /usr/local/mysql/lib/plugin/

Creating the function

mysql> CREATE FUNCTION inttime RETURNS INTEGER SONAME 'inttime.so';
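
If you later rebuild and redeploy the shared library, drop the existing registration first and then repeat the CREATE FUNCTION step:

mysql> DROP FUNCTION inttime;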

Test the function

mysql> select inttime();select inttime();
+------------------+
| inttime()        |
+------------------+
| 1250080524270706 |
+------------------+
1 row in set (0.00 sec)

+------------------+
| inttime()        |
+------------------+
| 1250080524270833 |
+------------------+
1 row in set (0.00 sec)

Note that the results are different and the second result is larger than the first – the function is fit for purpose 🙂

Using the function from a trigger

mysql> CREATE TRIGGER updateTAB1 BEFORE UPDATE ON TAB1 FOR EACH ROW SET NEW.ts = inttime();
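
For this trigger to work, TAB1 must already exist with an integer ts column wide enough to hold the microsecond value. A minimal sketch of such a table (the id and data columns are purely illustrative; ENGINE=ndb is assumed here because the timestamp is intended for MySQL Cluster replication):

mysql> CREATE TABLE TAB1 (id INT PRIMARY KEY, data VARCHAR(30), ts BIGINT UNSIGNED) ENGINE=ndb;
mysql> INSERT INTO TAB1 VALUES (1, 'initial', inttime());
mysql> UPDATE TAB1 SET data='changed' WHERE id=1; -- the trigger refreshes ts automatically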

Acknowledgements

I used the C code found at http://lists.mysql.com/internals/33702 as the starting point.





Setting up MySQL Asynchronous Replication for High Availability

Asynchronous Replication for High Availability

MySQL replication is often positioned as a solution to provide extra throughput for your database (especially when dealing with high numbers of read operations). What tends to be overlooked is how it can be used to provide high availability (HA) – no matter how much redundancy you have at a local level, your system remains at risk from a single catastrophic failure – loss of power, earthquake, terrorist attack etc. By using MySQL asynchronous replication to maintain a geographically remote copy of that data, service can be maintained through any single site failure.

As this replication is asynchronous, there are no low latency demands on the network and so the two sites can be thousands of miles apart while utilising low-cost connections.

This article provides a simple example of how to set up asynchronous replication between 2 MySQL databases in order to provide a Highly Available solution. It first covers 2 databases whose tables are stored in the MyISAM storage engine, and then 2 MySQL Cluster databases, where I also configure multi-master replication so that changes can be made at either site.

Subsequent articles will build on this to show:

  • Collision detection and resolution when using MySQL Cluster multi-master asynchronous replication
  • Introducing asynchronous replication to a running MySQL Cluster database

Throughout this article, two machines are used: ws1 (192.168.0.3) and ws2 (192.168.0.4). ws1 will be set up as the master and ws2 as the slave (in the multi-master configuration, both act as both master and slave).

Setting up replication for non-Cluster databases

Replication is performed from one MySQL Server to another; the master makes the changes available and then one or more other Servers pick up those changes and apply them to their own databases. In this example, both databases will store the table data using the same storage engine (MyISAM) but it’s possible to mix and match (for example, take a look at  MySQL Cluster – flexibility of replication). As this is intended as a simple introduction to replication, I’m keeping life simple by assuming that this is all being set up before the database goes into production – if that isn’t the case for you and you need to cope with existing data then check out the MySQL documentation or subsequent articles on this site.

The my.cnf files can be set up as normal but the one for the MySQL Server that will act as the Master needs to have binary logging enabled. Also, each of the servers needs a unique server-id. Here are the my.cnf files used for this example:

my.cnf (Master)

[mysqld]
datadir=/home/billy/mysql/myisam/data
basedir=/usr/local/mysql
port=3306
server-id=1
log-bin

my.cnf (Slave)

[mysqld]
datadir=/home/billy/mysql/myisam/data
basedir=/usr/local/mysql
port=3306
server-id=2
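
Optionally, an explicit relay log name can be added to the slave’s my.cnf – this avoids the warning shown later when CHANGE MASTER TO is run (it is the setting that the server itself suggests):

relay-log=ws2-relay-bin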

Fire up the Master MySQL Server:

[billy@ws1 myisam]$ mysqld --defaults-file=my.cnf&

The slave needs a userid/password in order to access the master server – best practice is to create a dedicated user with just the required privileges:

[billy@ws1 myisam]$ mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.1.34-ndb-7.0.6-cluster-gpl-log MySQL Cluster Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> grant replication slave on *.* to 'repl'@'192.168.0.4'
 -> identified by 'repl';
Query OK, 0 rows affected (0.00 sec)

The slave can now be told to start processing the replication data that will be staged by the master server:

[billy@ws2 myisam]$ mysqld --defaults-file=my.cnf&
[billy@ws2 myisam]$ mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.34-ndb-7.0.6-cluster-gpl MySQL Cluster Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> CHANGE MASTER TO
-> MASTER_HOST='192.168.0.3',
-> MASTER_PORT=3306,
-> MASTER_USER='repl',
-> MASTER_PASSWORD='repl',
-> MASTER_LOG_FILE='',
-> MASTER_LOG_POS=4;
090803 15:48:09 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=ws2-relay-bin' to avoid this problem.
Query OK, 0 rows affected (0.01 sec)
mysql> start slave;
090803 15:51:24 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './ws2-relay-bin.000001' position: 4
Query OK, 0 rows affected (0.00 sec)
090803 15:51:24 [Note] Slave I/O thread: connected to master 'repl@192.168.0.3:3306',replication started in log 'FIRST' at position 4
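
At this point it is worth checking that both replication threads are running on the slave; SHOW SLAVE STATUS reports this (check that Slave_IO_Running and Slave_SQL_Running both show Yes – the full output isn’t reproduced here):

mysql> SHOW SLAVE STATUS\G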

Now to test that replication is up and running, create a table on the master, add some data and then check that the table and data can be read from the slave:

[billy@ws1 myisam]$ mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.1.34-ndb-7.0.6-cluster-gpl-log MySQL Cluster Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> create table numbers (num1 int, num2 int);
Query OK, 0 rows affected (0.00 sec)

mysql> insert into numbers values (1,10),(2,20),(3,30);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0
[billy@ws2 myisam]$ mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.1.34-ndb-7.0.6-cluster-gpl MySQL Cluster Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select * from numbers;
+------+------+
| num1 | num2 |
+------+------+
|    1 |   10 |
|    2 |   20 |
|    3 |   30 |
+------+------+
3 rows in set (0.00 sec)

Multi-Master Replication with MySQL Cluster

Multi-Master Replication for HA with MySQL Cluster

There are a few asynchronous replication capabilities that are unique to MySQL Cluster – one is that changes are replicated even if they are made directly to the data nodes using the NDB API; another is that replication can be performed in both directions, i.e. multi-master. One of the advantages of this is that you can share both read and write operations across both sites so that no capacity is wasted – it also gives you the confidence that either site is sane and ready to take over for the other at any point. You should aim to minimise how often the same rows are modified at the same time on both sites – conflict detection/resolution has been implemented but it can only roll back the row changes that conflict with other row changes rather than the full transaction.

It’s important to note that in this article, I’m not talking about the synchronous replication that takes place between data nodes within a single MySQL Cluster site (that happens in parallel and is orthogonal to the asynchronous replication to a remote site).

When performing multi-master asynchronous replication between 2 Clusters, 1 (or more) of the MySQL Servers in each Cluster is nominated as a master and 1 or more as slaves (the same server can take on both roles, and you can have multiple channels set up in case a channel or MySQL Server is lost). Changes made to the data at either site through any of their MySQL Servers (or directly to the data nodes using the NDB API) will be replicated to the other site.

I will focus on setting up the replication; refer to Deploying MySQL Cluster over multiple hosts for the steps to configure and run each MySQL Cluster site.

Most of the steps are very similar to those in the first example – the main differences would come when introducing asynchronous replication to  a MySQL Cluster instance that already contains data and is up and running (processing updates) which will be covered in a subsequent article.

Binary logging needs to be enabled on the MySQL Server(s) at each site that will act as a replication master:

my.cnf (ws1)

[mysqld]
ndb-nodeid=4
ndbcluster
datadir=/home/billy/mysql/replication/7_0_6/data
basedir=/home/billy/mysql/replication/7_0_6
port=3306
server-id=1
log-bin

my.cnf (ws2)

[mysqld]
ndb-nodeid=4
ndbcluster
datadir=/home/billy/mysql/replication/7_0_6/data
basedir=/home/billy/mysql/replication/7_0_6
port=3306
server-id=2
log-bin

The MySQL Cluster nodes (including the MySQL Servers (mysqld)) from both sites should be started up as normal.

In this case, the replication users should be set up for both Clusters:


[billy@ws1 7_0_6]$ mysqld --defaults-file=my.cnf&
[billy@ws1 7_0_6]$ mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.34-ndb-7.0.6-cluster-gpl-log MySQL Cluster Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> grant replication slave on *.* to 'repl'@'192.168.0.4'
-> identified by 'repl';

Query OK, 0 rows affected (0.01 sec)

[billy@ws2 7_0_6]$ mysqld --defaults-file=my.cnf&
[billy@ws2 7_0_6]$ mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.1.34-ndb-7.0.6-cluster-gpl-log MySQL Cluster Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> grant replication slave on *.* to 'repl'@'192.168.0.3'
-> identified by 'repl';

Query OK, 0 rows affected (0.00 sec)

Replication can then be set up and started on each of the MySQL Servers (those acting as slaves for each Cluster):

[billy@ws1 7_0_6]$ mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.1.34-ndb-7.0.6-cluster-gpl-log MySQL Cluster Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CHANGE MASTER TO
 -> MASTER_HOST='192.168.0.4',
 -> MASTER_PORT=3306,
 -> MASTER_USER='repl',
 -> MASTER_PASSWORD='repl',
 -> MASTER_LOG_FILE='',
 -> MASTER_LOG_POS=4;
090803 17:25:00 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=ws1-relay-bin' to avoid this problem.
Query OK, 0 rows affected (0.01 sec)
mysql> start slave;
090803 17:27:20 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './ws1-relay-bin.000001' position: 4
Query OK, 0 rows affected (0.02 sec)
090803 17:27:20 [Note] Slave I/O thread: connected to master 'repl@192.168.0.4:3306',replication started in log 'FIRST' at position 4

[billy@ws2 7_0_6]$ mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.1.34-ndb-7.0.6-cluster-gpl-log MySQL Cluster Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CHANGE MASTER TO
 -> MASTER_HOST='192.168.0.3',
 -> MASTER_PORT=3306,
 -> MASTER_USER='repl',
 -> MASTER_PASSWORD='repl',
 -> MASTER_LOG_FILE='',
 -> MASTER_LOG_POS=4;
090803 17:25:56 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=ws2-relay-bin' to avoid this problem.
Query OK, 0 rows affected (0.01 sec)
mysql> start slave;
090803 17:27:25 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './ws2-relay-bin.000001' position: 4
Query OK, 0 rows affected (0.00 sec)
090803 17:27:25 [Note] Slave I/O thread: connected to master 'repl@192.168.0.3:3306',replication started in log 'FIRST' at position 4
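
Before making any changes, it’s worth checking each site: SHOW MASTER STATUS confirms that the site’s binary log is being written, and SHOW SLAVE STATUS confirms that it is applying changes from the other site (outputs not reproduced here):

mysql> SHOW MASTER STATUS;
mysql> SHOW SLAVE STATUS\G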

This time, to make sure that replication is working in both directions, I make changes to both Clusters and then check that they appear at the other site:

[billy@ws1 7_0_6]$ mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 5.1.34-ndb-7.0.6-cluster-gpl-log MySQL Cluster Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use test;
Database changed
mysql> create table numbers (num1 int, num2 int) engine=ndb;
090803 17:32:10 [Note] NDB Binlog: CREATE TABLE Event: REPL$test/numbers
090803 17:32:10 [Note] NDB Binlog: logging ./test/numbers (UPDATED,USE_WRITE)
Query OK, 0 rows affected (0.62 sec)

mysql> insert into numbers values (1,10),(2,20),(3,30);
Query OK, 3 rows affected (0.04 sec)
Records: 3  Duplicates: 0  Warnings: 0
[billy@ws2 7_0_6]$ mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 5.1.34-ndb-7.0.6-cluster-gpl-log MySQL Cluster Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select * from numbers;
+------+------+
| num1 | num2 |
+------+------+
|    2 |   20 |
|    3 |   30 |
|    1 |   10 |
+------+------+
3 rows in set (0.01 sec)

mysql> insert into numbers values (4,40),(5,50),(6,60);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0
[billy@ws1 7_0_6]$ mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 9
Server version: 5.1.34-ndb-7.0.6-cluster-gpl-log MySQL Cluster Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select * from numbers;
+------+------+
| num1 | num2 |
+------+------+
|    1 |   10 |
|    6 |   60 |
|    3 |   30 |
|    5 |   50 |
|    2 |   20 |
|    4 |   40 |
+------+------+
6 rows in set (0.02 sec)




Webinar: MySQL Cluster 7.0: What’s New? (AS/ANZ/IN)

Mat Keep and I will be presenting a Webinar on Wednesday, August 12, 2009 describing the new capabilities of MySQL Cluster 7.0. The time is designed to be Asia/Australasia-friendly but of course it’s open to everyone.

Register for the MySQL Cluster 7 Webinar here.

By attending this webinar, you will learn more about the next generation of MySQL Cluster, and how it enables you to deliver 99.999% database availability, with real time performance and linear scalability, while slashing TCO.

Boasting a range of breakthrough capabilities, MySQL Cluster 7.0 will enable you to stay ahead of your most demanding, mission-critical application requirements. Enhanced scalability delivers higher database performance with fewer nodes to simplify deployment and administration. Expanded platform support and interoperability delivers more choice in both the development and deployment of MySQL Cluster. Simplified cluster monitoring tools reduce DBA administration overhead and operational costs.

Wednesday, August 12, 2009: 14:00 Singapore/Malaysia/Philippines time

Wed, Aug 12: 11:30 India, Sri Lanka
Wed, Aug 12: 13:00 Thailand/Vietnam/Western Indonesia time
Wed, Aug 12: 14:00 Western Australia
Wed, Aug 12: 15:30 Northern Territory (Australia)
Wed, Aug 12: 15:30 South Australia
Wed, Aug 12: 16:00 Queensland (Australia)
Wed, Aug 12: 16:00 NSW, ACT, Victoria, Tasmania (Australia)
Wed, Aug 12: 18:00 New Zealand time




MySQL Cluster Data Node restart times

Restart times have been reduced in MySQL Cluster 6.3.28a & 7.0.9a – refer to that article for the new timings: http://www.clusterdb.com/mysql-cluster/mysql-cluster-restarts-get-faster/

Restarts are required for certain, infrequent maintenance activities. Note that there is no loss of service while a single node restarts.

When a data node restarts, it first attempts to load the data into memory from its local log files and then catches up with any subsequent changes by retrieving them from the surviving node(s) in its node group.

 Based on this, you would expect the time taken to restart a data node to be influenced by:

  1. The amount of data that was stored on the data node before the restart
  2. Rate of updates being made to the data during the restart
  3. Network performance (assuming the data is being updated during recovery)

The times will also be influenced by certain configuration parameters, the performance of the host machine and whether the multi-threaded data node (ndbmtd) is being used.

To provide some insight into how these factors impact restart times, tests have been performed where the following factors are varied:

  • Database size (Each Gbyte is made up of 1,000,000 tuples in each of 5 tables)
  • Whether traffic is running or not (a single thread using the NDB API to send in up to 2K tps (10K updates/second))
  • Whether the 2 data nodes in the node group are on the same host or separated by a Gbit Ethernet network

The following factors are kept constant:

  • Physical hosts: Intel Core 2 Quad Q8200@2.33 GHz; 7.7 GBytes RAM
  • NoOfFragmentLogFiles: 300
  • MaxNoOfExecutionThreads=4

Here are the observed results:

Data Node restart times

There are a couple of things to note from these results:

  • Using the multi-threaded data node (ndbmtd) greatly improves the restart time (in this case, 4 threads were available, improvements could be even greater on an 8 core/thread system)
  • Results become less predictable when heavy update traffic is being processed (in this case, up to 10,000 updated rows/second on a single node group). In the tests, no attempt was made to regulate this traffic and the test application was run on the same host as one of the data nodes. Changes to the rate of updates will vary how long it takes for the restarting node to catch up as it’s a moving target.

There is another recovery/restart scenario. The measurements shown above assumed that the file system on the data node’s host was intact and could be used to recover the in-memory copy – if that were not the case (or the data nodes were restarted with the “initial” option) then all of the data would have to be recovered from the surviving data node(s) in the same node group. As a comparison, restarting a 6 Gbyte data node with the “initial” option took 20 minutes compared to 8 minutes without it (ndbmtd, over Gbit n/w).
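
For reference, a single data node restart can be triggered from the management client; a minimal sketch (node id 2 is just an example, and the -i flag requests an initial restart which discards the node’s local file system):

ndb_mgm> 2 RESTART
ndb_mgm> 2 RESTART -i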





Doxygen output for MySQL Cluster NDB API & MGM API

NDB API Documentation


A new page has been added to this site: NDB API Docs which presents the information from the header files for both the NDB API and the NDB Management API.

The material has been generated using doxygen and will be refreshed shortly after any new major, minor or maintenance release is made generally available (starting from MySQL Cluster 7.0.6).





New White Papers on deploying FreeRADIUS on MySQL Cluster

Scalable, Highly Available RADIUS solution using FreeRADIUS and MySQL Cluster

Before a client can start to access and consume services on a network, they must first be Authenticated to the network and then Authorized to use the services they are entitled to. Their consumption of network resources then needs to be captured via the Accounting processes. Collectively, Authentication, Authorization and Accounting (AAA) is a cornerstone of today’s network security, management and, in many cases, monetization. RADIUS is a standard protocol to implement this AAA service.

As these networks grow, limitations can occur which add administrative overhead, inhibit flexible scaling and impact the timely synchronization of data across the AAA environment.

Using MySQL Cluster as the data store for a RADIUS server makes a lot of sense as it brings with it all of the standard MySQL Cluster benefits such as real-time response times, scalability, high availability and geographic redundancy.

Two new white papers have been produced to explain how MySQL Cluster can be used as the fault-tolerant, real-time, scalable data store for FreeRADIUS (a leading, open-source RADIUS server).






Deploying MySQL Cluster over multiple hosts

This post builds upon the earlier article (Creating a simple Cluster on a single LINUX host) which explained how to install and run a Cluster where all of the nodes run on the same physical host.

The single host solution is not great for a real deployment – MySQL Cluster is designed to provide a High Availability (HA) solution by synchronously replicating data between data nodes – if all of the data nodes run on a single host then that machine is a single point of failure.

MySQL Cluster running across 2 hosts

This article demonstrates how to split the nodes between hosts; the configuration will still be fairly simple, using just 2 machines but it should be obvious how to extend it to more.

This new Cluster will be split across 2 machines, with all functions duplicated on each machine as shown in the diagram.

Downloading and installing

In the following example, host “ws1” has the IP Address 192.168.0.3 and “ws2” has 192.168.0.4.

Browse to the MySQL Cluster LINUX download page at mysql.com and download the correct version (32 or 64 bit) and store it in the desired directory (in my case, /home/billy/mysql) on both hosts and then extract and rename the new folder to something easier to work with…

[billy@ws1 mysql]$ tar xvf mysql-cluster-gpl-7.0.6-linux-x86_64-glibc23.tar.gz
[billy@ws1 mysql]$ mv mysql-cluster-gpl-7.0.6-linux-x86_64-glibc23 7_0_6

[billy@ws2 mysql]$ tar xvf mysql-cluster-gpl-7.0.6-linux-x86_64-glibc23.tar.gz
[billy@ws2 mysql]$ mv mysql-cluster-gpl-7.0.6-linux-x86_64-glibc23 7_0_6

Set up the files that will be needed for the MySQL Server processes (mysqld) to run correctly on each host:

[billy@ws1 mysql]$ cd 7_0_6
[billy@ws1 7_0_6]$ scripts/mysql_install_db --basedir=/home/billy/mysql/7_0_6 --datadir=/home/billy/mysql/7_0_6/data

[billy@ws2 mysql]$ cd 7_0_6
[billy@ws2 7_0_6]$ scripts/mysql_install_db --basedir=/home/billy/mysql/7_0_6 --datadir=/home/billy/mysql/7_0_6/data

Configure and run the Cluster

Create a sub-directory called “conf” on each host and create the following 3 files:

config.ini – note that this file is used by the management nodes and as we’re going to run an ndb_mgmd process on each host, this is created on both ws1 and ws2

[ndbd default]
noofreplicas=2

[ndbd]
hostname=192.168.0.3
id=1

[ndbd]
hostname=192.168.0.4
id=2

[ndb_mgmd]
id = 101
hostname=192.168.0.3

[ndb_mgmd]
id = 102
hostname=192.168.0.4

[mysqld]
id=51
hostname=192.168.0.3

[mysqld]
id=52
hostname=192.168.0.4

my.1.conf – to be used by the MySQL Server on ws1 and so stored on ws1

[mysqld]
ndb-nodeid=51
ndbcluster
datadir=/home/billy/mysql/7_0_6/data
basedir=/home/billy/mysql/7_0_6
port=3306
server-id=51
log-bin

my.2.conf – to be used by the MySQL Server on ws2 and so stored on ws2

[mysqld]
ndb-nodeid=52
ndbcluster
datadir=/home/billy/mysql/7_0_6/data
basedir=/home/billy/mysql/7_0_6
port=3306
server-id=52
log-bin

Those files configure the nodes that make up the Cluster. From a command prompt window, launch the management nodes:

[billy@ws1 7_0_6]$ bin/ndb_mgmd --initial -f conf/config.ini --configdir=/home/billy/mysql/7_0_6/conf

[billy@ws2 7_0_6]$ bin/ndb_mgmd --initial -f conf/config.ini --configdir=/home/billy/mysql/7_0_6/conf

Check that the management nodes are up and running:

[billy@ws1 7_0_6]$ bin/ndb_mgm
ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]	2 node(s)
id=1 (not connected, accepting connect from 192.168.0.3)
id=2 (not connected, accepting connect from 192.168.0.4)

[ndb_mgmd(MGM)]	2 node(s)
id=101	@192.168.0.3  (mysql-5.1.34 ndb-7.0.6)
id=102 (not connected, accepting connect from 192.168.0.4)

[mysqld(API)]	2 node(s)
id=51 (not connected, accepting connect from 192.168.0.3)
id=52 (not connected, accepting connect from 192.168.0.4)

ndb_mgm> quit

[billy@ws1 7_0_6]$ bin/ndb_mgm -c 192.168.0.4:1186
ndb_mgm> show
Connected to Management Server at: 192.168.0.4:1186
Cluster Configuration
---------------------
[ndbd(NDB)]	2 node(s)
id=1 (not connected, accepting connect from 192.168.0.3)
id=2 (not connected, accepting connect from 192.168.0.4)

[ndb_mgmd(MGM)]	2 node(s)
id=101 (not connected, accepting connect from 192.168.0.3)
id=102	@192.168.0.4  (mysql-5.1.34 ndb-7.0.6)

[mysqld(API)]	2 node(s)
id=51 (not connected, accepting connect from 192.168.0.3)
id=52 (not connected, accepting connect from 192.168.0.4)
ndb_mgm> quit

Then start the 2 data nodes (ndbd) and the 2 MySQL Server/API nodes (mysqld) and check that they’re all up and running:

[billy@ws1 7_0_6]$ bin/ndbd --initial -c localhost:1186
2009-06-17 13:05:47 [ndbd] INFO     -- Configuration fetched from 'localhost:1186', generation: 1

[billy@ws2 7_0_6]$ bin/ndbd --initial -c localhost:1186
2009-06-17 13:05:51 [ndbd] INFO     -- Configuration fetched from 'localhost:1186', generation: 1

[billy@ws1 7_0_6]$ bin/mysqld --defaults-file=conf/my.1.conf&

[billy@ws2 7_0_6]$ bin/mysqld --defaults-file=conf/my.2.conf&

[billy@ws1 7_0_6]$ bin/ndb_mgm
-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]	2 node(s)
id=1	@127.0.0.1  (mysql-5.1.34 ndb-7.0.6, Nodegroup: 0, Master)
id=2	@192.168.0.4  (mysql-5.1.34 ndb-7.0.6, Nodegroup: 0)

[ndb_mgmd(MGM)]	2 node(s)
id=101	@127.0.0.1  (mysql-5.1.34 ndb-7.0.6)
id=102	@127.0.0.1  (mysql-5.1.34 ndb-7.0.6)

[mysqld(API)]	2 node(s)
id=51	@192.168.0.3  (mysql-5.1.34 ndb-7.0.6)
id=52	@192.168.0.4  (mysql-5.1.34 ndb-7.0.6)
ndb_mgm> quit

Using the Cluster

There are now 2 API nodes/MySQL Servers/mysqlds running on the 2 different hosts, both accessing the same data. Each of those nodes can be accessed by the mysql client using the hostname and port that were configured in the my.X.conf files. For example, we can access the first of those nodes (node 51) in the following way (each API node is accessed using the host and port number in its associated my.X.conf file):

[billy@ws1 7_0_6]$ bin/mysql -h localhost -P 3306
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.1.34-ndb-7.0.6-cluster-gpl-log MySQL Cluster Server (GPL)
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> use test;
Database changed
mysql> create table assets (name varchar(30) not null primary key,
-> value int) engine=ndb;
Query OK, 0 rows affected (0.99 sec)
mysql> insert into assets values ('Car','1900');
Query OK, 1 row affected (0.03 sec)
mysql> select * from assets;
+------+-------+
| name | value |
+------+-------+
| Car  |  1900 |
+------+-------+
1 row in set (0.00 sec)
mysql> quit
Bye

If you see “ERROR 1130 (HY000): Host ‘192.168.0.3’ is not allowed to connect to this MySQL server” then you need to set up the privileges (repeat on each host) – in a real system, you’d likely want tighter security than this:

[billy@ws1 7_0_6]$ bin/mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.1.34-ndb-7.0.6-cluster-gpl-log MySQL Cluster Server (GPL)
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> GRANT ALL ON *.* TO ''@'localhost';
Query OK, 0 rows affected (0.00 sec)
mysql> GRANT ALL ON *.* TO ''@'192.168.0.4';
Query OK, 0 rows affected (0.00 sec)
mysql> quit
Bye

[billy@ws2 7_0_6]$ bin/mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 5.1.34-ndb-7.0.6-cluster-gpl-log MySQL Cluster Server (GPL)
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> GRANT ALL ON *.* TO ''@'localhost';
Query OK, 0 rows affected (0.01 sec)
mysql> GRANT ALL ON *.* TO ''@'192.168.0.3';
Query OK, 0 rows affected (0.00 sec)
mysql> quit
Bye

Note that as this table is using the ndb (MySQL Cluster) storage engine, the data is actually held in the data nodes rather than in the SQL node and so we can access the exact same data from the other SQL node:

[billy@ws1 7_0_6]$ bin/mysql -h 192.168.0.4 -P 3306
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 10
Server version: 5.1.34-ndb-7.0.6-cluster-gpl-log MySQL Cluster Server (GPL)
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select * from assets;
+------+-------+
| name | value |
+------+-------+
| Car  |  1900 |
+------+-------+
1 row in set (0.01 sec)
mysql> quit
Bye

Your next steps

This is still a fairly simple, contrived set up. Hopefully it’s clear how additional data or SQL nodes could be added and in a larger deployment you may well decide to run the management and SQL nodes on different hosts to the data nodes (in fact, when starting up the management nodes there is a warning message suggesting you deploy them elsewhere!).

To move the management node to a 3rd, independent physical host (and a 4th one if you want 2 management nodes for redundancy – a future article will explain when you might want to do that), just change the IP address in the [ndb_mgmd] section of config.ini and then run the ndb_mgmd executable on that new host. Note that the management node consumes very few resources and so can share that host with other functions/applications (e.g. SQL Server nodes but not data nodes).

You’d also set several more variables in the configuration files in order to size and tune your Cluster.
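
For example, the memory available for data and indexes is set in the [ndbd default] section of config.ini; a minimal sketch (the values shown are arbitrary illustrations rather than recommendations):

[ndbd default]
noofreplicas=2
DataMemory=2G
IndexMemory=256M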

In this article, I used 2 LINUX hosts but you could extend the Windows example introduced in My first Cluster running on Windows in exactly the same way.





Intelligent user-controlled partitioning and writing distribution-aware NDB API Applications

Default partitioning

By default, Cluster will partition based on primary key

When adding rows to a table that’s using MySQL Cluster as the storage engine, each row is assigned to a partition, and that partition is mastered by a particular data node in the Cluster. The best performance comes when all of the data required to satisfy a transaction is held within a single partition, so that it can be satisfied within a single data node rather than being bounced back and forth between multiple nodes, which introduces extra latency.

By default, Cluster partitions the data by hashing the primary key. This is not always optimal.

For example, if we have 2 tables, the first using a single-column primary key (sub_id) and the second using a composite key (sub_id, service_name)…

mysql> describe names;
+--------+-------------+------+-----+---------+-------+
| Field  | Type        | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| sub_id | int(11)     | NO   | PRI | NULL    |       |
| name   | varchar(30) | YES  |     | NULL    |       |
+--------+-------------+------+-----+---------+-------+

mysql> describe services;
+--------------+-------------+------+-----+---------+-------+
| Field        | Type        | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+-------+
| sub_id       | int(11)     | NO   | PRI | 0       |       |
| service_name | varchar(30) | NO   | PRI |         |       |
| service_parm | int(11)     | YES  |     | NULL    |       |
+--------------+-------------+------+-----+---------+-------+

If we then add data to these (initially empty) tables, we can use the ‘explain’ command to see which partitions (and hence physical hosts) are used to store the data for this single subscriber…

mysql> insert into names values (1,'Billy');

mysql> insert into services values (1,'VoIP',20),(1,'Video',654),(1,'IM',878),(1,'ssh',666);

mysql> explain partitions select * from names where sub_id=1;
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | names | p3         | const | PRIMARY       | PRIMARY | 4       | const |    1 |       |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+-------+

mysql> explain partitions select * from services where sub_id=1;
+----+-------------+----------+-------------+------+---------------+---------+---------+-------+------+-------+
| id | select_type | table    | partitions  | type | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+----------+-------------+------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | services | p0,p1,p2,p3 | ref  | PRIMARY       | PRIMARY | 4       | const |   10 |       |
+----+-------------+----------+-------------+------+---------------+---------+---------+-------+------+-------+

The service records for the same subscriber (sub_id = 1) are split across 4 different partitions (p0, p1, p2 & p3). This means that the query results in messages being passed backwards and forwards between the 4 different data nodes, which consumes extra CPU time and incurs extra latency.

User-defined partitioning to the rescue

We can override the default behaviour by telling Cluster which fields should be fed into the hash algorithm. For our example, it’s reasonable to expect a transaction to access multiple records for the same subscriber (identified by their sub_id) and so the application will perform best if all of the rows for that sub_id are held in the same partition…

mysql> drop table services;

mysql> create table services (sub_id int, service_name varchar (30), service_parm int, primary key (sub_id, service_name)) engine = ndb
-> partition by key (sub_id);

mysql> insert into services values (1,'VoIP',20),(1,'Video',654),(1,'IM',878),(1,'ssh',666);

mysql> explain partitions select * from services where sub_id=1;
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+-------+
| id | select_type | table    | partitions | type | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | services | p3         | ref  | PRIMARY       | PRIMARY | 4       | const |   10 |       |
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+-------+

All of the rows for sub_id=1 from the services table are now held within a single partition (p3), which is the same one holding the row for that sub_id in the names table. Note that it wasn’t necessary to drop, recreate and re-provision the services table; the following command would have had the same effect:

mysql> alter table services partition by key (sub_id);

Writing a distribution-aware application using the NDB API

Distribution unaware NDB API application

In our example, the data is nicely partitioned for optimum performance when accessing all of the subscriber’s data – a single data node holding all of their data. However, there is another step to take to get the best out of your NDB-API based application. By default, the NDB API will use the Transaction Coordinator (TC) on a ‘random’ data node to handle the transaction – we could get lucky and the guess is correct, but it’s more likely that it will be sent to the wrong data node, which will then have to proxy it to the correct data node. The probability of getting it right first time reduces as the number of node groups increases and so can prevent linear scaling.

It’s very simple to modify this behaviour so that the best data node/TC is hit first time, every time. When creating the transaction, the application can include parameters telling the NDB API one of the tables to be accessed and for what key(s). The NDB API will then use that information to identify the best TC to use…

const NdbDictionary::Dictionary* myDict= myNdb.getDictionary();
const NdbDictionary::Table *namesTable= myDict->getTable("names");
const NdbDictionary::Table *servicesTable= myDict->getTable("services");

NdbRecAttr *myRecAttr;

Ndb::Key_part_ptr dist_key[2];
dist_key[0].ptr = (const void*) &sub_id;
dist_key[0].len = sizeof(sub_id);
dist_key[1].ptr = NULL;  // NULL pointer terminates the list of key parts
dist_key[1].len = 0;

if (namesTable == NULL)
APIERROR(myDict->getNdbError());

if (servicesTable == NULL)
APIERROR(myDict->getNdbError());

NdbTransaction *myTransaction= myNdb.startTransaction(namesTable,
dist_key);
if (myTransaction == NULL) APIERROR(myNdb.getNdbError());

NdbOperation *myOperation= myTransaction->getNdbOperation(namesTable);
if (myOperation == NULL) APIERROR(myTransaction->getNdbError());

myOperation->readTuple(NdbOperation::LM_Read);
myOperation->equal("sub_id",sub_id);

myRecAttr= myOperation->getValue("name", NULL);
if (myRecAttr == NULL) APIERROR(myTransaction->getNdbError());

// Perform operations on "services" table as well as part of another operation
// if required; the subscriber's data will be in the same data node

if (myTransaction->execute( NdbTransaction::Commit ) == -1)
APIERROR(myTransaction->getNdbError());

printf(" %2d    %s\n",
sub_id, myRecAttr->aRef());

myNdb.closeTransaction(myTransaction);

Note that as the services table has been configured to use the same field (sub_id) for partitioning as the names table, the startTransaction method only needs to know about the namesTable as the TC that the NDB API selects will serve just as well for this subscriber’s data from the services table. The rest of the code can be found in distaware.





MySQL Cluster Multi-Range Read using NDB API

As described in “Batching – improving MySQL Cluster performance when using the NDB API“, reducing the number of times the application node has to access the data nodes can greatly improve performance and reduce latency. That article focussed on setting up multiple operations (as part of a single transaction) and then executing them as a single batch sent by the NDB API library to the data nodes.

The purpose of this entry is to show how a single NDB API operation can access multiple rows from a table with a single index lookup. It goes on to explain the significance of this both now and in the future (much faster joins using SQL for MySQL Cluster tables).

There are several operation types to cover table scans and index lookups (refer to the “MySQL Cluster API Developer Guide” for details). For this example, I use an NdbIndexScanOperation.

The code sample assumes that the following table structure and data has been set up for table “COUNTRY” in database “TEST_DB_1” using the NDB storage engine (Note that the full example application sets this up automatically):

+--------------+------------------+------+-----+---------+-------+
| Field        | Type             | Null | Key | Default | Extra |
+--------------+------------------+------+-----+---------+-------+
| SUB_ID       | int(10) unsigned | NO   | PRI | NULL    |       |
| COUNTRY_CODE | int(10) unsigned | NO   |     | NULL    |       |
+--------------+------------------+------+-----+---------+-------+

+--------+--------------+
| SUB_ID | COUNTRY_CODE |
+--------+--------------+
|     13 |            1 |
|      2 |            1 |
|      4 |           61 |
|      7 |           46 |
|      9 |           44 |
|     10 |           33 |
|     12 |           44 |
|      5 |           33 |
|     14 |           61 |
|      1 |           44 |
|      8 |            1 |
+--------+--------------+

The following code causes the NDB API library to send a single request from the application to the data nodes to read the rows where the primary key “SUB_ID” falls into the ranges (2<= SUB_ID <4); (5 < SUB_ID <=9) or (SUB_ID == 13). Note that this is just a fragment of the code and the error checking has been removed for clarity (refer to full example application to see the rest of the code, including the error handling).

NdbIndexScanOperation *psop;

/* RecAttrs for NdbRecAttr Api */
NdbRecAttr *recAttrAttr1;
NdbRecAttr *recAttrAttr2;

psop=myTransaction->getNdbIndexScanOperation(myPIndex);

Uint32 scanFlags=
  NdbScanOperation::SF_OrderBy |
  NdbScanOperation::SF_MultiRange |
  NdbScanOperation::SF_ReadRangeNo;

psop->readTuples(NdbOperation::LM_Read,
                 scanFlags,
                 (Uint32) 0,          // batch
                 (Uint32) 0);         // parallel

/* Add a bound
* Tuples where SUB_ID >=2 and < 4
*/
Uint32 low=2;
Uint32 high=4;
Uint32 match=13;

psop->setBound("SUB_ID", NdbIndexScanOperation::BoundLE, (char*)&low);
psop->setBound("SUB_ID", NdbIndexScanOperation::BoundGT, (char*)&high);
psop->end_of_bound(0);

/* Second bound
* Tuples where SUB_ID > 5 and <=9
*/
low=5;
high=9;
psop->setBound("SUB_ID", NdbIndexScanOperation::BoundLT, (char*)&low);
psop->setBound("SUB_ID", NdbIndexScanOperation::BoundGE, (char*)&high);
psop->end_of_bound(1);

/* Third bound
* Tuples where SUB_ID == 13
*/
psop->setBound("SUB_ID", NdbIndexScanOperation::BoundEQ, (char*)&match);
psop->end_of_bound(2);

/* Read all columns */
recAttrAttr1=psop->getValue("SUB_ID");
recAttrAttr2=psop->getValue("COUNTRY_CODE");

myTransaction->execute( NdbTransaction::Commit);

while (psop->nextResult(true) == 0)
{
  printf(" %8d    %8d   Range no : %2d\n",
  recAttrAttr1->u_32_value(),
  recAttrAttr2->u_32_value(),
  psop->get_range_no());
}
psop->close();

When run, this code produces the following output:

SUB_ID    COUNTRY_CODE
2           1   Range no :  0
7          46   Range no :  1
8           1   Range no :  1
9          44   Range no :  1
13          1   Range no :  2

Why is this significant?

This can be very useful for applications using the NDB API; imagine an application that wanted to find the birthdays for all of my friends. Assume that I have 2 tables of interest:

mysql> describe friends; describe birthday;
+--------+-------------+------+-----+---------+-------+
| Field  | Type        | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| name   | varchar(30) | NO   | PRI | NULL    |       |
| friend | varchar(30) | NO   | PRI |         |       |
+--------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| name  | varchar(30) | NO   | PRI | NULL    |       |
| day   | int(11)     | YES  |     | NULL    |       |
| month | int(11)     | YES  |     | NULL    |       |
| year  | int(11)     | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

Using the NDB API, I can create 1 NdbIndexScanOperation operation to read all tuples from “friends” where the “name” field matches “Andrew” and then use the results to perform a second NdbIndexScanOperation to read the rows that match each of my friends’ names in the “birthday” table. In other words, performing a join using only 2 trips from the application to the data nodes.

Of course, it’s simple to get the same results using SQL… SELECT birthday.name, birthday.day, birthday.month FROM friends, birthday WHERE friends.name='Andrew' AND friends.friend=birthday.name; However, if the tables are very large and I have a lot of friends then performing this join using SQL can be expensive as it requires a separate trip to the data nodes to fetch each birthday. In the future, Batched Key Access (BKA) will optimise these joins by performing one of these NDB API Multi-Range Reads (MRR) to fetch all of the birthdays in one go! Until then, using the NDB API directly can deliver significantly faster results.





Batching – improving MySQL Cluster performance when using the NDB API

As many people are aware, the best performance can be achieved from MySQL Cluster by using the native (C++) NDB API (rather than using SQL via a MySQL Server). What’s less well known is that you can improve the performance of your NDB-API enabled application even further by ‘batching’. This article attempts to explain why batching helps and how to do it.

What is batching and why does it help?

NDB API accessing data from the Cluster without batching

Batching involves sending multiple operations from the application to the Cluster in one group rather than individually; the Cluster then processes these operations and sends back the results. Without batching, each of these operations incurs the latency of crossing the network as well as consuming CPU time on both the application and data node hosts.

By batching together multiple operations, all of the requests can be sent in one message and all of the replies received in another – thus reducing the number of messages and hence the latency and CPU time consumed.

How to use batching with the MySQL Cluster NDB API

Batched NDB API Operations

The principle is that you batch together as many operations as you can, execute them together and then interpret the results. After interpreting the results, the application may then decide to send in another batch of operations.

An NDB API transaction consists of one or more operations where each operation (currently) acts on a single table and could be a simple primary key read or write or a complex table scan.

The operation is not sent to the Cluster at the point that it’s defined. Instead, the application must explicitly request that all operations defined within the transaction up to that point be executed – at which point, the NDB API can send the batch of operations to the data nodes to be processed. The application may request that the transaction be committed at that point or it may ask for the transaction to be held open so that it can analyse the results from the first set of operations and then use that information within a subsequent series of operations and then commit the transaction after executing that second batch of operations.

The following code sample shows how this can be implemented in practice (note that the application logic and all error handling has been omitted).

const NdbDictionary::Dictionary* myDict= myNdb.getDictionary();

const NdbDictionary::Table *myTable1= myDict->getTable("tbl1");
const NdbDictionary::Table *myTable2= myDict->getTable("tbl2");

NdbTransaction *myTransaction= myNdb.startTransaction();

// Read all of the required data as part of a single batch

NdbOperation *myOperation= myTransaction->getNdbOperation(myTable1);
myOperation->readTuple(NdbOperation::LM_Read);
myOperation->equal("ref", asset_num);
myRecAttr= myOperation->getValue("cost", NULL);

NdbOperation *myOperation2= myTransaction->getNdbOperation(myTable2);
myOperation2->readTuple(NdbOperation::LM_Read);
myOperation2->equal("ref", asset_num);
myRecAttr2= myOperation2->getValue("volume", NULL);

myTransaction->execute(NdbTransaction::NoCommit);

// NOT SHOWN: Application logic interprets results from first set of operations

// Based on the data read during the initial batch, make the necessary changes

NdbOperation *myOperation3= myTransaction->getNdbOperation(myTable1);
myOperation3->updateTuple();
myOperation3->equal("ref", asset_num);
myOperation3->setValue("cost", new_cost);

NdbOperation *myOperation4= myTransaction->getNdbOperation(myTable2);
myOperation4->updateTuple();
myOperation4->equal("ref", asset_num);
myOperation4->setValue("volume", new_volume);

myTransaction->execute( NdbTransaction::Commit);
myNdb.closeTransaction(myTransaction);