Archive for July 30, 2009

MySQL Cluster Data Node restart times

Restart times have been reduced in MySQL Cluster 6.3.28a & 7.0.9a – refer to that article for the new timings: http://www.clusterdb.com/mysql-cluster/mysql-cluster-restarts-get-faster/

Restarts are required for certain, infrequent maintenance activities. Note that there is no loss of service while a single node restarts.

When a data node restarts, it first attempts to load the data into memory from the local log files and then it will catch up with any subsequent changes by retrieveing them from the surviving node(s) in its node group.

 Based on this, you would expect the time taken to restart a data node to be influenced by:

  1. The amount of data that was stored on the data node before the restart
  2. Rate of updates being made to the data during the restart
  3. Network performance (assuming the data is being updated during recovery)

The times will also be influenced bycertain configuration parameters, performance of the host machine and whether the multi-threaded data node (ndbmtd) is being used.

To provide some insight into how these factors impact restart times, tests have been performed where the following factors are varied:

  • Database size (Each Gbyte is made up of 1,000,000 tuples in each of 5 tables)
  • Whether traffic is running or not (a single thread using the NDB API to send in up to 2K tps (10K updates/second))
  • Whether the 2 data nodes in the node group are on the same host or separated by a Gbit Ethernet network

The following factors are kept constant:

  • Physical hosts: Intel Core 2 Quad Q8200@2.33 GHz; 7.7 GBytes RAM
  • NoOfFragmentLogFiles: 300
  • MaxNoOfExecutionThreads=4

Here are the observed results:

Data Node restart times

Data Node restart times

There are a couple of things to note from these results:

  • Using the multi-threaded data node (ndbmtd) greatly improves the restart time (in this case, 4 threads were available, improvements could be even greater on an 8 core/thread system)
  • Results become less predictable when heavy update traffic is being processed (in this case, up to 10,000 updated rows/second on a single node group). In the tests, no attempt was made to regulate this traffic and the test application was run on the same host as the one of the data nodes. Changes to the rate of updates will vary how long it takes for the restarting node to catch-up as it’s a moving target.

There is another recovery/restart scenario. The measurements shown above assumed that the file system on the data node’s host was intact and could be used to recover the in-memory copy – if that were not the case (or the data nodes were restarted with the “initial” option) then all of the data would have to be recovered from the surviving data node(s) in the same node group. As a comparison restarting a 6 Gbyte data node with the “initial” option took 20 minutes compared to 8 minutes without it (ndbmtd, over Gbit n/w).





Doxygen output for MySQL Cluster NDB API & MGM API

NDB API Documentation

NDB API Documentation


A new page has been added to this site: NDB API Docs which presents the information from the header files for both the NDB API and the NDB Management API.

The material has been generated using doxygen and will be refreshed shortly after any new major, minor or maintenance release is made generally available (starting from MySQL Cluster 7.0.6).




New White Papers on deploying FreeRADIUS on MySQL Cluster

Scalable, Highly Available RADIUS solution using FreeRADIUS and MySQL Cluster

Scalable, Highly Available RADIUS solution using FreeRADIUS and MySQL Cluster

Before a client can start to access and consume services on a network, they must first be Authenticated to the network and then Authorized to use the services they are entitled too.  Their consumption of network resources then needs to be captured via the Accounting processes.  Collectively, Authentication, Authorization and Accounting (AAA) is a cornerstone of today’s network security, management and in many cases, monetization. RADIUS is a standard protocol to implement this AAA service.

As these networks grow, limitations can occur which add administrative overhead, inhibit flexible scaling and impact the timely synchronization of data across the AAA environment.

Using MySQL Cluster as the data store for a RADIUS server makes a lot of sense as it brings with it all of the standard MySQL Cluster benefits such as real-time response times, scalability, high availability and geographic redundancy.

Two new white papers have been produced to explain how MySQL Cluster can be used as the fault-tolerant, real-time, scalable data store for FreeRADIUS (a leading, open-source RADIUS server):

As these networks grow, limitations can occur which add administrative overhead, inhibit flexible scaling and impact the timely synchronization of data across the AAA environment.





Deploying MySQL Cluster over multiple hosts

This post builds upon the earlier article (Creating a simple Cluster on a single LINUX host) which explained how to install and run a Cluster where all of the nodes run on the same physical host.

The single host solution is not great for a real deployment – MySQL Cluster is designed to provide a High Availability (HA) solution by synchronously replicating data between data nodes – if all of the data nodes run on a single host that that machine is a single point of failure.

MySQL Cluster running accross 2 hosts

MySQL Cluster running accross 2 hosts

This article demonstrates how to split the nodes between hosts; the configuration will still be fairly simple, using just 2 machines but it should be obvious how to extend it to more.

This new Cluster will be split across 2 machines, with all functions duplicated on each machine as shown in the diagram.

Downloading and installing

In the following example, host “ws1” has the IP Address 192.168.0.3 and “ws2” has 192.168.0.4.

Browse to the MySQL Cluster LINUX download page at mysql.com and download the correct version (32 or 64 bit) and store it in the desired directory (in my case, /home/billy/mysql) on both hosts and then extract and rename the new folder to something easier to work with…

[billy@ws1 mysql]$ tar xvf mysql-cluster-gpl-7.0.6-linux-x86_64-glibc23.tar.gz
[billy@ws1 mysql]$ mv mysql-cluster-gpl-7.0.6-linux-x86_64-glibc23 7_0_6

[billy@ws2 mysql]$ tar xvf mysql-cluster-gpl-7.0.6-linux-x86_64-glibc23.tar.gz
[billy@ws2 mysql]$ mv mysql-cluster-gpl-7.0.6-linux-x86_64-glibc23 7_0_6

Setup the files that will be needed for the MySQL Server processes (mysqld) to run correctly on each host

[billy@ws1 mysql]$ cd 7_0_6
[billy@ws1 7_0_6]$ scripts/mysql_install_db --basedir=/home/billy/mysql/7_0_6 --datadir=/home/billy/mysql/7_0_6/data

[billy@ws2 mysql]$ cd 7_0_6
[billy@ws2 7_0_6]$ scripts/mysql_install_db --basedir=/home/billy/mysql/7_0_6 --datadir=/home/billy/mysql/7_0_6/data

Configure and run the Cluster

Create a sub-directory called “conf” on each host and create the following 3 files:

config.ini – note that this file is used by the management nodes and as we’re going to run an ndb_mgmd process on each host, this is created on both ws1 and ws2

[ndbd default]
noofreplicas=2

[ndbd]
hostname=192.168.0.3
id=1

[ndbd]
hostname=192.168.0.4
id=2

[ndb_mgmd]
id = 101
hostname=192.168.0.3

[ndb_mgmd]
id = 102
hostname=192.168.0.4

[mysqld]
id=51
hostname=192.168.0.3

[mysqld]
id=52
hostname=192.168.0.4

my.1.confto be used by the MySQL Server on ws1 and so store on ws1

[mysqld]
ndb-nodeid=51
ndbcluster
datadir=/home/billy/mysql/7_0_6/data
basedir=/home/billy/mysql/7_0_6
port=3306
server-id=51
log-bin

my.2.conf to be used by the MySQL Server on ws2 and so store on ws2

[mysqld]
ndb-nodeid=52
ndbcluster
datadir=/home/billy/mysql/7_0_6/data
basedir=/home/billy/mysql/7_0_6
port=3306
server-id=52
log-bin

Those files configure the nodes that make up the Cluster. From a command prompt window, launch the management nodes:

[billy@ws1 7_0_6]$ bin/ndb_mgmd --initial -f conf/config.ini --configdir=/home/billy/mysql/7_0_6/conf

[billy@ws2 7_0_6]$ bin/ndb_mgmd --initial -f conf/config.ini --configdir=/home/billy/mysql/7_0_6/conf

Check that the management nodes are up and running:

[billy@ws1 7_0_6]$ bin/ndb_mgm
ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]	2 node(s)
id=1 (not connected, accepting connect from 192.168.0.3)
id=2 (not connected, accepting connect from 192.168.0.4)

[ndb_mgmd(MGM)]	2 node(s)
id=101	@192.168.0.3  (mysql-5.1.34 ndb-7.0.6)
id=102 (not connected, accepting connect from 192.168.0.4)

[mysqld(API)]	2 node(s)
id=51 (not connected, accepting connect from 192.168.0.3)
id=52 (not connected, accepting connect from 192.168.0.4)

ndb_mgm> quit

[billy@ws1 7_0_6]$ bin/ndb_mgm -c 192.168.0.4:1186
ndb_mgm> show
Connected to Management Server at: 192.168.0.4:1186
Cluster Configuration
---------------------
[ndbd(NDB)]	2 node(s)
id=1 (not connected, accepting connect from 192.168.0.3)
id=2 (not connected, accepting connect from 192.168.0.4)

[ndb_mgmd(MGM)]	2 node(s)
id=101 (not connected, accepting connect from 192.168.0.3)
id=102	@192.168.0.4  (mysql-5.1.34 ndb-7.0.6)

[mysqld(API)]	2 node(s)
id=51 (not connected, accepting connect from 192.168.0.3)
id=52 (not connected, accepting connect from 192.168.0.4)
ndb_mgm> quit

and then start the 2 data nodes (ndbd) and 2 MySQL API/Server nodes (mysqld) and then check that they’re all up and running:

[billy@ws1 7_0_6]$ bin/ndbd --initial -c localhost:1186
2009-06-17 13:05:47 [ndbd] INFO     -- Configuration fetched from 'localhost:1186', generation: 1

[billy@ws2 7_0_6]$ bin/ndbd --initial -c localhost:1186
2009-06-17 13:05:51 [ndbd] INFO     -- Configuration fetched from 'localhost:1186', generation: 1

[billy@ws1 7_0_6]$ bin/mysqld --defaults-file=conf/my.1.conf&

[billy@ws2 7_0_6]$ bin/mysqld --defaults-file=conf/my.2.conf&

[billy@ws1 7_0_6]$ bin/ndb_mgm
-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]	2 node(s)
id=1	@127.0.0.1  (mysql-5.1.34 ndb-7.0.6, Nodegroup: 0, Master)
id=2	@192.168.0.4  (mysql-5.1.34 ndb-7.0.6, Nodegroup: 0)

[ndb_mgmd(MGM)]	2 node(s)
id=101	@127.0.0.1  (mysql-5.1.34 ndb-7.0.6)
id=102	@127.0.0.1  (mysql-5.1.34 ndb-7.0.6)

[mysqld(API)]	2 node(s)
id=51	@192.168.0.3  (mysql-5.1.34 ndb-7.0.6)
id=52	@192.168.0.4  (mysql-5.1.34 ndb-7.0.6)
ndb_mgm> quit

Using the Cluster

There are now 2 API nodes/MySQL Servers/mysqlds running on the 2 different hosts; both accessing the same data. Each of those nodes can be accessed by the mysql client using the hostname and ports that were configured in the my.X.cnf files. For example, we can access the first of those nodes (node 51) in the following way (each API node is accessed using the host and port number in its associate my.X.cnf file:

[billy@ws1 7_0_6]$ bin/mysql -h localhost -P 3306
Welcome to the MySQL monitor.  Commands end with ; or g.
Your MySQL connection id is 4
Server version: 5.1.34-ndb-7.0.6-cluster-gpl-log MySQL Cluster Server (GPL)
Type 'help;' or 'h' for help. Type 'c' to clear the current input statement.
mysql> use test;
Database changed
mysql> create table assets (name varchar(30) not null primary key,
-> value int) engine=ndb;
Query OK, 0 rows affected (0.99 sec)
mysql> insert into assets values ('Car','1900');
Query OK, 1 row affected (0.03 sec)
mysql> select * from assets;
+------+-------+
| name | value |
+------+-------+
| Car  |  1900 |
+------+-------+
1 row in set (0.00 sec)
mysql> quit
Bye

If you see “ERROR 1130 (HY000): Host ‘192.168.0.3’ is not allowed to connect to this MySQL server” then you need to set up the privileges (repeat on each host) – in a real system, you’d likely want tighter security that this:

[billy@ws1 7_0_6]$ bin/mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or g.
Your MySQL connection id is 3
Server version: 5.1.34-ndb-7.0.6-cluster-gpl-log MySQL Cluster Server (GPL)
Type 'help;' or 'h' for help. Type 'c' to clear the current input statement.
mysql> GRANT ALL ON *.* TO ''@'localhost';
Query OK, 0 rows affected (0.00 sec)
mysql> GRANT ALL ON *.* TO ''@'192.168.0.4';
Query OK, 0 rows affected (0.00 sec)
mysql> quit
Bye

[billy@ws2 7_0_6]$ bin/mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or g.
Your MySQL connection id is 8
Server version: 5.1.34-ndb-7.0.6-cluster-gpl-log MySQL Cluster Server (GPL)
Type 'help;' or 'h' for help. Type 'c' to clear the current input statement.
mysql> GRANT ALL ON *.* TO ''@'localhost';
Query OK, 0 rows affected (0.01 sec)
mysql> GRANT ALL ON *.* TO ''@'192.168.0.3';
Query OK, 0 rows affected (0.00 sec)
mysql> quit
Bye

Note that as this table is using the ndb (MySQL Cluster) storage engine, the data is actually held in the data nodes rather than in the SQL node and so we can access the exact same data from the the other SQL nodes:

[billy@ws1 7_0_6]$ bin/mysql -h 192.168.0.4 -P 3306
Welcome to the MySQL monitor.  Commands end with ; or g.
Your MySQL connection id is 10
Server version: 5.1.34-ndb-7.0.6-cluster-gpl-log MySQL Cluster Server (GPL)
Type 'help;' or 'h' for help. Type 'c' to clear the current input statement.
mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select * from assets;
+------+-------+
| name | value |
+------+-------+
| Car  |  1900 |
+------+-------+
1 row in set (0.01 sec)
mysql> quit
Bye

Your next steps

This is still a fairly simple, contrived set up. Hopefully it’s clear how additional data or SQL nodes could be added and in a larger deployment you may well decide to run the management and SQL nodes on different hosts to the data nodes (in fact, when starting up the management nodes there is a warning message suggesting you deploy them elsewhere!).

To move the management node to a 3rd, independent physical host (and a 4th one if you want 2 management nodes for redundancy -a future article will explain when you might want to do that),  just change the IP address in the  [ndb_mgmd] section of config.ini and then run the ndb_mgmd executable on that new host. Note that the management node consumes very few resources and so can share that host with other functions/applications (e.g. SQL Server nodes but not data nodes).

You’d also set several more variables in the configuration files in order to size and tune your Cluster.

In this article, I used 2 LINUX hosts but you could extend the Windows example introduced in My first Cluster running on Windows in exactly the same way.





Intelligent user-controlled partitioning and writing distribution-aware NDB API Applications

Default partitioning

By default, Cluster will partition based on primary key

By default, Cluster will partition based on primary key

When adding rows to a table that’s using MySQL Cluster as the storage engine, each row is assigned to a partition where that partition is mastered by a particular data node in the Cluster. The best performance comes when all of the data required to satisfy a transaction is held within a single partition so that it can be satisfied within  a single data node rather than being bounced back and forth between multiple nodes where  extra latency will be introduced.

By default, Cluster partions the data by hashing the primary key. This is not always optimal.

For example, if we have 2 tables, the first using a single-column primary key (sub_id) and the second using a composite key (sub_id, service_name)…

mysql> describe names;
+--------+-------------+------+-----+---------+-------+
| Field  | Type        | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| sub_id | int(11)     | NO   | PRI | NULL    |       |
| name   | varchar(30) | YES  |     | NULL    |       |
+--------+-------------+------+-----+---------+-------+

mysql> describe services;
+--------------+-------------+------+-----+---------+-------+
| Field        | Type        | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+-------+
| sub_id       | int(11)     | NO   | PRI | 0       |       |
| service_name | varchar(30) | NO   | PRI |         |       |
| service_parm | int(11)     | YES  |     | NULL    |       |
+--------------+-------------+------+-----+---------+-------+

If we then add data to these (initially empty) tables, we can then use the ‘explain’ command to see which partitions (and hence phyical hosts) are used to store the data for this single subscriber…

mysql> insert into names values (1,'Billy');

mysql> insert into services values (1,'VoIP',20),(1,'Video',654),(1,'IM',878),(1,'ssh',666);

mysql> explain partitions select * from names where sub_id=1;
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | names | p3         | const | PRIMARY       | PRIMARY | 4       | const |    1 |       |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+-------+

mysql> explain partitions select * from services where sub_id=1;
+----+-------------+----------+-------------+------+---------------+---------+---------+-------+------+-------+
| id | select_type | table    | partitions  | type | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+----------+-------------+------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | services | p0,p1,p2,p3 | ref  | PRIMARY       | PRIMARY | 4       | const |   10 |       |
+----+-------------+----------+-------------+------+---------------+---------+---------+-------+------+-------+

The service records for the same subscriber (sub_id = 1) are split accross 4 diffent partitions (p0, p1, p2 & p3). This means that the query results in messages being passed backwards and forwards between the 4 different data nodes which cnsumes extra CPU time and incurs extra latency.

User-defined partitioning to the rescue

We can override the default behaviour by telling Cluster which fields should be fed into the hash algorithm. For our example, it’s reasonable to expect a transaction to access multiple records for the same subscriber (identified by their sub_id) and so the application will perform best if all of the rows for that sub_id are held in the same partition…

mysql> drop table services;

mysql> create table services (sub_id int, service_name varchar (30), service_parm int, primary key (sub_id, service_name)) engine = ndb
-> partition by key (sub_id);

mysql> insert into services values (1,'VoIP',20),(1,'Video',654),(1,'IM',878),(1,'ssh',666);

mysql> explain partitions select * from services where sub_id=1;
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+-------+
| id | select_type | table    | partitions | type | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | services | p3         | ref  | PRIMARY       | PRIMARY | 4       | const |   10 |       |
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+-------+

Now all of the rows for sub_id=1 from the services table are now held within a single partition (p3) which is the same as that holding the row for the same sub_id in the names table. Note that it wasn’t necessary to drop, recreate and re-provision the services table, the following command would have had the same effect:

mysql> alter table services partition by key (sub_id);

Writing a distribution-aware application using the NDB API

Distribution unaware NDB API application

Distribution unaware NDB API application

In our example, the data is nicely partitioned for optimum performance when accessing all of the subscriber’s data – a single data node holding all of their data. However, there is another step to take to get the best out of your NDB-API based application. By default, the NDB API will use the Transaction Coordinator (TC) on a ‘random’ data node to handle the transaction – we could get lucky and the guess is correct but it’s more likely that it will be sent to the wrong data node which with then have to proxy it to the correct data node. The probability of getting it right first time reduces as the number of node groups increases and so can prevent linear scaling.

It’s very simple to modify this behaviour so that the best data node/TC is hit first time, every time. When creating the transaction, the application can include parameters telling the NDB API one of the tables to be accessed and for what key(s). The NDB API will then use that information to identify the best TC to use…

const NdbDictionary::Dictionary* myDict= myNdb.getDictionary();
const NdbDictionary::Table *namesTable= myDict->getTable("names");
const NdbDictionary::Table *servicesTable= myDict->getTable("services");

NdbRecAttr *myRecAttr;

Ndb::Key_part_ptr dist_key[2];
dist_key[0].ptr = (const void*) &sub_id;
dist_key[0].len = sizeof(sub_id);
dist_key[1].ptr = NULL;
dist_key[1].len = NULL;

if (namesTable == NULL)
APIERROR(myDict->getNdbError());

if (servicesTable == NULL)
APIERROR(myDict->getNdbError());

NdbTransaction *myTransaction= myNdb.startTransaction(namesTable,
dist_key);
if (myTransaction == NULL) APIERROR(myNdb.getNdbError());

NdbOperation *myOperation= myTransaction->getNdbOperation(namesTable);
if (myOperation == NULL) APIERROR(myTransaction->getNdbError());

myOperation->readTuple(NdbOperation::LM_Read);
myOperation->equal("sub_id",sub_id);

myRecAttr= myOperation->getValue("name", NULL);
if (myRecAttr == NULL) APIERROR(myTransaction->getNdbError());

// Perform operations on "services" table as well as part of another operation
// if required; the subscriber's data will be in the same data node

if (myTransaction->execute( NdbTransaction::Commit ) == -1)
APIERROR(myTransaction->getNdbError());

printf(" %2d    %sn",
sub_id, myRecAttr->aRef());

myNdb.closeTransaction(myTransaction);

Note that as the services table has been configured to use the same field (sub_id) for partitioning as the names table, the startTransaction method only needs to know about the namesTable as the TC that the NDB API selects will serve just as well for this subscriber’s data from the services table. The rest of the code can be found in distaware.





MySQL Cluster Multi-Range Read using NDB API

As described in “Batching – improving MySQL Cluster performance when using the NDB API“, reducing the number of times the application node has to access the data nodes can greatly improve performance and reduce latency. That article focussed on setting up multiple operatations (as part of a single transaction) and then executing them as a single batch sent by the NDB API library to the data nodes.

The purpose of this entry is to show how a single NDB API operation can access multiple rows from a table with a single index lookup. It goes on to explain the signifficance of this both now and in the future (much faster joins using SQL for MySQL Cluster tables).

There are several operation types to cover table scans and index lookups (refer to the “MySQL Cluster API Developer Guide” for detals). For this example, I use an NdbIndexScanOperation.

The code sample assumes that the following table structure and data has been set up for table “COUNTRY” in database “TEST_DB_1” using the NDB storage engine (Note that the full example application sets this up automatically):

+--------------+------------------+------+-----+---------+-------+
| Field        | Type             | Null | Key | Default | Extra |
+--------------+------------------+------+-----+---------+-------+
| SUB_ID       | int(10) unsigned | NO   | PRI | NULL    |       |
| COUNTRY_CODE | int(10) unsigned | NO   |     | NULL    |       |
+--------------+------------------+------+-----+---------+-------+

+--------+--------------+
| SUB_ID | COUNTRY_CODE |
+--------+--------------+
|     13 |            1 |
|      2 |            1 |
|      4 |           61 |
|      7 |           46 |
|      9 |           44 |
|     10 |           33 |
|     12 |           44 |
|      5 |           33 |
|     14 |           61 |
|      1 |           44 |
|      8 |            1 |
+--------+--------------+

The following code causes the NDB API library to send a single request from the application to the data nodes to read the rows where the primary key “SUB_ID” falls into the ranges (2<= SUB_ID <4); (5 < SUB_ID <=9) or (SUB_ID == 13). Note that this is just a fragment of the code and the error checking has been removed for clarity (refer to full example application to see the rest of the code, including the error handling).

NdbIndexScanOperation *psop;

/* RecAttrs for NdbRecAttr Api */
NdbRecAttr *recAttrAttr1;
NdbRecAttr *recAttrAttr2;

psop=myTransaction->getNdbIndexScanOperation(myPIndex);

Uint32 scanFlags=
  NdbScanOperation::SF_OrderBy |
  NdbScanOperation::SF_MultiRange |
  NdbScanOperation::SF_ReadRangeNo;

psop->readTuples(NdbOperation::LM_Read,
                 scanFlags,
                 (Uint32) 0,          // batch
                 (Uint32) 0)          // parallel

/* Add a bound
* Tuples where SUB_ID >=2 and < 4
*/
Uint32 low=2;
Uint32 high=4;
Uint32 match=13;

psop->setBound("SUB_ID", NdbIndexScanOperation::BoundLE, (char*)&low);
psop->setBound("SUB_ID", NdbIndexScanOperation::BoundGT, (char*)&high);
psop->end_of_bound(0);

/* Second bound
* Tuples where SUB_ID > 5 and <=9
*/
low=5;
high=9;
psop->setBound("SUB_ID", NdbIndexScanOperation::BoundLT, (char*)&low);
psop->setBound("SUB_ID", NdbIndexScanOperation::BoundGE, (char*)&high);
psop->end_of_bound(1);

/* Third bound
* Tuples where SUB_ID == 13
*/
psop->setBound("SUB_ID", NdbIndexScanOperation::BoundEQ, (char*)&match);
psop->end_of_bound(2);

/* Read all columns */
recAttrAttr1=psop->getValue("SUB_ID");
recAttrAttr2=psop->getValue("COUNTRY_CODE");

myTransaction->execute( NdbTransaction::Commit);

while (psop->nextResult(true) == 0)
{
  printf(" %8d    %8d   Range no : %2dn",
  recAttrAttr1->u_32_value(),
  recAttrAttr2->u_32_value(),
  psop->get_range_no());
}
psop->close();

When run, this code produces the following output:

SUB_ID    COUNTRY_CODE
2           1   Range no :  0
7          46   Range no :  1
8           1   Range no :  1
9          44   Range no :  1
13          1   Range no :  2

Why is this signifficant?

This can be very useful for applications using the NDB API; imagine an application that wanted to find the birthdays for all of my friends. Assume that I have 2 tables of interest:

mysql> describe friends; describe birthday;
+--------+-------------+------+-----+---------+-------+
| Field  | Type        | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| name   | varchar(30) | NO   | PRI | NULL    |       |
| friend | varchar(30) | NO   | PRI |         |       |
+--------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| name  | varchar(30) | NO   | PRI | NULL    |       |
| day   | int(11)     | YES  |     | NULL    |       |
| month | int(11)     | YES  |     | NULL    |       |
| year  | int(11)     | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

Using the NDB API, I can create 1 NdbIndexScanOperation operation to read all tuples from “friends” where the “name” field matches “Andrew” and then use the results to perform a second NdbIndexScanOperation to read the rows that match each of my friends’ names in the “birthday” table. In other words, performing a join using only 2 trips from the application to the data nodes.

Of course, it’s simple to get the same results using SQL…SELECT birthday.name, birthday.day, birthday.month FROM friends, birthday WHERE friend.name='Andrew' AND friends.friend=birthday.name;However, if the tables are very large and I have a lot of friends then performing this join using SQL can be expensive as it requires a separate trip to the data nodes to fetch each birthday. In the future, Batched Key Access (BKA) will optimise these joins by performing one of these NDB API Multi-Range Reads (MRR) to fetch all of the birthdays in one go!  Until then, using the NDB API directly can deliver signifficantly faster results.