Tag Archive for MySQL Cluster

MySQL Cluster 6.3.33 binaries released

The binary version for MySQL Cluster 6.3.33 has now been made available at http://www.mysql.com/downloads/cluster/6.3.html#downloads

A description of all of the changes (fixes) that have gone into MySQL Cluster 6.3.33 (compared to 6.3.32) can be found in the MySQL Cluster 6.3.33 ChangeLog .





Trying out MySQL Push-Down-Join (SPJ) preview

At the 2010 MySQL User Conference, Jonas Oreland presented on the work he’s been doing on improving the performance of joins when using MySQL Cluster – the slides are available for download. While not ready for production systems, a preview version is available for you to try out. The purpose of this blog is to step through  testing an example query as well as presenting the results (SPOILER: In one configuration, I got a 50x speedup!).

SPJ is by no means complete and there are a number of constraints as to which queries benefit (and I’ll give an example of one that didn’t). For details of the current (April 2010) software and limitations, check out Jonas’s slides and then keep up to date by following his blog.

We’re anxious to get feedback – please feel free to post results as comments to this blog but also make sure that you send them to spj-feedback@sun.com – describing your schema, the query or queries you tested, the output from EXPLAIN and your before and after timings.

Joins in MySQL Cluster are implemented as nested-loop joins within the MySQL Server; this can be inefficient as it results in many trips to the data nodes to fetch the required data. SPJ works by pushing the join (actually a spec of the needed data) down into the data nodes where the data can be collected and sent back up to the MySQL Server much more efficiently.

For my tests, I used 2 different configurations. In both cases there are 2 data nodes running on 2 physical hosts. In the first configuration the MySQL Server resides on one of those 2 hosts. In the second configuration, the MySQL Server is moved to a virtual machine running on a 3rd host.

Setting up the Cluster

On each of the 3 hosts, I downloaded the software from ftp://ftp.mysql.com/pub/mysql/download/cluster_telco/mysql-5.1.44-ndb-7.1.3-spj-preview/ and then compiled and installed it. If you’re not comfortable with that then you can find instructions in this earlier blog or if you’re used to using the tools from severalnines then check out the SPJ instructions on Johan’s blog.

Create the schema

The 3 tables I used can be created with these commands from the mysql client:

mysql> create database clusterdb; use clusterdb;
mysql> create table subs (sub_id int not null primary key,
dept int,country int) engine=ndb;
mysql> create table department (id int not null primary key,
name int) engine=ndb;
mysql> create table roles (dept int not null primary key,
role varchar (30)) engine=ndb;

Each of these tables is then populated with 100,000 rows (the files can be downloaded from here).

Once extracted, the data should be loaded into the database:

mysql> use clusterdb;
mysql> load data local infile "/home/billy/Dropbox/LINUX/projects/SPJ/subs.csv"
replace into table subs fields terminated by ',';
mysql> load data local infile  "/home/billy/Dropbox/LINUX/projects/SPJ/dept.csv"
 replace into table department fields terminated by ',';
mysql> load data local infile  "/home/billy/Dropbox/LINUX/projects/SPJ/roles.csv"
 replace into table roles fields terminated by ',';

Running the tests (Config 1 – local mysqld)

To get a baseline, ensure that SPJ is turned off:

mysql> set ndb_join_pushdown=off;

and then get the output from EXPLAIN:

mysql> EXPLAIN SELECT count(*) FROM subs, department, roles WHERE subs.country=44 AND department.id=subs.dept AND roles.dept=department.name;
+----+-------------+------------+--------+---------------+---------+---------+---------------------------+--------+-----------------------------------+
| id | select_type | table      | type   | possible_keys | key     | key_len | ref                       | rows   | Extra                             |
+----+-------------+------------+--------+---------------+---------+---------+---------------------------+--------+-----------------------------------+
|  1 | SIMPLE      | subs       | ALL    | NULL          | NULL    | NULL    | NULL                      | 100000 | Using where with pushed condition |
|  1 | SIMPLE      | department | eq_ref | PRIMARY       | PRIMARY | 4       | clusterdb.subs.dept       |      1 |                                   |
|  1 | SIMPLE      | roles      | eq_ref | PRIMARY       | PRIMARY | 4       | clusterdb.department.name |      1 |                                   |
+----+-------------+------------+--------+---------------+---------+---------+---------------------------+--------+-----------------------------------+

and then execute the query:

mysql> SELECT count(*) FROM subs, department, roles WHERE subs.country=44 AND department.id=subs.dept AND roles.dept=department.name;
+----------+
| count(*) |
+----------+
|    33334 |
+----------+
1 row in set (9.08 sec)

Now to see the benefits of SPJ, turn it on:

mysql> set ndb_join_pushdown=on;

Check the output from EXPLAIN again:

mysql> EXPLAIN SELECT count(*) FROM subs, department, roles WHERE subs.country=44 AND department.id=subs.dept AND roles.dept=department.name;
+----+-------------+------------+--------+---------------+---------+---------+---------------------------+--------+--------------------------------------------------------------+
| id | select_type | table      | type   | possible_keys | key     | key_len | ref                       | rows   | Extra                                                        |
+----+-------------+------------+--------+---------------+---------+---------+---------------------------+--------+--------------------------------------------------------------+
|  1 | SIMPLE      | subs       | ALL    | NULL          | NULL    | NULL    | NULL                      | 100000 | Parent of 3 pushed join@1; Using where with pushed condition |
|  1 | SIMPLE      | department | eq_ref | PRIMARY       | PRIMARY | 4       | clusterdb.subs.dept       |      1 | Child of pushed join@1                                       |
|  1 | SIMPLE      | roles      | eq_ref | PRIMARY       | PRIMARY | 4       | clusterdb.department.name |      1 | Child of pushed join@1                                       |
+----+-------------+------------+--------+---------------+---------+---------+---------------------------+--------+--------------------------------------------------------------+

and then re-run the query:

mysql> SELECT count(*) FROM subs, department, roles WHERE subs.country=44 AND department.id=subs.dept AND roles.dept=department.name;
+----------+
| count(*) |
+----------+
|    33334 |
+----------+
1 row in set (0.77 sec)

In this test, the query ran almost 12x faster!

Running the tests (Config 1 – separate mysqld)

The test was then repeated with the MySQL Server running within a VM on a 3rd host – the purpose of this is to represent the more normal configuration where the MySQL servers must communicate over the network to the data nodes. As the purpose of SPJ is to reduce the messaging between the MySQL Server and the data nodes, it’s reasonable to expect the benefits from SPJ to be more pronounced with this configuration.

Again, to get a baseline, ensure that SPJ is turned off:

mysql> set ndb_join_pushdown=off;

and then get the output from EXPLAIN:

mysql> EXPLAIN SELECT count(*) FROM subs, department, roles WHERE subs.country=44 AND department.id=subs.dept AND roles.dept=department.name;
+----+-------------+------------+--------+---------------+---------+---------+---------------------------+--------+-----------------------------------+
| id | select_type | table      | type   | possible_keys | key     | key_len | ref                       | rows   | Extra                             |
+----+-------------+------------+--------+---------------+---------+---------+---------------------------+--------+-----------------------------------+
|  1 | SIMPLE      | subs       | ALL    | NULL          | NULL    | NULL    | NULL                      | 100000 | Using where with pushed condition |
|  1 | SIMPLE      | department | eq_ref | PRIMARY       | PRIMARY | 4       | clusterdb.subs.dept       |      1 |                                   |
|  1 | SIMPLE      | roles      | eq_ref | PRIMARY       | PRIMARY | 4       | clusterdb.department.name |      1 |                                   |
+----+-------------+------------+--------+---------------+---------+---------+---------------------------+--------+-----------------------------------+

and then execute the query:

mysql> SELECT count(*) FROM subs, department, roles WHERE subs.country=44 AND department.id=subs.dept AND roles.dept=department.name;
+----------+
| count(*) |
+----------+
|    33334 |
+----------+
1 row in set (1 min 2.12 sec)

Now to see the benefits of SPJ, turn it back on:

mysql> set ndb_join_pushdown=on;

Check the output from EXPLAIN again:

mysql> EXPLAIN SELECT count(*) FROM subs, department, roles WHERE subs.country=44 AND department.id=subs.dept AND roles.dept=department.name;
+----+-------------+------------+--------+---------------+---------+---------+---------------------------+--------+--------------------------------------------------------------+
| id | select_type | table      | type   | possible_keys | key     | key_len | ref                       | rows   | Extra                                                        |
+----+-------------+------------+--------+---------------+---------+---------+---------------------------+--------+--------------------------------------------------------------+
|  1 | SIMPLE      | subs       | ALL    | NULL          | NULL    | NULL    | NULL                      | 100000 | Parent of 3 pushed join@1; Using where with pushed condition |
|  1 | SIMPLE      | department | eq_ref | PRIMARY       | PRIMARY | 4       | clusterdb.subs.dept       |      1 | Child of pushed join@1                                       |
|  1 | SIMPLE      | roles      | eq_ref | PRIMARY       | PRIMARY | 4       | clusterdb.department.name |      1 | Child of pushed join@1                                       |
+----+-------------+------------+--------+---------------+---------+---------+---------------------------+--------+--------------------------------------------------------------+

and then re-run the query:

mysql> SELECT count(*) FROM subs, department, roles WHERE subs.country=44 AND department.id=subs.dept AND roles.dept=department.name;
+----------+
| count(*) |
+----------+
|    33334 |
+----------+
1 row in set (1.26 sec)

In this test, the query ran almost 50x faster!

Do all queries benefit from SPJ

No and that’s why it’s especially important to get feedback from real users with representative schemas so that SPJ can be extended to cover as many of the significant use cases as possible.

As an example, using the following query I saw no speedup at all (using the local mysqld configuration):

mysql> set ndb_join_pushdown=off;

mysql> EXPLAIN SELECT count(*) FROM subs, department, roles WHERE subs.country=44 AND subs.dept=department.name AND department.id=roles.dept;
+----+-------------+------------+--------+---------------+---------+---------+-------------------------+--------+-----------------------------------+
| id | select_type | table      | type   | possible_keys | key     | key_len | ref                     | rows   | Extra                             |
+----+-------------+------------+--------+---------------+---------+---------+-------------------------+--------+-----------------------------------+
|  1 | SIMPLE      | subs       | ALL    | NULL          | NULL    | NULL    | NULL                    | 100000 | Using where with pushed condition |
|  1 | SIMPLE      | department | ALL    | PRIMARY       | NULL    | NULL    | NULL                    | 100000 | Using where; Using join buffer    |
|  1 | SIMPLE      | roles      | eq_ref | PRIMARY       | PRIMARY | 4       | clusterdb.department.id |      1 |                                   |
+----+-------------+------------+--------+---------------+---------+---------+-------------------------+--------+-----------------------------------+

mysql> SELECT count(*) FROM subs, department, roles WHERE subs.country=44 AND subs.dept=department.name AND department.id=roles.dept;
+----------+
| count(*) |
+----------+
|    33334 |
+----------+
1 row in set (3 min 56.26 sec)
mysql> set ndb_join_pushdown=on;
mysql> EXPLAIN SELECT count(*) FROM subs, department, roles WHERE subs.country=44 AND subs.dept=department.name AND department.id=roles.dept;
+----+-------------+------------+--------+---------------+---------+---------+-------------------------+--------+-----------------------------------------------------------+
| id | select_type | table      | type   | possible_keys | key     | key_len | ref                     | rows   | Extra                                                     |
+----+-------------+------------+--------+---------------+---------+---------+-------------------------+--------+-----------------------------------------------------------+
|  1 | SIMPLE      | subs       | ALL    | NULL          | NULL    | NULL    | NULL                    | 100000 | Using where with pushed condition                         |
|  1 | SIMPLE      | department | ALL    | PRIMARY       | NULL    | NULL    | NULL                    | 100000 | Parent of 2 pushed join@1; Using where; Using join buffer |
|  1 | SIMPLE      | roles      | eq_ref | PRIMARY       | PRIMARY | 4       | clusterdb.department.id |      1 | Child of pushed join@1                                    |
+----+-------------+------------+--------+---------------+---------+---------+-------------------------+--------+-----------------------------------------------------------+

mysql> SELECT count(*) FROM subs, department, roles WHERE subs.country=44 AND subs.dept=department.name AND department.id=roles.dept;
+----------+
| count(*) |
+----------+
|    33334 |
+----------+
1 row in set (3 min 57.76 sec)




Free webinar – learn about MySQL Cluster 7.1

MySQL Cluster 7.1 was declared GA earlier this month and today (29 April) you have the chance to learn all about it by registering for this free webinar.

In blazing speed we will cover the most important features of MySQL Cluster 7.1: NDB$INFO; MySQL Cluster Connector/Java and other features that push the limits of MySQL Cluster into new workloads and communities.

NDB$INFO presents real-time usage statistics from the MySQL Cluster data nodes as a series of SQL tables, enabling developers and administrators to monitor database performance and optimize their applications.

Designed for Java developers, the MySQL Cluster Connector for Java implements an easy-to-use and high performance native Java interface and OpenJPA plug-in that maps Java classes to tables stored in the MySQL Cluster database.

It’s worth registering even if you can’t attend as  you should then receive a link to the replay and the charts.

It starts at 9:00 Pacific / 5 pm UK / 6pm CET.





MySQL Cluster 7.1 is GA

MySQL Cluster 7.1 has been declared GA – including MySQL Cluster Connector for Java and MySQL Cluster Manager – see http://www.mysql.com/products/database/cluster/ for details.





Charts from LDAP Con on LDAP access to MySQL Cluster

At last year’s LDAP-Con event, Ludo from OpenDS and Howard from OpenLDAP presented on the work that they’d done on using MySQL Cluster as the scalable, real-time data store for LDAP directories (going directly to the NDB API rather than using SQL). Symas now provide their implementation (back-ndb) for OpenLDAP.

You can view the charts at http://www.mysql.com/customers/view/?id=1041





Using ClusterJPA (part of MySQL Cluster Connector for Java) – a tutorial

Fig. 1 Java access to MySQL Cluster

This is a follow up to the earlier post Using ClusterJ (part of MySQL Cluster Connector for Java) – a tutorial but covers the ClusterJPA interface rather than ClusterJ.

JPA is the Java standard for persistence and different vendors can implement their own implementation of this API and they can (and do) add proprietary extensions. Three of the most common implementations are OpenJPA, Hibernate and Toplink. JPA can be used within server containers or outside of them (i.e. with either J2EE or J2SE).

Typically a JPA implementation would access the database (for example, MySQL Cluster) using JDBC. JDBC gives a great deal of flexibility to the JPA implementer but it cannot give the best performance when using MySQL Cluster as there is an internal conversion to SQL by Connector/J and a subsequent translation from SQL to the C++ NDB API by the MySQL Server. As of MySQL Cluster 7.1, OpenJPA can be configured to use the high performance NDB API (via ClusterJ) for most operations but fall back on JDBC for more complex queries.

The first implementation of ClusterJPA is as an OpenJPA BrokerFactory but in the future, it may be extended to work with other JPA implementations.

ClusterJPA overcomes ClusterJ limitations, notably:

  • Persistent classes
  • Relationships
  • Joins in queries
  • Lazy loading
  • Table and index creation from object model

Fig.2 ClusterJPA Performance

Typically users base their selection of a JPA solution on factors such as proprietary extensions, what existing applications already use and (increasingly with ClusterJPA) performance.

The performance of ClusterJPA (OpenJPA using ClusterJ) has been compared with OpenJPA using JDBC in Figure 2. It should be noted that the performance is significantly better when using ClusterJPA (the yellow bar). It is hoped that in the future the performance can be improved even further for finds, updates and deletes.

Adapting an OpenJPA based application to use ClusterJPA with MySQL Cluster should be fairly straight-forward with the main change being in the definition of the persistence unit in persistence.xml:

<persistence xmlns=http://java.sun.com/xml/ns/persistence xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.0">
 <persistence-unit name="clusterdb" transaction-type="RESOURCE_LOCAL“>
  <provider> org.apache.openjpa.persistence.PersistenceProviderImpl </provider>
  <class>Employee</class>
  <class>Department</class>
  <properties>
   <property name="openjpa.jdbc.SynchronizeMappings" value="buildSchema" />
   <property name="openjpa.ConnectionDriverName"
    value="com.mysql.jdbc.Driver" />
   <property name="openjpa.ConnectionURL" value="jdbc:mysql://localhost:3306/clusterdb" />
   <property name="openjpa.ConnectionUserName" value="root" />
   <property name="openjpa.ConnectionPassword" value="" />
   <property name="openjpa.BrokerFactory" value="ndb" />
   <property name="openjpa.jdbc.DBDictionary" value="TableType=ndbcluster" />
   <property name="openjpa.ndb.connectString" value="localhost:1186" />
   <property name="openjpa.ndb.database" value="clusterdb" /
  </properties>
 </persistence-unit>
</persistence>

Fig. 3 ClusterJPA Annotations

Defining the object-to-table mappings is performed by annotating the persistent class for the domain object. If not already in existence, OpenJPA will create the table. The property  openjpa.jdbc.DBDictionary tells OpenJPA to create the tables using ndb as the storage engine.

This paper does not go into the use of JPA in great depth – focusing instead on the specifics of using OpenJPA with MySQL Cluster/ClusterJPA. For more information on the use of JPA and OpenJPA, refer to http://openjpa.apache.org/ and in particular, http://openjpa.apache.org/builds/latest/docs/manual/manual.html

The tutorials are using MySQL Cluster 7.1.2a on Fedora 12. If using earlier or more recent versions of MySQL Cluster then you may need to change the class-paths as explained in http://dev.mysql.com/doc/ndbapi/en/mccj-using-jpa.html

For this tutorial, it is necessary to have MySQL Cluster up and running. For simplicity all of the nodes (processes) making up the Cluster will be run on the same physical host, along with the application.

Although most of the database access is performed through the NDB API, the Cluster includes a MySQL Server process for OpenJPA to use for complex queries and to allow the user to check the contents of the database manually.

These are the MySQL Cluster configuration files being used :

config.ini:

[ndbd default]
noofreplicas=2
datadir=/home/billy/mysql/my_cluster/data

[ndbd]
hostname=localhost
id=3

[ndbd]
hostname=localhost
id=4

[ndb_mgmd]
id = 1
hostname=localhost
datadir=/home/billy/mysql/my_cluster/data

[mysqld]
hostname=localhost
id=101

[api]
hostname=localhost

my.cnf:

[mysqld]
ndbcluster
datadir=/home/billy/mysql/my_cluster/data
basedir=/usr/local/mysql

This tutorial focuses on ClusterJPA rather than on running MySQL Cluster; if you are new to MySQL Cluster then refer to Running a simple Cluster before trying these tutorials.

JPA/OpenJPA/ClusterJPA can be used within or outside a container (i.e. it can be used with J2EE or J2SE) – for simplicity, this tutorial does not use a container (i.e. it is written using J2SE).

Before being able to run any ClusterJPA code, you first need to download and install OpenJPA from http://openjpa.apache.org/ – this tutorial uses OpenJPA 1.2.1. Simply extract the contents of the binary tar ball to the host you want to run your application on; for this tutorial, I use /usr/local/openjpa.

Additionally, ClusterJPA must sometimes use JDBC to satisfy certain queries and so “JDBC Driver for MySQL (Connector/J)” should also be installed – this can be downloaded from http://dev.mysql.com/downloads/connector/j/ Again, simply extract the contents of the tar ball, for this tutorial the files are stored in /usr/local/connectorj and version 5.1.12 is used.

If the ClusterJ tutorial has already been run on this MySQL Cluster database then drop the tables from the cluster so that you can observe them being created automatically – though in a real application, you may prefer to create them manually.

A configuration file is required to indicate how persistence is to be handled for the application. Create a new directory called META-INF in the application source directory and within there create a file called persistence.xml:

<persistence xmlns="http://java.sun.com/xml/ns/persistence"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.0">
 <persistence-unit name="clusterdb" transaction-type="RESOURCE_LOCAL">
 <provider>
 org.apache.openjpa.persistence.PersistenceProviderImpl
 </provider>
 <class>Employee</class>
 <class>Department</class>
 <properties>
 <property name="openjpa.jdbc.SynchronizeMappings" value="buildSchema" />
 <property name="openjpa.ConnectionDriverName"
 value="com.mysql.jdbc.Driver" />
 <property name="openjpa.ConnectionURL"
 value="jdbc:mysql://localhost:3306/clusterdb" />
 <property name="openjpa.ConnectionUserName" value="root" />
 <property name="openjpa.ConnectionPassword" value="" />
 <property name="openjpa.BrokerFactory" value="ndb" />
 <property name="openjpa.jdbc.DBDictionary" value="TableType=ndb"/>
 <property name="openjpa.ndb.connectString" value="localhost:1186" />
 <property name="openjpa.ndb.database" value="clusterdb" />
 </properties>
 </persistence-unit>
</persistence>

A persistence unit called ‘clusterdb’ is created; the provider (implementation for the persistence) is set to openjpa (as opposed for example to hibernate). Two classes are specified – ‘Employee’ and ‘Department’ which relate to the persistent classes that the application will define. Connector/J is defined as the JDBC connection (together with the host and the port of the MySQL Server to be used). The key to having OpenJPA use ClusterJPA is to set the BrokerFactory to ndb and specify the connect string (host:port) for the MySQL Cluster management node. The database is defined to be ‘clusterdb’ for both the JDBC and ClusterJ connections. The engine type when creating tables is set to ndb.

If not already done so, create the ‘clusterdb’ database (if it already contains tables from the ClusterJ tutorial then drop them):

mysql> create database clusterdb;

The next step is to create the persistent class definitions for the Department and Employee Entities:

Department.java:

import javax.persistence.*;

@Entity(name = "department")
public class Department {
  private int Id;
  private String Site;

  public Department(){}

  @Id public int getId() {return Id;}
  public void setId(int id) {Id=id;}

  @Column(name="location")    
  public String getSite() {return Site;}
  public void setSite(String site) {Site=site;}

  public String toString() {
  return "Department: " + getId() + " based in " + getSite();
 }
}

Using the @Entity tag, the table name is specified to be ‘department’. Note that unlike ClusterJ, ClusterJPA uses persistent classes (rather than interfaces) and so it is necessary to define the properties as well as the getter/setter methods. The primary key is defined using the @Id tag and we specify that the column associated with the Site property should be called ‘location’ using the @Column tag.

As this is a class, it is possible to add other useful methods – in this case toString().

Employee.java:

import javax.persistence.*;
@Entity(name = "employee") //Name of the table
public class Employee {
 private int Id;
 private String First;
 private String Last;
 private String City;
 private String Started;  
 private String Ended;  
 private int Department;

 public Employee(){}

 @Id public int getId() {return Id;}
 public void setId(int id) {Id=id;}

 public String getFirst() {return First;}
 public void setFirst(String first) {First=first;}

 public String getLast() {return Last;}
 public void setLast(String last) {Last=last;}

 @Column(name="municipality")  
 public String getCity() {return City;}
 public void setCity(String city) {City=city;}

 public String getStarted() {return Started;}
 public void setStarted(String date) {Started=date;}

 public String getEnded() {return Ended;}
 public void setEnded(String date) {Ended=date;}

 public int getDepartment() {return Department;}
 public void setDepartment(int department) {Department=department;}

 public String toString() {
  return getFirst() + " " + getLast() + " (Dept " +
  getDepartment()+ ") from " + getCity() +
  " started on " + getStarted() + " & left on " + getEnded();
 }
}

The next step is to write the application code which we step through here block by block; the first of which simply contains the import statements and then:

Main.java (part 1):

import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.EntityTransaction;
import javax.persistence.Persistence;
import javax.persistence.Query;
import java.io.*;
public class Main {
public static void main (String[] args) throws java.io.IOException {
 EntityManagerFactory entityManagerFactory = Persistence.createEntityManagerFactory("clusterdb");
 EntityManager em = entityManagerFactory.createEntityManager();
 EntityTransaction userTransaction = em.getTransaction();
 BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
 System.out.println("The tables will now have been created - check through SQL.");
 System.out.println("mysql> use clusterdb;");
 System.out.println("mysql> show tables;");
 System.out.println("Hit return when you are done");
 String ignore = br.readLine();

As part of creating the EntityManagerFactory and EntityManager, OpenJPA creates the tables for the two classes specified for the ‘clusterdb’ persistence unit. While the application waits for the user to press return, this can be checked:

mysql> use clusterdb
 mysql> show tables;
 +---------------------+
 | Tables_in_clusterdb |
 +---------------------+
 | department          |
 | employee            |
 +---------------------+

After hitting return, the application can create an Employee object and then persist it – at which point it will be stored in the ‘employee’ table. A second Employee object is then created and populated with the data read back from the database (using a primary key look up on the Id property with a value of 1):

Main.java (part 2):

 userTransaction.begin();
 Employee emp = new Employee();
 emp.setId(1);
 emp.setDepartment(666);
 emp.setFirst("Billy");
 emp.setLast("Fish");
 emp.setStarted("1st February 2009");
 em.persist(emp);
 userTransaction.commit();
 userTransaction.begin();
 Employee theEmployee = em.find(Employee.class, 1);
 userTransaction.commit();
 System.out.println(theEmployee.toString());
 System.out.println("Chance to check the database before City is set");
 System.out.println("Hit return when you are done");
 ignore = br.readLine();

The Employee object read back from the database is displayed:

Billy Fish (Dept 666) from null started on 1st February 2009 & left on null
Chance to check the database before City is set
Hit return when you are done

At this point, the application waits to give the user a chance to confirm that the Employee really has been written to the database:

mysql> select * from employee;
+----+--------------+------------+-------+-------+------+-------------------+
| id | municipality | department | ended | first | last | started           |
+----+--------------+------------+-------+-------+------+-------------------+
|  1 | NULL         |        666 | NULL  | Billy | Fish | 1st February 2009 |
+----+--------------+------------+-------+-------+------+-------------------+

After hitting return, the application continues and an update is made to the persisted Employee object – note that there is no need to explicitly ask for the changes to be persisted, this happens automatically when the transaction is committed:

Main.java (part 3):

 userTransaction.begin();
 theEmployee.setCity("London");
 theEmployee.setDepartment(777);
 userTransaction.commit();
 System.out.println("Chance to check the City is set in the database");
 System.out.println("Hit return when you are done");
 ignore = br.readLine();

At this point, the application again waits while the user has a chance to confirm that the changes did indeed get written through to the database:

mysql> select * from employee;
+----+--------------+------------+-------+-------+------+-------------------+
| id | municipality | department | ended | first | last | started           |
+----+--------------+------------+-------+-------+------+-------------------+
|  1 | London       |        777 | NULL  | Billy | Fish | 1st February 2009 |
+----+--------------+------------+-------+-------+------+-------------------+

When allowed to continue, the application creates and persists an additional 100 Employee & Department entities. It then goes on to create and execute a query to find all employees with a department number of 777 and then looks up the location of the site for that department.

Main.java (part 4):

 Department dept;
 userTransaction.begin();
 for (int i=700;i<800;i++) {
  emp = new Employee();
  dept = new Department();
  emp.setId(i+1000);
  emp.setDepartment(i);
  emp.setFirst("Billy");
  emp.setLast("No-Mates-"+i);
  emp.setStarted("1st February 2009");
  em.persist(emp);
  dept.setId(i);
  dept.setSite("Building-"+i);
  em.persist(dept);
 }
 userTransaction.commit();
 userTransaction.begin();
 Query q = em.createQuery("select x from Employee x where x.department=777");
 Query qd;
 for (Employee m : (List<Employee>) q.getResultList()) {
  System.out.println(m.toString());
  qd = em.createQuery("select x from Department x where x.id=777");
  for (Department d : (List<Department>) qd.getResultList()) {
   System.out.println(d.toString());
  }
 }
 userTransaction.commit();

These are the results displayed:

Billy No-Mates-777 (Dept 777) from null started on 1st February 2009 & left on null
Department: 777 based in Building-777
Billy Fish (Dept 777) from London started on 1st February 2009 & left on null
Department: 777 based in Building-777

Note that joins between tables are possible with JPA but that is beyond the scope of this tutorial.

Finaly, the EntityManager and EntityManagerFactory are closed:

Main.java (part 5):

  em.close();
  entityManagerFactory.close();
 }
}

Compiling and running the ClusterJPA tutorial code

javac -classpath /usr/local/mysql/share/mysql/java/clusterjpa.jar:/usr/local/openjpa/openjpa-1.2.1.jar:/usr/local/openjpa/lib/geronimo-jpa_3.0_spec-1.0.jar:. Main.java Employee.java Department.java
java -Djava.library.path=/usr/local/mysql/lib/ -classpath /usr/local/mysql/share/mysql/java/clusterjpa.jar:/usr/local/openjpa/openjpa-1.2.1.jar:/usr/local/openjpa/lib/*:/usr/local/connectorj/mysql-connector-java-5.1.12-bin.jar:. Main 

Download the source code for this tutorial from here (together with the code for the previous ClusterJ tutorial).





Using ClusterJ (part of MySQL Cluster Connector for Java) – a tutorial

Fig. 1 Java access to MySQL Cluster

ClusterJ is part of the MySQL Cluster Connector for Java which is currently in beta as part of MySQL Cluster 7.1. It is designed to provide a high performance method for Java applications to store and access data in a MySQL Cluster database. It is also designed to be easy for Java developers to use and is “in the style of” Hibernate/Java Data Objects (JDO) and JPA. It uses the Domain Object Model DataMapper pattern:

  • Data is represented as domain objects
  • Domain objects are separate from business logic
  • Domain objects are mapped to database tables

The purpose of ClusterJ is to provide a mapping from the table-oriented view of the data stored in MySQL Cluster to the Java objects used by the application. This is achieved by annotating interfaces representing the Java objects; where each persistent interface is mapped to a table and each property in that interface to a column. By default, the table name will match the interface name and the column names match the property names but this can be overridden using annotations.

Fig. 2 ClusterJ Interface Annotations

If the table does not already exist (for example, this is a brand new application with new data) then the table must be created manually – unlike OpenJPA, ClusterJ will not create the table automatically.

Figure 2 shows an example of an interface that has been created in order to represent the data held in the ‘employee’ table.

ClusterJ uses the following concepts:

  • Fig. 3 ClusterJ Terminology

    SessionFactory: There is one instance per MySQL Cluster instance for each Java Virtual Machine (JVM). The SessionFactory object is used by the application to get hold of sessions. The configuration details for the ClusterJ instance are defined in the Configuration properties which is an artifact associated with the SessionFactory.

  • Session: There is one instance per user (per Cluster, per JVM) and represents a Cluster connection
  • Domain Object: Objects representing the data from a table. The domain objects (and their relationships to the Cluster tables) are defined by annotated interfaces (as shown in the right-hand side of Figure 2.
  • Transaction: There is one transaction per session at any point in time. By default, each operation (query, insert, update, or delete) is run under a new transaction. . The Transaction interface allows developers to aggregate multiple operations into a single, atomic unit of work.

ClusterJ will be suitable for many Java developers but it has some restrictions which may make OpenJPA with the ClusterJPA plug-in more appropriate. These ClusterJ restrictions are:

  • Persistent Interfaces rather than persistent classes. The developer provides the signatures for the getter/setter methods rather than the properties and no extra methods can be added.
  • No Relationships between properties or between objects can be defined in the domain objects. Properties are primitive types.
  • No Multi-table inheritance; there is a single table per persistent interface
  • No joins in queries (all data being queried must be in the same table/interface)
  • No Table creation – user needs to create tables and indexes
  • No Lazy Loading – entire record is loaded at one time, including large object (LOBs).

Tutorial

This tutorial uses MySQL Cluster 7.1.2a on Fedora 12. If using earlier or more recent versions of MySQL Cluster then you may need to change the class-paths as explained in http://dev.mysql.com/doc/ndbapi/en/mccj-using-clusterj.html

It is necessary to have MySQL Cluster up and running. For simplicity all of the nodes (processes) making up the Cluster will be run on the same physical host, along with the application.

These are the MySQL Cluster configuration files being used :

config.ini:

[ndbd default]noofreplicas=2
datadir=/home/billy/mysql/my_cluster/data

[ndbd]
hostname=localhost
id=3

[ndbd]
hostname=localhost
id=4

[ndb_mgmd]
id = 1
hostname=localhost
datadir=/home/billy/mysql/my_cluster/data

[mysqld]
hostname=localhost
id=101

[api]
hostname=localhost

my.cnf:

[mysqld]
ndbcluster
datadir=/home/billy/mysql/my_cluster/data
basedir=/usr/local/mysql

This tutorial focuses on ClusterJ rather than on running MySQL Cluster; if you are new to MySQL Cluster then refer to running a simple Cluster before trying this tutorial.

ClusterJ needs to be told how to connect to our MySQL Cluster database; including the connect string (the address/port for the management node), the database to use, the user to login as and attributes for the connection such as the timeout values. If these parameters aren’t defined then ClusterJ will fail with run-time exceptions. This information represents the “configuration properties” shown in Figure 3.  These parameters can be hard coded in the application code but it is more maintainable to create a clusterj.properties file that will be imported by the application. This file should be stored in the same directory as your application source code.

clusterj.properties:

com.mysql.clusterj.connectstring=localhost:1186
 com.mysql.clusterj.database=clusterdb
 com.mysql.clusterj.connect.retries=4
 com.mysql.clusterj.connect.delay=5
 com.mysql.clusterj.connect.verbose=1
 com.mysql.clusterj.connect.timeout.before=30
 com.mysql.clusterj.connect.timeout.after=20
 com.mysql.clusterj.max.transactions=1024

As ClusterJ will not create tables automatically, the next step is to create ‘clusterdb’ database (referred to in clusterj.properties) and the ‘employee’ table:

[billy@ws1 ~]$ mysql -u root -h 127.0.0.1 -P 3306 -u root
 mysql>  create database clusterdb;use clusterdb;
 mysql> CREATE TABLE employee (
 ->     id INT NOT NULL PRIMARY KEY,
 ->     first VARCHAR(64) DEFAULT NULL,
 ->     last VARCHAR(64) DEFAULT NULL,
 ->     municipality VARCHAR(64) DEFAULT NULL,
 ->     started VARCHAR(64) DEFAULT NULL,
 ->     ended  VARCHAR(64) DEFAULT NULL,
 ->     department INT NOT NULL DEFAULT 1,
 ->     UNIQUE KEY idx_u_hash (first,last) USING HASH,
 ->     KEY idx_municipality (municipality)
 -> ) ENGINE=NDBCLUSTER;

The next step is to create the annotated interface:

Employee.java:

import com.mysql.clusterj.annotation.Column;
import com.mysql.clusterj.annotation.Index;
import com.mysql.clusterj.annotation.PersistenceCapable;
import com.mysql.clusterj.annotation.PrimaryKey;
@PersistenceCapable(table="employee")
@Index(name="idx_uhash")
public interface Employee {
@PrimaryKey
int getId();
void setId(int id);
String getFirst();
void setFirst(String first);

String getLast();
void setLast(String last);
@Column(name="municipality")
@Index(name="idx_municipality")
String getCity();
void setCity(String city);
String getStarted();
void setStarted(String date);
String getEnded();
void setEnded(String date);
Integer getDepartment();
void setDepartment(Integer department);
}

The name of the table is specified in the annotation @PersistenceCapable(table=”employee”) and then each column from the employee table has an associated getter and setter method defined in the interface. By default, the property name in the interface is the same as the column name in the table – the column name has been overridden for the City property by explicitly including the @Column(name=”municipality”) annotation just before the associated getter method. The @PrimaryKey annotation is used to identify the property whose associated column is the Primary Key in the table. ClusterJ is made aware of the existence of indexes in the database using the @Index annotation.

The next step is to write the application code which we step through here block by block; the first of which simply contains the import statements and then loads the contents of the clusterj.properties defined above:

Main.java (part 1):

import com.mysql.clusterj.ClusterJHelper;
import com.mysql.clusterj.SessionFactory;
import com.mysql.clusterj.Session;
import com.mysql.clusterj.Query;
import com.mysql.clusterj.query.QueryBuilder;
import com.mysql.clusterj.query.QueryDomainType;
import java.io.File;
import java.io.InputStream;
import java.io.FileInputStream;
import java.io.*;
import java.util.Properties;
import java.util.List;
public class Main {
public static void main (String[] args) throws java.io.FileNotFoundException,java.io.IOException {
// Load the properties from the clusterj.properties file
File propsFile = new File("clusterj.properties");
InputStream inStream = new FileInputStream(propsFile);
Properties props = new Properties();
props.load(inStream);
//Used later to get userinput
BufferedReader br = new BufferedReader(new
InputStreamReader(System.in));

The next step is to get a handle for a SessionFactory from the ClusterJHelper class and then use that factory to create a session (based on the properties imported from clusterj.properties file.

Main.java (part 2):

// Create a session (connection to the database)
SessionFactory factory = ClusterJHelper.getSessionFactory(props);
Session session = factory.getSession();

Now that we have a session, it is possible to instantiate new Employee objects and then persist them to the database. Where there are no transaction begin() or commit() statements, each operation involving the database is treated as a separate transaction.

Main.java (part 3):

// Create and initialise an Employee
Employee newEmployee = session.newInstance(Employee.class);
newEmployee.setId(988);
newEmployee.setFirst("John");
newEmployee.setLast("Jones");
newEmployee.setStarted("1 February 2009");
newEmployee.setDepartment(666);
// Write the Employee to the database
session.persist(newEmployee);

At this point, a row will have been added to the ‘employee’ table. To verify this, a new Employee object is created and used to read the data back from the ‘employee’ table using the primary key (Id) value of 998:

Main.java (part 4):

// Fetch the Employee from the database
 Employee theEmployee = session.find(Employee.class, 988);
if (theEmployee == null)
 {System.out.println("Could not find employee");}
else
 {System.out.println ("ID: " + theEmployee.getId() + "; Name: " +
 theEmployee.getFirst() + " " + theEmployee.getLast());
 System.out.println ("Location: " + theEmployee.getCity());
 System.out.println ("Department: " + theEmployee.getDepartment());
 System.out.println ("Started: " + theEmployee.getStarted());
 System.out.println ("Left: " + theEmployee.getEnded());
}

This is the output seen at this point:

ID: 988; Name: John Jones
Location: null
Department: 666
Started: 1 February 2009
Left: null
Check the database before I change the Employee - hit return when you are done

The next step is to modify this data but it does not write it back to the database yet:

Main.java (part 5):

// Make some changes to the Employee & write back to the database
theEmployee.setDepartment(777);
theEmployee.setCity("London");
System.out.println("Check the database before I change the Employee -
hit return when you are done");
String ignore = br.readLine();

The application will pause at this point and give you chance to check the database to confirm that the original data has been added as a new row but the changes have not been written back yet:

mysql> select * from clusterdb.employee;
+-----+-------+-------+--------------+-----------------+-------+------------+
| id  | first | last  | municipality | started         | ended | department |
+-----+-------+-------+--------------+-----------------+-------+------------+
| 988 | John  | Jones | NULL         | 1 February 2009 | NULL  |        666 |
+-----+-------+-------+--------------+-----------------+-------+------------+

After hitting return, the application will continue and write the changes to the table, using an automatic transaction to perform the update.

Main.java (part 6):

session.updatePersistent(theEmployee);
System.out.println("Check the change in the table before I bulk add
Employees - hit return when you are done");
ignore = br.readLine();

The application will again pause so that we can now check that the change has been written back (persisted) to the database:

mysql> select * from clusterdb.employee;
+-----+-------+-------+--------------+-----------------+-------+------------+
| id  | first | last  | municipality | started         | ended | department |
+-----+-------+-------+--------------+-----------------+-------+------------+
| 988 | John  | Jones | London       | 1 February 2009 | NULL  |        777 |
+-----+-------+-------+--------------+-----------------+-------+------------+

The application then goes onto create and persist 100 new employees. To improve performance, a single transaction is used to that all of the changes can be written to the database at once when the commit() statement is run:

Main.java (part 7):

// Add 100 new Employees - all as part of a single transaction
 newEmployee.setFirst("Billy");
 newEmployee.setStarted("28 February 2009");
session.currentTransaction().begin();
for (int i=700;i<800;i++) {
 newEmployee.setLast("No-Mates"+i);
 newEmployee.setId(i+1000);
 newEmployee.setDepartment(i);
 session.persist(newEmployee);
 }
session.currentTransaction().commit();

The 100 new employees will now have been persisted to the database. The next step is to create and execute a query that will search the database for all employees in department 777 by using a QueryBuilder and using that to build a QueryDomain that compares the ‘department’ column with a parameter. After creating the, the department parameter is set to 777 (the query could subsequently be reused with different department numbers). The application then runs the query and iterates through and displays each of employees in the result set:

Main.java (part 8):

// Retrieve the set all of Employees in department 777
QueryBuilder builder = session.getQueryBuilder();
QueryDomainType<Employee> domain =
builder.createQueryDefinition(Employee.class);
domain.where(domain.get("department").equal(domain.param(
"department")));
Query<Employee> query = session.createQuery(domain);
query.setParameter("department",777);
List<Employee> results = query.getResultList();
for (Employee deptEmployee: results) {
System.out.println ("ID: " + deptEmployee.getId() + "; Name: " +
deptEmployee.getFirst() + " " + deptEmployee.getLast());
System.out.println ("Location: " + deptEmployee.getCity());
System.out.println ("Department: " + deptEmployee.getDepartment());
System.out.println ("Started: " + deptEmployee.getStarted());
System.out.println ("Left: " + deptEmployee.getEnded());
}
System.out.println("Last chance to check database before emptying table
- hit return when you are done");
ignore = br.readLine();

At this point, the application will display the following and prompt the user to allow it to continue:

ID: 988; Name: John Jones
Location: London
Department: 777
Started: 1 February 2009
Left: null
ID: 1777; Name: Billy No-Mates777
Location: null
Department: 777
Started: 28 February 2009
Left: null

We can compare that output with an SQL query performed on the database:

mysql> select * from employee where department=777;
 +------+-------+-------------+--------------+------------------+-------+------------+
 | id   | first | last        | municipality | started          | ended | department |
 +------+-------+-------------+--------------+------------------+-------+------------+
 |  988 | John  | Jones       | London       | 1 February 2009  | NULL  |        777 |
 | 1777 | Billy | No-Mates777 | NULL         | 28 February 2009 | NULL  |        777 |
 +------+-------+-------------+--------------+------------------+-------+------------+

Finally, after pressing return again, the application will remove all employees:

Main.java (part 9):

session.deletePersistentAll(Employee.class);
 }
}

As a final check, an SQL query confirms that all of the rows have been deleted from the ‘employee’ table.

mysql> select * from employee;
Empty set (0.00 sec)

Compiling and running the ClusterJ tutorial code

javac -classpath /usr/local/mysql/share/mysql/java/clusterj-api.jar:. Main.java Employee.java
java -classpath /usr/local/mysql/share/mysql/java/clusterj.jar:. -Djava.library.path=/usr/local/mysql/lib Main
 

Download the source code for this tutorial from here (together with the code for the up-coming ClusterJPA tutorial).





MySQL Cluster 7.1.2a binaries released

The binary version for MySQL Cluster 7.1.2a has now been made available at http://dev.mysql.com/downloads/cluster/ under the Development tab.

Note that this beta load contains the latest NDBINFO and MySQL Cluster Connector for Java (ClusterJ) enhancements – please try them out and provide feedback (any bugs should be reported through bugs.mysql.com.

A description of all of the changes (fixes) that have gone into MySQL Cluster 7.1.2a (compared to 7.1.1) can be found in the MySQL Cluster 7.1.2a Change Log.





Build MySQL Cluster 7.1 from source – including MySQL Cluster Connector for Java

If you want to try out the beta features in MySQL Cluster 7.1 then you can either use the appropriate binaries or you can build it for yourself from source. Here I explain how to do this on LINUX.

Note that if you want to make use of OpenJPA then you first need to install OpenJPA and Connector/J.

The example here was on Fedora12 with the MySQL Cluster 7.1.2 source:

CFLAGS=”-O3″ CXX=gcc CXXFLAGS=”-O3 -felide-constructors -fno-exceptions -fno-rtti” ./configure -prefix=/usr/local/mysql –enable-assembler –with-mysqld-ldflags=-all-static –with-plugins=max –with-openjpa –with-classpath=/usr/local/openjpa/openjpa-1.2.1.jar:/usr/local/openjpa/lib/geronimo-jpa_3.0_spec-1.0.jar:/usr/local/openjpa/lib/geronimo-jta_1.1_spec-1.1.jar  –with-extra-charsets=all

make

make install

That’s it! Obviously, the exact location of the OpenJPA jars will depend on where you installed it. Note that for ‘make install’ you need to run it from an account that has access to /usr/local

I’ll follow up a little later with a post with example applications (in the mean time refer to this tutorial or the MySQL Cluster for Java on-line documentation) but FYI these are the options I use to compile and run my test aps:

ClusterJ:

javac -classpath /usr/local/mysql/share/mysql/java/clusterj-api.jar:. Main.java Employee.java

java -classpath /usr/local/mysql/share/mysql/java/clusterj.jar:. -Djava.library.path=/usr/local/mysql/lib/mysql/ Main


ClusterJPA:

javac -classpath /usr/local/mysql/share/mysql/java/clusterjpa.jar:/usr/local/openjpa/openjpa-1.2.1.jar:/usr/local/openjpa/lib/geronimo-jpa_3.0_spec-1.0.jar:. Main.java Employee.java Department.java

java -Djava.library.path=/usr/local/mysql/lib/mysql/ -classpath /usr/local/mysql/share/mysql/java/clusterjpa.jar:/usr/local/openjpa/openjpa-1.2.1.jar:/usr/local/openjpa/lib/*:/usr/local/connectorj/mysql-connector-java-5.1.12-bin.jar:. Main





MySQL Cluster 7.1.2 beta binaries released

The binary version for MySQL Cluster 7.1.2 has now been made available at http://dev.mysql.com/downloads/cluster/ under the Development tab.

Note that this beta load contains the latest NDBINFO and MySQL Cluster Connector for Java (ClusterJ) enhancements – please try them out and provide feedback (any bugs should be reported through bugs.mysql.com.

A description of all of the changes (fixes) that have gone into MySQL Cluster 7.1.2 (compared to 7.1.2) can be found in the MySQL Cluster 7.1.2 Change Log.