Many businesses are now interested in implementing MySQL Cluster as a general high availability solution for their existing databases. This is certainly not a good approach, and it will often fail because of the current limitations of the MySQL Cluster implementation (particularly 5.0) and the type of data within the business. However, many still see MySQL Cluster as a magical solution that will eliminate their potential downtime, in place of other standard measures such as DRBD, multi-master or scale-out replication.
One of the major issues that most people miss with MySQL Cluster, including those who have planned ahead, is that they focus only on the data nodes. This makes sense in many cases, since the data nodes at the backend are where the majority of NDB's capabilities are based. Unfortunately, it leaves potential problems in their plans when they realise there is more to the cluster than just the data nodes.
The nodes most often neglected are the SQL nodes used to access the cluster data. Many businesses will look at starting with 1 SQL node, as they see this as the minimum needed and it fits within a defined budget. This may make sense, but it can also underutilise the power of the cluster and actually cause it to perform worse than a single server or a master/slave replication setup.
Those who do match the nodes equally, usually 2 data nodes with 2 SQL nodes, believe that everything will work fine within this cluster. They assume that because the data is managed across the data nodes, the SQL nodes are managed as well. Their reasoning is based on NDB implementing all of its high availability within the data nodes, such as automatic failover to other nodes, data being balanced across the different nodes etc. Their mistake is in thinking that the SQL nodes are also part of that high availability.
The SQL nodes do not have any form of failover or load balancing that will make the application aware of current problems. Initially, if you want to access both SQL nodes, you will have to address each one individually as a separate MySQL server. This may be suitable for an application that is built from a number of major modules, where you partition the load between the two SQL nodes. However, most of the time you will be looking at load balancing between the 2 SQL nodes (or more if they exist). There are a number of different ways to implement this load balancing, but that will be another blog post to come :)
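To make the point about the application having to handle this itself a little more concrete, here is a minimal Python sketch of client-side round-robin with failover across SQL nodes. It is only an illustration under stated assumptions, not a recommended design: the class name SqlNodePool, the host names and the connect callable are all hypothetical, and connect stands in for whatever MySQL driver call your application really uses. It is assumed to raise ConnectionError when a node cannot be reached.

```python
import itertools


class SqlNodePool:
    """Rotate across SQL nodes, skipping any that refuse a connection."""

    def __init__(self, hosts, connect):
        # hosts: SQL node addresses (hypothetical names in this sketch).
        # connect: callable(host) -> connection, assumed to raise
        # ConnectionError on failure; in practice it would wrap your driver.
        if not hosts:
            raise ValueError("at least one SQL node is required")
        self._hosts = list(hosts)
        self._connect = connect
        self._cycle = itertools.cycle(range(len(self._hosts)))

    def get_connection(self):
        # Try each node at most once per call, starting from the next
        # node in the rotation, and return the first one that answers.
        errors = []
        for _ in range(len(self._hosts)):
            host = self._hosts[next(self._cycle)]
            try:
                return host, self._connect(host)
            except ConnectionError as exc:
                errors.append((host, str(exc)))
        raise ConnectionError(f"all SQL nodes unavailable: {errors}")
```

With two healthy nodes, successive calls alternate between them. If one node is down, every call falls through to the surviving node, which is exactly the behaviour the cluster will not give you on its own.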
The other problem people have with SQL nodes is that of bandwidth and latency. Many people will concentrate on the data nodes and set up high bandwidth links such as GigE on crossover between the nodes, or even SCI or similar to help reduce latency as well. The problem then comes when they realise that they are still using 100Mbit NICs within the SQL nodes, and a bottleneck forms between the backend data nodes and getting the data out through the SQL nodes. As with many scenarios, the overall speed of the cluster is limited by the slowest part of the implementation.
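The arithmetic behind that bottleneck is simple enough to sketch in a few lines of Python. The function name and the labels in the dictionary are made up for illustration, and the figures are nominal link speeds only, ignoring protocol overhead and latency, so treat the result as a rough upper bound rather than a measurement.

```python
def bottleneck_mbit(link_speeds_mbit):
    """Return the slowest link on the path, which caps end-to-end throughput."""
    return min(link_speeds_mbit.values())


# Nominal speeds in Mbit/s for each hop between the data and the application.
path = {
    "data node interconnect": 1000,  # GigE between the data nodes
    "SQL node NIC": 100,             # 100Mbit NIC in the SQL node
}

limit = bottleneck_mbit(path)
print(limit, "Mbit/s, roughly", limit / 8, "MB/s")
# prints: 100 Mbit/s, roughly 12.5 MB/s
```

However fast the interconnect between the data nodes is, the application in this example can never pull more than about 12.5 MB/s through that SQL node.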
So if you are looking at MySQL Cluster for a solution, don't just focus on the data nodes, but remember that there are other nodes within the overall cluster setup.