Hi Jonathon,

Great post! I think you are absolutely right: many people focus heavily on the data nodes and on maximizing redundancy there. For example, I have heard of several people who use 4 data nodes and 4 replicas "...to be really highly available and fast", while at the same time setting up only a few SQL nodes.

I think that for most setups, 2 replicas are enough, especially if you have multiple node groups. Having multiple node groups is not only a good idea from an HA perspective, but also from a scalability perspective, because you get more partitions that can be processed in parallel. Ideally, the data nodes should be 64-bit machines so that they can address all the memory you need.
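To make the arithmetic concrete: the number of node groups is the number of data nodes divided by the number of replicas (the NoOfReplicas setting). Here is a minimal sketch in Python; the function name node_groups is made up for illustration and is not part of any MySQL tool.

```python
def node_groups(data_nodes: int, replicas: int) -> int:
    """Return how many node groups a cluster layout yields.

    Hypothetical helper for illustration only. It encodes the rule that
    each node group contains exactly `replicas` data nodes, so the number
    of data nodes must be a multiple of the replica count.
    """
    if data_nodes <= 0 or replicas <= 0:
        raise ValueError("data_nodes and replicas must be positive")
    if data_nodes % replicas != 0:
        raise ValueError("data_nodes must be a multiple of replicas")
    return data_nodes // replicas


if __name__ == "__main__":
    # Compare the two layouts discussed above: 4 replicas vs. 2 replicas.
    for data_nodes, replicas in [(4, 4), (4, 2)]:
        groups = node_groups(data_nodes, replicas)
        print(f"{data_nodes} data nodes, {replicas} replicas: {groups} node group(s)")
```

With 4 data nodes and 4 replicas there is a single node group, so every node holds a full copy of the data. With 4 data nodes and 2 replicas there are two node groups, each responsible for half of the data.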

Then you can add SQL nodes (cheap 32-bit machines will do for that) as required to scale with the number of requests, up to the point where the network or the data nodes are saturated; at that point you need to go back and change the back end again. Indeed, as you said, you will need to implement something that lets the client applications choose from this pool of SQL servers, in order to provide load balancing and failover for the front end of the cluster. (MySQL Proxy may be able to help here.)
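As a rough idea of what such a client-side chooser could look like, here is a minimal sketch in Python: round-robin over the pool for load balancing, skipping nodes that fail a reachability check for failover. Everything in it is an assumption for illustration: the class name SqlNodePool, the host names, and the choice of a plain TCP connect as the health check. A real setup would probably also want to cache probe results and verify that the server actually answers queries.

```python
import itertools
import socket


def tcp_probe(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class SqlNodePool:
    """Hypothetical round-robin chooser over a pool of SQL nodes."""

    def __init__(self, nodes, probe=tcp_probe):
        if not nodes:
            raise ValueError("need at least one SQL node")
        self._nodes = list(nodes)
        self._probe = probe  # injectable so the check can be swapped out
        self._cycle = itertools.cycle(self._nodes)

    def pick(self):
        """Return the next reachable (host, port), skipping dead nodes."""
        # Try each node at most once per call, continuing the rotation
        # from wherever the previous call stopped.
        for _ in range(len(self._nodes)):
            host, port = next(self._cycle)
            if self._probe(host, port):
                return host, port
        raise RuntimeError("no SQL node is reachable")


if __name__ == "__main__":
    # Placeholder host names; replace them with your own SQL nodes.
    pool = SqlNodePool([("sql1.example.com", 3306), ("sql2.example.com", 3306)])
    try:
        print("connect to", pool.pick())
    except RuntimeError as exc:
        print(exc)
```

The application would call pick() before opening each new connection and hand the returned host and port to whatever MySQL client library it uses.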

What do you think?

Kind regards,

Roland
