This article goes along similar lines to the piece I did on the security of MySQL. I came across another blog that asked about programming tips that really should not be followed in the real world. I thought it might be interesting to get some ideas of what areas of MySQL or database theory that is taught as "religion" at University or even in starting positions that just don't work in the real world in some situations.
I will get the ball rolling by adding a few points that are often drummed into people, but don't work in all situations.
1. Normalisation: This is one that is always pushed right from the start and does offer benefits in many situations. However, when you start to look at much larger datastores such as data warehousing, reporting and analytics and streaming data, normalisation will often become a hinderence to the working of the database. Joing multiple tables that can be hundreds of millions of records each is not a good option for performance, so denormalisation is considered as a viable alternative in these cases.
2. Use the Query Cache: Caching is often seen as a great performance improvement and is considered as one feature that should always be one. However, with the MySQL query cache, the performance benefits can vary depending on the data set that is being used and the type of queries that are commonly run. For example, having a query cache active for data warehousing is not good as it attempts to fill the cache, gets too full and aborts to fetch from the table anyway. It is much easier to turn off the query cache and access the table directly the first time. Similarly, if you have a high ratio of updates, the cache will continually be cleared and not usable for real caching benefits.
Are there any other points that people have come across that are often seen as necessary or offer great benefits that in real situations are actually a problem?
"However, when you start to
"However, when you start to look at much larger datastores such as data warehousing, reporting and analytics and streaming data, normalisation will often become a hinderence to the working of the database."
Well, sure, but can you be a little more specific? I always gathered that a typical datawarehouse is a collection of "star" schemas, with one, typically narrow but deep fact table that has references to the rather wide but *relatively* shallow dimension tables. The dimension tables are denormalized, and highly redundant, and can contain aggregates, but the fact table usually does not have redundancy or aggregation.
Is it the denormalisation of these star schema dimension tables you are referring too (in favour of the more normalized dimension tables in "snowflake" schemas)? Or are you recommending to have denormalized fact tables ? That would be very interesting indeed, and I would like hear more about that.