Reply to comment

Okay, obviously we can't

Okay, obviously we can't just group it all together and say all tables in a data warehouse should be denormalised. A "typical" data warehouse is a bit anomalous as there are a number of different models for the data warehouse such as star schemas, but also snowflake schemas. The denormalisation comes into play when looking at how you are defining the fact and the dimension tables.

A simple example is the time dimension. Most data warehouses will have one and the base components are year, month and day. However, we want word names as well as integers, so in normalised form, it would be:

+----------+ +-----------+ +----------+
| Day | | Month | | Year |
+----------+ +-----------+ +----------+
|id | |id | |id |
|day | |month | |year |
|dayname | |monthname | +----------+
+----------+ +-----------+

However, when looking at a time dimension you would have it as:

+----------+
| Time |
+----------+
|id |
|day |
|dayname |
|month |
|monthname |
|year |
+----------+

Why is that? Because there will be many other fields in the time dimension often including quarters, fiscal years, and other specific time periods. Also, have it denormalised mean we add some small overhead in the dimension, but it means joining only one dimension to the fact (assuming simple query here) rather than with 3 separate tables.

As was mentioned, if the dimension is small enough, it may not make that much difference, but it can make a difference.

This is just an example with data warehouses, it becomes even more important when using data streams where any joins can be detrimental to the performance of the system.

The second part of the question was if the denormalisation was to define the use of star schema or snowflake models, that is, do we denormalise the fact table? One of the methods most commonly used in denormalisation is that of aggregation. By adding columns which contains sums, averages, minimums and maximums we are denormalizing the model. This can sometimes be done in the actual dimensions, other times, it may be appropriate to have it in the fact tables. So I guess the overall answer is "maybe"! :)

The main point is that much of what is seen is "rules" is not always 100% applicable to every situation. It is possible to have a data warehouse where you have fully normalised tables, but could it be better with a different model? In many cases you will actually have a combination or even both models for added redundancy. This allows for quick updating into a normalised model and more efficient reporting from the star-based, denormalised model.

So to sum up, look for a design that will best suit your requirements, sometimes it may be fully normalised, sometimes in may be denormalised, or sometimes even both at the same time :)

Reply

The content of this field is kept private and will not be shown publicly.
  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd>
  • Lines and paragraphs break automatically.
  • You may post PHP code. You should include <?php ?> tags.
  • Web page addresses and e-mail addresses turn into links automatically.

More information about formatting options