Databases: Normalization or Denormalization. Which is the better technique?

This has really been a long debate as to which approach is more performance orientated, normalized databases or denormalized databases. So this article is a step on my part to figure out the right strategy, because neither one of these approaches can be rejected outright. I will start of by discussing the pros and cons of both the approaches.

Pros and Cons of a Normalized database design.

Normalized databases fair very well under conditions where the applications are write-intensive and the write-load is more than the read-load. This is because of the following reasons:

  • Normalized tables are usually smaller and have a smaller foot-print because the data is divided vertically among many tables. This allows them to perform better as they are small enough to get fit into the buffer.
  • The updates are very fast because the data to be updated is located at a single place and there are no duplicates.
  • Similarly the inserts are very fast because the data has to be inserted at a single place and does not have to be duplicated.
  • The selects are fast in cases where data has to be fetched from a single table, because normally normalized tables are small enough to get fit into the buffer.
  • Because the data is not duplicated so there is less need for heavy duty group by or distinct queries.

Although there seems to be much in favor of normalized tables, with all the pros outlined above, but the main cause of concern with fully normalized tables is that normalized data means joins between tables. And this joining means that read operations have to suffer because indexing strategies do not go well with table joins.

Now lets have a look at the pros and cons of a denormalized database design.

Pros and cons of denormalized database design.

Denormalized databases fair well under heavy read-load and when the application is read intensive. This is because of the following reasons:

  • The data is present in the same table so there is no need for any joins, hence the selects are very fast.
  • A single table with all the required data allows much more efficient index usage. If the columns are indexed properly, then results can be filtered and sorted by utilizing the same index. While in the case of a normalized table, since the data would be spread out in different tables, this would not be possible.

Although for reasons mentioned above selects can be very fast on denormalized tables, but because the data is duplicated, the updates and inserts become complex and costly.

Having said that neither one of the approach can be entirely neglected, because a real world application is going to have both read-loads and write-loads. Hence the correct way would be to utilize both the normalized and denormalized approaches depending on situations.

Using normalized and denormalized approaches together.

The most common way of mixing denormalized and normalized approaches is to duplicate related columns from one table into another table. Let me show you by example:

Suppose you have a products table and an orders table.
The normalized approach would be to only have the product_id in the orders table and all the other product related information in the products table.

But that would make the query that filters by product_name and sorts by order_date inefficient because both are stored in different tables.

In a fully normalized schema, such a query would be performed in the following manner:

SELECT product_name, order_date
FROM orders INNER JOIN products USING(product_id)
WHERE product_name like 'A%'
ORDER by order_date DESC 

As you can see MySQL here will have to scan the order_date index on the orders table and then compare the corresponding product_name in the products table to see if the name starts with A.

The above query can be drastically improved by denormalizing the schema a little bit, so that the orders table now includes the product_name column as well.

SELECT product_name, order_date
FROM orders
WHERE product_name like 'A%'
ORDER by order_date DESC 

See how the query has become much simpler, there is no join now and a single index on columns product_name, order_date can be used to do the filtering as well as the sorting.

So can both the techniques be used together? Yes they can be, because real word applications have a mix of read and write loads.

Final words.

Although, denormalized schema can greatly improve performance under extreme read-loads but the updates and inserts become complex as the data is duplicate and hence has to be updated/inserted in more than one places.

One clean way to go about solving this problem is through the use of triggers. For example in our case where the orders table has the product_name column as well, when the value of product_name has to be updated, then it can simply be done in the following way:

  • Have a trigger setup on the products table that updates the product_name on any update to the products table.
  • Execute the update query on the products table. The data would automatically be updated in the orders table because of the trigger.

However, when denormalizing the schema, do take into consideration, the number of times you would be updating records compared to the number of times you would be executing SELECTs. When mixing normalization and denormalization, focus on denormalizing tables that are read intensive, while tables that are write intensive keep them normalized.

  • Pingback: There Is Low Cost Van Insurance Coverage Available – You Simply Must Look! | Atlanta Auto Insurance

  • Samnan

    Hi Ovais,
    One of the most useful techniques to solve the same issue is to use replication. One copy for read operations while other for write operations not only ensures data backup but also performace for both read/write operations is optimal, if teh replication is configured properly and tables are indexes the right way.

  • Samnan

    Hi Ovais,
    One of the most useful techniques to solve the same issue is to use replication. One copy for read operations while other for write operations not only ensures data backup but also performace for both read/write operations is optimal, if teh replication is configured properly and tables are indexes the right way.

  • http://www.ovaistariq.net/ ovais.tariq

    Replication can improve performance, but the point that I am trying to make here is that, there should be a mix of normalization and denormalization. Even if selects are made on replicated data, still there are going to be table joins in the example that I have given, and joins don’t really go well with indexes. So its best to mix and match.

  • http://www.ovaistariq.net/ ovais.tariq

    Replication can improve performance, but the point that I am trying to make here is that, there should be a mix of normalization and denormalization. Even if selects are made on replicated data, still there are going to be table joins in the example that I have given, and joins don’t really go well with indexes. So its best to mix and match.

  • Greg

    It alls sounds good in theory, but how to actually implement these approaches behind some ORM library, like Hibernate or Active Record?

  • Greg

    It alls sounds good in theory, but how to actually implement these approaches behind some ORM library, like Hibernate or Active Record?

  • Pingback: Tweets that mention Databases: Normalization or Denormalization. Which is the better technique? — tech@ovais.tariq -- Topsy.com

  • http://www.ovaistariq.net/ ovais.tariq

    If we take a look at the definition of Active Record according to Martin Fowler then its as follows:
    “An object that wraps a row in a database table or view, encapsulates the database access, and adds domain logic on that data.”

    So similarly there can be an active record class that maps directly to the denormalized table.

    According to the example of orders table I have given in the post, an active record object would map directly to the row of the order’s table with one exception that the duplicate column “product_name” would only be read only and shouldn’t be updated from the active record class of the orders table.

    Then there would be another active record object that would map directly to a row of the products table.

    The duplicate column updates would be handled by a trigger on the products table as mentioned in my post.

  • http://www.ovaistariq.net/ ovais.tariq

    If we take a look at the definition of Active Record according to Martin Fowler then its as follows:
    “An object that wraps a row in a database table or view, encapsulates the database access, and adds domain logic on that data.”

    So similarly there can be an active record class that maps directly to the denormalized table.

    According to the example of orders table I have given in the post, an active record object would map directly to the row of the order’s table with one exception that the duplicate column “product_name” would only be read only and shouldn’t be updated from the active record class of the orders table.

    Then there would be another active record object that would map directly to a row of the products table.

    The duplicate column updates would be handled by a trigger on the products table as mentioned in my post.

  • Pingback: Tweets that mention Databases: Normalization or Denormalization. Which is the better technique? | ovais.tariq -- Topsy.com

  • http://www.information-technology-career-guide.com information technology

    Great site. A lot of useful information here. I’m sending it to some friends!

  • http://www.information-technology-career-guide.com information technology

    Great site. A lot of useful information here. I’m sending it to some friends!

  • M. Sohaib Shaikh

    As we know that there are several normal forms … so normalized the database up to a level which fits both for read and write operations … replication of data is not the best solution for heavy database ….

  • M. Sohaib shaikh

    Another thing i would like to point out here for the new comers and the readers that the coder/developer should not create/make complex queries … this will increase the load on the database server which reduces the database server performance …. so try to do the work through front end programming language (if using) …. i.e try to bring the load on the client side

  • Pingback: Advanced Article

  • http://www.facebook.com/people/Pavan-Kumar/100001688862556 Pavan Kumar

    thanks for u r informaition

  • http://www.facebook.com/people/Pavan-Kumar/100001688862556 Pavan Kumar

    thanks for u r informaition

  • http://javarevisited.blogspot.com/ Javin Paul

    great article man, you indeed covered topic with details and sample query made my understanding easy and clear. agree nothing is absolutely gem we need to strike a balance between two.

    Javin

    10 basic mysql commands to remember

  • http://www.ovaistariq.net/ Ovais Tariq

    @openid-55717:disqus thanks mate,.

  • Guest

    Spelling correction: Note that “…databases fare well…” (not “fair well”).

  • Pingback: Databases: Normalization or Denormalization. Which is the better technique? | Fredonfire

  • Carol Southern

    Excellent guidelines.  I always wondered about that: which to use.  I never thought of using both, let alone how/when to do it.

  • http://www.van-insurance-cheap.co.uk/ Compare Cheap Van Insurance

    A lot of companies out there will provide you with the so called “best
    deal” upfront, but if you ever have to make a claim the prices will
    shoot through the roof.

  • James Attard

    In practice I use normalized tables in OLTP environments (complex joins should not be a concern for performance if you choose the correct indexes). I use non-normalized tables in datawarehouse environments. (www.jamesattard.com)

  • Guest

    Great post!

  • Oakley vault

    The eyewear brand companies Oakley vault sunglasses mainly for women with a couple of designs in the actual mens category. Known because of its colorful and unique sets, Oakley vault are easily recognizable using the embossed logo on the actual temples. The exquisite design is from the frames range from Oakley sunglasses outlet to classic aviator tones, with pastel and earthy colored lenses. The variety of colors the brand offers could be compared with the mood from the Cheap Oakley vault.

  • Pingback: Normalized database | Ztinfo

  • Kelvin Jones

    A fantastic article. I’m left with no questions about how it all fits together and which route suits my needs. Many thanks.

  • Pingback: Denormalizing graph-shaped data | Linked Data

  • Steve RD

    I found this article helpful, thank you. Therefore I made a reference in my blog to this article.

    http://soa-java.blogspot.nl/2013/01/database-guidelines-rdbmssql.html

  • http://www.kizi2.com/ kizi

    Very interesting info!Perfect just what I was searching for!

  • http://www.y8u.org/ Y8

    Well-formatted and written content like this affected ME.

  • http://yepi-games.kizifriv1.com/ Yepi Friv

    I selected to marker this web site therefore I will return and skim it once more.

  • http://www.y8friv.asia/ Friv 4

    I’ve browse several alternative articles on this subject solely to be
    frustrated within the content. you must be proud that you are
    influencing the opinions of others.

  • http://y8-kizi.kizifriv1.com/ kizi friv

    You have created some nice points here that ar stimulating and intriguing.

  • http://www.jugarfriv.org/ friv

    Thanks article. I think i neet it

  • http://www.kizi800.com/ kizi

    A fantastic article. I’m left with no questions about how it all fits together and which route suits my needs. Many thanks.

  • http://www.fri-v.com/ Friv

    Great site. A lot of useful information here. Thanks!!

  • http://www.friv3games.net/ friv

    Thanks for giving me the useful information. I think I need it. Thank you

    http://www.kizifriv.co.uk

  • Pingback: The denormalizing sweet spot | snarfed.org

  • HoangTung Ls

    Helpful information. Lucky me I found your website unintentionally,

    and I am shocked why this twist of fate didn’t took place earlier!

    http://www.kizi-2.co

  • Pingback: Build faster web apps with Denormalization

  • http://www.friv14.com/ friv
  • hung

    It is the game in your favorite hobby? Would you like to play friv jogos cell phone during your trip? Do you like playing online role playing games? Are you after the best and newest Intel video game? This article is for you! All you want to know can be found here

  • baongoc

    When you save the game, do more than save in the same slot. Occasionally choose a new one. The time may come when you want to go a little back in the game and try something different. This option is closed to you if you just saved the game in a groove until the end.

    http://www.friv300game.com