InnoDB is very different from MyISAM in how it stores data and how the indexes works. InnoDB uses an index-organized data storage technique, wherein the primary key acts as the clustered index and this clustered index holds the data. There is no separate storage of data, its stored within the clustered index in the leaf nodes.

Its for this reason that understanding the basics of InnoDB primary key is very important, and hence the need for these notes. The purpose of these notes is to help you with selecting the correct primary keys.

Now let’s start off!

Understanding the primary key

Following are a few things that you should know about the primary key:

  • If you don’t create a primary key, InnoDB will first try to use a unique non-null key if present. And if there is no such key present then InnoDB will by itself generate a hidden 6-byte integer primary key and cluster on that. This value may or may not be a feasible choice for your data access patterns.
  • With InnoDB, the columns making up the primary key are automatically appended to the secondary indexes. This has two side-effects:
    1. if you have a long primary key, then all of your secondary indexes will become long as well,
    2. the secondary indexes can act as covering indexes, for data retrievals that need secondary indexed column values as well as the primary key column value.
  • InnoDB does not compress indexes as MyISAM does, which means that you will need comparatively large amount of disk space.

Selecting the right primary key

Following are a few consequences from the above mentioned points about the primary key:

  • You should strive for a small primary key if you’ll have many indexes on a table. That’s going to save you disk space as well as will allow for much better caching.
  • The easiest way to select a primary key is to use an AUTO_INCREMENT column as the primary key. This will ensure that rows are inserted in sequential order and will allow for very efficient clustering.
  • Avoid using random non-sequential values as primary keys, because such type of values spread out the data making the index insertion random and do not give any helpful clustering.

I hope after reading this post you will have a better understanding of how to select the right primary key, and I hope this forms a part of your handy reference.