A few notes on InnoDB PRIMARY KEY

InnoDB is very different from MyISAM in how it stores data and how its indexes work. InnoDB uses an index-organized data storage technique, wherein the primary key acts as the clustered index and this clustered index holds the data. There is no separate storage of data; it is stored within the leaf nodes of the clustered index.

It is for this reason that understanding the basics of the InnoDB primary key is very important, and hence the need for these notes. The purpose of these notes is to help you with selecting the correct primary keys.

Now let’s start off!

Understanding the primary key

Following are a few things that you should know about the primary key:

  • If you don’t create a primary key, InnoDB will first try to use a unique non-null key if one is present. If there is no such key, InnoDB will by itself generate a hidden 6-byte integer primary key and cluster on that. Such an implicit key may or may not be a feasible choice for your data access patterns.
  • With InnoDB, the columns making up the primary key are automatically appended to the secondary indexes. This has two side-effects:
    1. if you have a long primary key, then all of your secondary indexes will become long as well;
    2. the secondary indexes can act as covering indexes for data retrievals that need the secondary indexed column values as well as the primary key column value.
  • InnoDB does not compress indexes as MyISAM does, which means that you will need a comparatively large amount of disk space.
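As a sketch of the second side-effect, consider the following hypothetical table (the table and column names are made up for illustration). The secondary index on `user_id` implicitly carries the primary key column, so a query touching only those two columns can be served from the secondary index alone:

```sql
-- Hypothetical table; names are invented for illustration.
CREATE TABLE orders (
  order_id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
  user_id    INT UNSIGNED NOT NULL,
  created_at DATETIME     NOT NULL,
  PRIMARY KEY (order_id),
  KEY idx_user (user_id)   -- internally stored as (user_id, order_id)
) ENGINE=InnoDB;

-- Covered entirely by idx_user: both user_id and order_id live in the
-- secondary index, so InnoDB never has to visit the clustered index.
-- EXPLAIN should report "Using index" for this query.
SELECT order_id FROM orders WHERE user_id = 42;
```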

Selecting the right primary key

Following are a few consequences of the above-mentioned points about the primary key:

  • You should strive for a small primary key if you’ll have many indexes on a table. That will save you disk space and allow for much better caching.
  • The easiest way to select a primary key is to use an AUTO_INCREMENT column as the primary key. This ensures that rows are inserted in sequential order and allows for very efficient clustering.
  • Avoid using random, non-sequential values as primary keys, because such values spread out the data, make index insertion random, and do not give any helpful clustering.
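To sketch the difference, here are two hypothetical table definitions (names invented for illustration). The first follows the advice above; the second is the anti-pattern:

```sql
-- Sequential surrogate key: new rows always append to the right-hand
-- end of the clustered index, keeping pages densely packed.
CREATE TABLE sessions_good (
  id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  token   CHAR(36)        NOT NULL,  -- keep the random value as a plain column
  payload VARCHAR(255)    NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY idx_token (token)
) ENGINE=InnoDB;

-- Anti-pattern: a random value as the clustered key forces inserts into
-- random pages, causing page splits and poor cache locality.
CREATE TABLE sessions_bad (
  token   CHAR(36)     NOT NULL,
  payload VARCHAR(255) NOT NULL,
  PRIMARY KEY (token)
) ENGINE=InnoDB;
```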

I hope after reading this post you will have a better understanding of how to select the right primary key, and I hope this forms a part of your handy reference.

Ovais is a storage architect with keen focus on reliability, efficiency and performance of OLTP databases, more specifically MySQL. He currently works at Uber on ensuring storage reliability and scalability. Previously, he helped different web properties scale their MySQL infrastructure. He also worked at Percona as a Senior MySQL consultant and at a few other startups in various capacities.

  • SomeGuyNamedPaul

…and no, using a varchar for your primary key won’t save you any space at all. The full possible width of the column is what’s stored in the PK as well as in the secondaries.


  • Well, mostly CHAR is a better choice than VARCHAR if you have lots of updates happening and only have a few very large string values; it also helps to avoid fragmentation.

    I would love to have you explain VARCHAR’s effect on the primary key.

  • Also, although I have mentioned that InnoDB does not compress indexes, the newer version of the InnoDB plugin compresses both the data and the indexes.

  • nav

    Yeah, that’s correct Ovais, I was going to mention it. Not only the InnoDB plugin: the newer MySQL 5.5, which has the InnoDB plugin built in as the default engine, also has the same feature.


  • Yeah, that’s a great feature, but I still haven’t seen any benchmarks proving the performance benefits of compressed InnoDB tables.

  • nav
  • Thanks for the links, but I guess I will have to have a more thorough look at the compression myself.

  • nav

    No problem, look at it yourself and update it here too; I would be interested in the output you get. 🙂


  • In my experience, numeric primary keys perform better than non-numeric ones and make search and sort easier to handle at both the database and application level.


  • sure

  • Yes, you are right, numeric primary keys are almost always better for the following reasons:
    – they consume less space
    – they are faster
    – they work with AUTO_INCREMENT, which ensures that rows are inserted in sequential order


  • Hsinghjatt

    I am confused about what counts as a long primary key. If I create the primary key on a single field such as feedback_detail_id, is that a long primary key? Or is a primary key created on many fields, such as (emp_id, name, emp_code), the one that should be considered long?

  • A long primary key means a primary key on many fields; read my post here to understand why that would impact performance.

  • dharmendra patel

    Thanks for sharing this article… Please allow me to share a useful article of mine on the InnoDB Disabled problem


  • gopal

    Hello Tariq,

    Thanks for the nice article. I have the following question:

    Is the space saved due to a short primary key reflected in 'SHOW TABLE STATUS' or not?

    E.g. I have a table with the following structure:
    `Datafrom` date NOT NULL DEFAULT '0000-00-00',
    `Variable` mediumint(8) unsigned NOT NULL DEFAULT '0',
    `Value` decimal(10,2) NOT NULL DEFAULT '0.00',
    `Company` enum('x','y') COLLATE latin1_german1_ci NOT NULL DEFAULT 'GmbH',
    `Language` mediumint(8) unsigned NOT NULL DEFAULT '0',
    `Country` smallint(5) unsigned NOT NULL DEFAULT '0',
    `account_status` enum('a','b','c') COLLATE latin1_german1_ci NOT NULL DEFAULT 'active',
    UNIQUE KEY `Datafrom` (`Datafrom`,`Variable`,`Company`,`Language`,`Country`,`account_status`),
    KEY `Inserted` (`Datafrom`),
    KEY `Id` (`Variable`)
    ) ENGINE=InnoDB

    and I have changed it by adding an 'auto increment' primary key:
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `Datafrom` date NOT NULL DEFAULT '0000-00-00',
    `Variable` mediumint(8) unsigned NOT NULL DEFAULT '0',
    `Value` decimal(10,2) NOT NULL DEFAULT '0.00',
    `Company` enum('x','y') COLLATE latin1_german1_ci NOT NULL DEFAULT 'GmbH',
    `Language` mediumint(8) unsigned NOT NULL DEFAULT '0',
    `Country` smallint(5) unsigned NOT NULL DEFAULT '0',
    `account_status` enum('a','b','c') COLLATE latin1_german1_ci NOT NULL DEFAULT 'active',
    PRIMARY KEY (`id`),
    KEY `Inserted` (`Datafrom`),
    KEY `Id` (`Variable`),
    UNIQUE KEY `Datafrom` (`Datafrom`,`Variable`,`Company`,`Language`,`Country`,`account_status`)
    ) ENGINE=InnoDB

    This table has more than 16 million records.

    'SHOW TABLE STATUS' gives the following result:
    Rows: 16397283
    Avg_row_length: 39
    Data_length: 645922816
    Max_data_length: 0
    Index_length: 640614400

    Rows: 16396993
    Avg_row_length: 43
    Data_length: 716177408
    Max_data_length: 0
    Index_length: 818479104

    Here we can see that the index size increased considerably.

    In such a scenario, is it OK to change the primary key or not?

    Thanks in advance,

  • In the first table definition you do not have an explicit primary key defined, which means that InnoDB should use the first non-nullable unique key available as the primary key, or else an implicit 6-byte key. Since you do have a unique key created on non-nullable columns, InnoDB should just use that.
    There are a couple of considerations with respect to the primary key. First of all, you should make sure that the primary key is defined in such a way that inserts are in increasing order, otherwise writes will suffer. Secondly, keep the primary key small, because the primary key columns get appended to all the secondary keys. So adding the new primary key column `id` will definitely give you the advantage of improved write performance and smaller index sizes.

    However, the SHOW TABLE STATUS output that you have shared does not reflect that. Please note that the table statistics reported are only approximate and can vary. Other than that, you might be hitting some bug that is reporting wrong numbers or isn’t selecting the unique key as the primary key. In any case, it’s a bad idea to have an implicit primary key.

  • gopal

    Hello Tariq,

    As per my further analysis, the SHOW TABLE STATUS output was correct, as the same was reflected in the dump sizes of the two tables.

    Another thing I did to verify the point was to create additional indexes in both tables.

    alter table ABC add key (country), add key (company), add key (language), add key (account_status);

    After adding the additional indexes, the difference in index size between the two versions of the table is as expected.

    # Without Explicit Primary Key
    Index_length: 1932361728

    # with Auto Increment Primary Key
    Index_length: 1612201984

    As per this analysis, it really depends on how many secondary indexes you have in a table, and performance should be measured accordingly.

  • Well, efficient space utilization does indeed depend on the number of secondary indexes. But we cannot simply measure performance by the number of secondary indexes. As I have explained in the post, it’s important for a primary key to be defined in such a way that writes are always ordered by the primary key, so that you are not writing to the primary key in random places. This really impacts performance a lot. That is why, in your case, a surrogate primary key created on an auto-incremented column is clearly much better.
    Of course performance also depends on the number of secondary indexes, but as long as the secondary indexes are non-unique, InnoDB can offset the performance problem by buffering changes to the secondary indexes in memory.

  • One thing worth noting (for anyone stumbling across this article) which was a massive ‘gotcha!’ for me:

    Be careful if you choose to use AUTO_INCREMENT as your primary key on an InnoDB table! The latest value is never written to disk, so even if you simply DELETE all rows in the table, when you reboot the server, your IDs will start to be duplicated.

  • Simon, you are right that if the MySQL server is restarted, the last AUTO_INCREMENT value is not persisted and will be recalculated. But it is recalculated by taking the MAX value of the auto_increment column and starting from the next one. This ensures that the IDs already in the table will not be duplicated.
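A minimal sketch of the behavior being discussed (this assumes a server from the era of this post, i.e. before MySQL 8.0, which started persisting the counter across restarts):

```sql
-- Assumes MySQL < 8.0, where the AUTO_INCREMENT counter lives only in memory.
CREATE TABLE t (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY) ENGINE=InnoDB;

INSERT INTO t VALUES (NULL), (NULL), (NULL);  -- rows get ids 1, 2, 3
DELETE FROM t;                                -- table empty, counter still at 4

-- ... server restarts here ...
-- On restart InnoDB recomputes the counter from MAX(id) + 1; the table
-- is empty, so the counter effectively resets.

INSERT INTO t VALUES (NULL);                  -- id 1 is handed out again
```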

  • Ovais, yes that’s true. I should’ve qualified my comment further:

    This *only* occurs if you DELETE rows from the table and _then_ reboot the server.

    The gotcha for me was that I was totally unaware of this behaviour. I only discovered it when researching an issue we were having with a client’s database.

    In this case, they needed to empty an InnoDB table on a regular basis exporting the data but maintaining record key uniqueness.

    Because the auto-incrementing ID column (which happened to be the primary key and a basis for a one-to-many relationship) was resetting, I assumed that they were wrongly performing a TRUNCATE. However, it turns out that they _were_ doing a DELETE as instructed, but also incidentally had decided to allow this particular server to reboot itself as necessary following automatic Windows (yep…) updates.

    And it turns out this will effect similar behaviour to a TRUNCATE.

    Strange set of circumstances + unexpected behaviour = me being blamed for weird problem, me finding real cause of problem, me providing a fix.

    The solution: if you need an always-unique ID, I recommend using GUID/UUID for your primary key on InnoDB tables.

    Yes, the data cost will be higher (your table and indexes will be bigger), but you can maintain referential uniqueness beyond the InnoDB storage engine.

  • Simon, true, this is a special InnoDB behavior due to the fact that InnoDB does not persist the auto_increment value.
    As for using UUIDs as primary keys, I generally discourage people from that for two main reasons:
    – Performance suffers, especially when the data set becomes disk-IO bound. It is recommended to have monotonically increasing primary key values, as that provides the best page fill ratio and keeps the B+Tree compact. The more compact the B+Tree, the more efficient index operations are and the smaller the height of the B+Tree.
    – Disk usage is inefficient when using a random set of values for the primary key columns. The B+Tree data structure is known to degrade in performance as it becomes more and more disk-IO bound. The more inefficient the disk usage, the slower index access will be.

  • I agree with both of your concerns, but in this case it was a requirement by the client. As the data is cleared daily with no fear of more than a few hundred records ever being stored at any one time, the inefficiencies will be hardly noticeable. In fact, in this case, although indexes are created, they will hardly be used.

  • For a small dataset you are right, it won’t make that big of a difference.