On Covering Indexes and Their Impact on Performance

The purpose of this post is to describe what covering indexes are and how they can be used to improve the performance of queries. People mostly use indexes to filter or sort the results but not much thought is given to actually reduce the disk reads by using proper indexes. So I will show you how to reduce disk reads and hence improve the performance of queries by utilizing indexes properly.

On a side note, all the examples in this post are based on the employees database which can be downloaded from here.

What is a Covering Index?

Covering index is the name given to an optimization where the user query is satisfied from the index alone without having to read the data from the data file.

Suppose you have a query

select emp_no from employees where emp_no between 1 and 20;

and you have an index on the column emp_no. Now the column accessed by the query emp_no is indexed and hence the index will contain values for that column, so there is no need to hit the data file to fetch the values, such an optimization is known as covering index.

Running an explain on the same query

explain select emp_no where employees where emp_no between 1 and 20\G

will output the following result

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 1
        Extra: Using where; Using index
1 row in set (0.00 sec)

If you have a look at the Extra column you can see the words using index, which means that MySQL was able to satisfy the query from the index alone, and hence it is an example of a covering index optimization.

So the important principle to remember is that all the columns in the sql statement must match the columns in the index, for the covering index optimization to be used.

Let’s understand how covering index impacts performance.

How Covering Index impacts performance?

To understand how covering index can improve performance, lets have a look at how index lookup is performed for queries that cannot be fulfilled from index alone.
When a query is executed a lookup is performed on the index starting from the root node and navigating through the branches to the correct leaf node. Now either the values at the leaf node itself can satisfy the query (in which case the index is a covering index) or a random I/O operation has to be performed to read the data from the disk. The number of read operations needed depend on the type of the index:

  • If the lookup is performed on a primary index or a unique index, then at most one random read is required, because primary/unique index value corresponds to a single row.
  • If the lookup is performed on a non-unique index, then up to N random reads may be required, where N is the number of rows that correspond to the index value.

Random reads are not great for performance and you always want to find ways to minimize them. Covering indexes are good at that, obviously because you don’t need to do any extra disk I/O to read the values as they are already their in the index.

You should keep in mind some important considerations with respect to indexes and InnoDB. Secondary indexes in InnoDB need two index lookups, so covering indexes are more important as they can save you 2xN random read operations. Also secondary indexes in InnoDB automatically include the primary index values, so in some scenarios they can automatically act as covering indexes.

Now enough of the theory and let’s see covering indexes in action.

Example

Suppose you want to find the employees hired in the first month of year 2000:

explain select emp_no from employees where hire_date between '2000-01-01' and '2000-01-28'\G

Following is the explain output

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 300141
        Extra: Using where
1 row in set (0.00 sec)

And the query takes 0.12 seconds.

Now let’s add an index on the hire_date column (and remember this is an InnoDB table so the emp_no column will automatically be part of the index),

alter table employees add index (hire_date);

Running the explain again now outputs the following:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
         type: range
possible_keys: hire_date
          key: hire_date
      key_len: 3
          ref: NULL
         rows: 13
        Extra: Using where; Using index
1 row in set (0.00 sec)

And now the query takes 0.00 seconds.

I hope you understood how covering indexes can help improve performance of queries, but make sure that you don’t add too many and unnecessary indexes, because indexes have an overhead more so on inert/update operations.

  • Ovaistairq

    Your covering indexes don’t include the column(s) being selected so they’re not actually covering the query. You’re only creating an index on the column in the where clause so there’s still key lookups and disk access required.

    A covering index should include all of the columns being queried and returned (hence ‘covering’ all columns being requested). Then there’ll be no disk access required.

  • http://www.ovaistariq.net/ Ovais Tariq

    The column emp_no does not need to be added to the index, since its a PK. As I have mentioned that I am running this query on a test database “employees” which is available here: https://launchpad.net/test-db/ so that you can check out the table structure.

blog comments powered by Disqus