Friday, April 20, 2012

How Covering Indexes work in MySQL

I believe that MySQL has many cool and advanced index types which provide the best performance strategies (if used properly) compared to other RDBMS's. One such feature is called Covering Indexes.

The concept behind covering indexes revolves behind the fact that in MySQL the indexes do not contain just the pointers to the data pages on disk but also the actual data! So an index with keys (id, name) actually would actually contain both columns (id, name) inside the B-Tree leaf nodes.

This means that if a regular index is used in a certain manner, we can get the data directly from the index - such index would be called a covering index. Data retrieval from a covering index is extremely fast!

For the purpose of this discussion I will be using the following table:
CREATE TABLE t1(
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  user_name VARCHAR(20) NOT NULL,
  first_name VARCHAR(30) NOT NULL,
  last_name VARCHAR(30) NOT NULL,
  external_id INT UNSIGNED NOT NULL,
  country_id SMALLINT UNSIGNED NOT NULL,
  PRIMARY KEY(id)
) ENGINE=InnoDB;

, and an index called external_id is added as follows:

ALTER TABLE t1 ADD INDEX (external_id, first_name);

Now just suppose that we have the following query:
select first_name from t1 where external_id=1;

MySQL starts by looking at the filter key(s) and tries to find an index whose leftmost key is in this set. If it finds an index with such a property, MySQL will check the next key in the filter set and tries to match it with the previous index next key, so on and so forth - I am highlighting this to emphasize that MySQL uses the index keys from left to right. I will explain this in more detail in a future article. Back to our example: external_id is in the leftmost position of the index, so this index will be definitely used. On top of that MySQL realizes that the column data to retrieve is also found in the index so it doesn't need to grab it from the data pages. In other words we are lucky because we are using the index as a Covering Index!

Let's check how this is explained in the plan's output:
mysql> explain select first_name from t1 where external_id=1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ref
possible_keys: external_id
          key: external_id
      key_len: 4
          ref: const
         rows: 1
        Extra: Using where; Using index
1 row in set (0.00 sec)
The Using Index value in the Extra attribute means that MySQL is making use of a Covering Index. The (access) Type attribute tells us that MySQL is using a Ref access to get the data from the covering index i.e. it is using a fast index scan. The Key external_id is the name of the covering index. In other words, we are not only retrieving data directly from the covering index, but we're also accessing it through a fast index scan.

But what happens if the query now becomes as following:
select select external_id from t1 where first_name='James';

Although both external_id and first_name are found in the index keys, we are not referencing the index correctly. First of all MySQL tries to find an index whose leftmost key is first_name and it will not find any. So rest assured that we will not see any fast index scan access with this query. However MySQL smartly realizes that both columns used in this query are located in the covering index. Thus it would be more efficient to get these values from the B-Tree leaf nodes (which are probably already in memory, and smaller than the actual table) rather than looking them at the table. So in summary here is what happens:
  1. MySQL looks at the data in the B-Tree (Covering) Index and not the data pages of the table
  2. MySQL does a sequential read rather than an index scan on the index structure
Although the lookup is not as efficient as before as the access is sequential, this lookup is still better than a full table scan. This is depicted by the explain plan below:

mysql> explain select external_id from t1 where first_name='James'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: index
possible_keys: NULL
          key: external_id
      key_len: 36
          ref: NULL
         rows: 1051
        Extra: Using where; Using index
1 row in set (0.00 sec)
The Using Index value in the Extra attribute means that MySQL is making use of a Covering Index. The (access) Type attribute value of Index, which by the way is a bit of a misnomer, tells us that MySQL is doing a full scan of the index tree. The Key external_id shows the name of the covering index. As already stated, this is usually better than a full table scan because the size of the index is usually smaller than the raw data
Post a Comment