MySQL Paginated displays – How to kill performance vs How to improve performance!

Pagination is used very frequently in many websites, be it search results or most popular posts they are seen everywhere. Typically you have a list of 10 to 20 results below which are pagination controls with either the page numbers listed or previous/next links present.

Let’s visualize by taking a look at how the big guns go about the pagination scenarios.

1. Google:

Google shows a list of page numbers but it doesn’t show the entire list of page numbers, neither does it show the exact count of the number of results, rather it estimates the number of results. The list of page numbers starts with 1 – 10 instead of the entire list of page numbers and it increases as a user digs deeper into the results, meaning on page 2 of the result you would see page numbers 1 – 11, on page 3 you would see 1 – 12 and so on.

2. Facebook:

Facebook has next previous button to navigate through the search results, each search result page having 10 results each, the other thing that Facebook is doing is its estimating the number of results and doesn’t give an exact count. There are no page numbers given, so that a user will only be able to the use the next previous button to navigate through the results.

3. Yahoo:

Yahoo shows a list of page numbers, 11 at a time to allow you to navigate through the search results, it doesn’t show you the entire list of page numbers. The page number are shown in such a way that if you are on pages 1 to 6 you would see page numbers 1 – 11 and after that the page numbers list would start from (current page number – 5) to (current page number + 5).

Analysis of the big guns:

Analyzing how Google, Facebook and Yahoo are going about handling pagination you would see that there are certain things that are common, which are:

  • None of them show the exact count of the number of results.
  • None of them show the entire list of page numbers, nor do they have controls such as ‘Goto Last Page’ or anything that would involve digging deep into the search results.
  • They are interested in showing the user things that are relevant to the user, because no one would want to have a look at the least relevant results (a list of 10 page numbers is a save bet, because thats what most of users would be interested in).

Now let’s have a look at a common scenario at how pagination is usually implemented.

Typical pagination implementation:

Typically pagination is implemented in the following steps:

  • Get the total number of records by either executing
    SELECT COUNT(*) FROM my_table WHERE x = y ORDER BY date_col;

    OR

    SELECT SQL_CALC_FOUND_ROWS * FROM my_table WHERE x = y ORDER BY date_col LIMIT 0, 10;
  • If you were using SELECT COUNT(*), then execute another query to fetch the results typically as follows:
    SELECT * FROM my_table WHERE x = y ORDER BY date_col LIMIT 0, 10;
  • Execute the above two steps for the remaining pages, the only thing to change would be to replace LIMIT 0, 10 with LIMIT 10, 10 LIMIT 20, 10 and so forth.

Performance implications of the typical pagination implementation:

The performance implications of such naive pagination implementation are frightening to say the least. Following is why I am saying so:

  • First of all if you go the two query route, executing a count query and then a result fetching query would involve some overhead.
  • If you don’t go the two query route and still decide to use SQL_CALC_FOUND_ROWS, then the performance implication is going to suffer more than if you had used count query, because MySQL would then not be able to use its limit optimization. Instead it will have to have a look at all the records that match because we are asking MySQL to provide us a count as well. If we hadn’t used SQL_CALC_FOUND_ROWS then MySQL would have just stopped processing after having found the number of rows that we requested.
  • If you are naive enough to have pagination links like Last Page or Goto Page 1000 or Goto Page 10000, then you are asking for more troubles. Suppose you asked for page number 10001 and you asked for 10 rows, then what MySQL would essentially be doing is it would fetch 10020 rows discard the first 10000 and return the 20 rows you requested. Image the overhead.

Having gone through analyzing what the big players like Google, Yahoo and Facebook are doing, and keeping in mind the performance implications of a typical pagination implementation, I would suggest a pretty straight forward solution.

My suggestion

I would suggest going the route of the big players. Following is how I would implement pagination:

  • Don’t show all the results no one does, and no one is interested in viewing all the results.
  • Go the Facebook route and display Next Page Previous Page buttons only
  • Get rid of the results counting story and instead use clever querying. Suppose you show 10 results to the users at a time, then all you have to do to know if you have to show the Next Page link is fetch one extra record that is 11 records and if the 11th record exist that means you can show the Next Page link.
    SELECT * FROM my_table WHERE x = y ORDER BY date_col LIMIT 0, 11;

Finally, I would say that although pagination seems trivial but it still has performance implications. Investigate and develop is the best way to design your application (slow query log is a big help).

  • Pingback: MySQL Paginated displays – How to kill performance vs How to … | mysql

  • Pingback: Tweets that mention MySQL Paginated displays – How to kill performance vs How to improve performance! | ovais.tariq -- Topsy.com

  • Pingback: MySQL Paginated displays – How to kill performance vs How to improve performance! « DbRunas – Noticias y Recursos sobre Bases de Datos

  • Pingback: MySQL Paginated displays – How to kill performance vs How to …

  • Pingback: Manager’s Success: Hire Best Available Talents | Effective Delegation

  • Pingback: How to Speed Up Windows Vista

  • Bill Gates

    I hate it how blog posts are increasingly being filled up with comments from arses who are just twitting about it.

  • Pingback: php programming basics | computerskeyboards.org

  • http://www.celerity.nl Tomas

    The last query using limit on a count query does not work (at least not in 5.0), it just counts all rows.

  • http://pulse.yahoo.com/_JCIFUHVEJYD6FU2A4ALVOCGE6M Doug

    I like it!!!

  • http://twitter.com/AndyJ Andy Jarrett

    deleted

  • Pingback: Hp Proliant Dl380 Best Server to Rent :: Dedicated Server Hosting Reviews

  • Pingback: Must Have Considerations When Selecting the Best Web Hosting Service for Wordpress Blogs :: Dedicated Server Hosting Reviews

  • http://twitter.com/vargheessamraj Varghees Samraj

    SELECT COUNT(*) FROM my_table WHERE x = y ORDER BY date_col LIMIT 501;
    and
    SELECT COUNT(*) FROM my_table WHERE x = y ORDER BY date_col;
    will not make any difference.
    it limits the number of rows in the result.

  • http://www.ovaistariq.net/ Ovais Tariq

    If someone is tweeting about a post, then I think we should accept that, cz they must be doing it when they like the post.

  • http://www.ovaistariq.net/ Ovais Tariq

    Yes of course we could use “SQL_CALC_FOUND_ROWS” but the performance overhead because of that as I have discussed in my post is too much to be turned a blind eye too.

  • Frank12

    Another way to paginate with a good performances improvement is to show the button “view more” like youtube does. you just show X results to benig and every click on “view more” append X more results.
    a lot of web site are doing this automatically when the page is scrolled (like google images)

  • Pingback: abcphp.com

  • Prateek

    Nice…
    It is quiet common now a days,that an application only fetch limited records and this is not confined to only these giants…

  • Pingback: A HostGator Web Hosting Service Review :: Dedicated Server Hosting Reviews

  • Pingback: Willow Smith on ‘Ellen’: ‘Whip My Hair’ but no whiplash - Updates From Around The World - Conglomerate Blogging

  • http://www.ovaistariq.net/ Ovais Tariq

    Yes of course that is a very good UI design pattern to use and there are many who are using it I agree, but the point is how do you implement it and that is what I intended on discussing in my post.

  • http://www.ovaistariq.net/ Ovais Tariq

    Thanks for your comment!

    I agree with what you say, but the thing is yes it is common nowadays to have application fetch only limited records, but most of the times I am going through slow-query logs or looking into application performance benchmarks, its the implementation that is not correct and the implementation part is what I have tried to cover in this post.

  • http://twitter.com/eRadical Gabriel PREDA

    I agree with all you said on the user interface side… most of the time you do not need that many options for pagination.
    But when you really need it…

    The MySQL manual at http://dev.mysql.com/doc/refman/5.1/en/select.html says:
    The HAVING clause is applied nearly last, just before items are sent to the client, with no optimization. (LIMIT is applied after HAVING.)

    So in our case, because we don’t “group by”, the last “filtering” applied to the result set is LIMIT… just before sending the result set to the client.
    Wow… this tells us that MySQL server with or without SQL_CALC_FOUND_ROWS flag will process all the rows in the result set and just before sending the result set to the client will cut acording to the LIMIT clause.

    So instead of going for “SELECT * + SELECT COUNT(*)” whitch BOTH HIT DATA (and maybe the disk)… I would definitely go for “SELECT SQL_CALC_FOUND_ROWS * + SELECT FOUND_ROWS()” witch will HIT DATA (and maybe the disk) ONLY ONCE because FOUND_ROWS() only returns a result computed/stored by the previous query.

    I’ve played a little on my home computer with the employees database and with various conditions (server restart, flush/reset reset query cache).
    The results are at http://gabriel.e-radical.ro/SQL_CALC_FOUND_ROWS-vs-COUNT.html

    Given the employees database for MySQL (https://launchpad.net/test-db/+download) can you prove your assertions with some sql queries about SQL_CALC_FOUND_ROWS vs COUNT(*) ?

  • http://www.ovaistariq.net/ Ovais Tariq

    Thanks for commenting!

    I would like to differ with what you have said. Reading your quote the keyword here is with no optimization, there are various ways that MySQL does query optimizations specifically since we are talking about LIMIT with ORDER BY, so we can have a look at how MySQL optimizes LIMIT:

    http://dev.mysql.com/doc/refman/5.0/en/limit-optimization.html

    “If you use LIMIT row_count with ORDER BY, MySQL ends the sorting as soon as it has found the first row_count lines rather than sorting the whole table.”
    In short: Using ORDER BY with LIMIT causes this type of optimization.

    And as far as saying that “SELECT COUNT(*)” is going to hit data, that would really depend on the table engine you are using and the indexing you have done, if you are using MyISAM, then there would be no need to hit data, as MySQL internally save a count of the number of rows.

    Now if you are filtering the data and then trying to count, or if you are using say InnoDb, then it would depend on how good an indexing you have done. Suppose you have built the right index and loaded it into memory, then count is done on the index in memory and is going to be lightening fast, since mysql won’t be hitting any data.

    So I would say that there are many different factors involved when talking about optimizations and the most important factor is indexing.

    Lastly I would love to post my findings on the sample data you have provided. Let me take out sometime from my schedule then I will post my findings in another post and would love to see you there.

  • http://www.ovaistariq.net/ Ovais Tariq

    And having a look at your query

    SELECT * FROM salaries
    WHERE from_date BETWEEN ’1996-01-01′ AND ’2000-12-31′
    ORDER BY from_date ASC LIMIT 100000, 10;

    I am against selecting such records where LIMIT ranges are these high “LIMIT 100000, 10″.
    I have clearly mentioned not selecting these kind of ranges, to quote from my post:

    “If you are naive enough to have pagination links like Last Page or Goto Page 1000 or Goto Page 10000, then you are asking for more troubles. Suppose you asked for page number 10001 and you asked for 10 rows, then what MySQL would essentially be doing is it would fetch 10020 rows discard the first 10000 and return the 20 rows you requested. Image the overhead.”

  • http://twitter.com/eRadical Gabriel PREDA

    The phrase you’re quoting from the manual continues with this:

    “If a filesort must be done, all rows that match the query without the LIMIT clause must be selected, and most or all of them must be sorted, before it can be ascertained that the first row_count rows have been found. In either case, after the initial rows have been found, there is no need to sort any remainder of the result set, and MySQL does not do so.”

    So the only thing that stopps is sorting… because it’s only logic that “all rows that match the query without the LIMIT clause must be selected”.

    You may be against selecting records where LIMIT ranges are high… you did clearly mentioned not selecting these kind of ranges… BUT it beats the purpose of my response/request. I only asked you for proof of your assertions…

    You asserted that “SELECT * + SELECT COUNT(*)” is better than “SELECT SQL_CALC_FOUND_ROWS * + SELECT FOUND_ROWS()” and what I wanted from you are numbers based on some SQL statements (like I’ve offered you).

    Lastly if what you’re saying applies to MyISAM only and or good-indexed query you should have mentioned that in the first place.

    But all in all.. I have given you a link to a database… and some SQLs with execution times reported by the mysql client.

    I’m expecting some SQLs and numbers…
    Imagine you’re in a math class, you tell the teacher the result but the teacher is stubborn and wants to see the whole demonstration… not just the final result!

  • http://www.ovaistariq.net/ Ovais Tariq

    Yup I am imagining myself in a Maths class :) Teacher, teacher can I have the demonstration ready by tomorrow :)

  • Pingback: World Spinner

  • http://twitter.com/vargheessamraj Varghees Samraj

    instead of SELECT COUNT(*) FROM my_table WHERE x = y ORDER BY date_col LIMIT 501;

    you should use

    SELECT 1 FROM my_table WHERE x = y ORDER BY date_col LIMIT 1 offset 500;

    if you get one record you have more than 500 records. If this query returns 0 records it can be treated as it has only lesser than 500 records

  • http://www.ovaistariq.net/ Ovais Tariq

    Nice suggestion, but still that would mean scanning 501 records, I am looking for a way to not to do any scanning on the table, instead just count out the values using an indexed column.

  • Pingback: What are your Biggest Challenges? | Mega Business Achievement Institute

  • http://www.facebook.com/magnus.lundberg Magnus Lundberg

    If you need to type number of hits, you are on the first page you can start with the “SELECT * … LIMIT 0,10″. If the result is less than pagesize, you dont need the “SELECT COUNT(*)”. Good trick if you know you have a lot of smaller sets.