Paging a Query

In the past few months I had several occasions to use the “rownum” pseudo-column to page a query result set. As we know, a query returns a result set, which the application gets and processes. In many cases we wish to show the result set to the user and when the result set is large we would want to show the user only a portion of it (ordered by some key or just a sample). This is often called “paging”. The idea is to show only the beginning of the result set (the first “page”) and allow the user to navigate to other pages to see the rest of the result set. In this case, if we use the application to take this portion of the result set, it would require the entire result set to flow from the database to the application, and then the application would show the relevant part and just ignore the rest; this is both difficult and inefficient.

The popular approach is to let the database handle the paging, we can write a query that sorts the result set and already returns only the relevant rows (for example, for 50-rows page, returns the first 50 rows for the first page, rows 51-100 for the second pages and so on). Until Oracle 12c, we had to use the “rownum” pseudo-column to do that, in 12c, we can do this more easily. But since many companies still use 11g or earlier versions, I will explain both ways.

before 12c

Using “rownum” to reduce the amount of rows in a result set is very common. There are some things, however, that we need to remember when using it, especially when we use it for paging.

First page (or the beginning of the result set)

Getting the first page (or the beginning of the result set) is simple. The “rownum” pseudo-column returns the number of the row in the set, so in order to get the first 10 rows we just need to add “rownum<=10” predicate to the query. An important thing to know in this case, is that when Oracle processes an SQL statement, it runs the “where” clause first, then it sorts the result. So if we wish to get the 5 employees that earn the highest salaries in the company the following query will be wrong:

 SQL&gt; select ename,sal from emp&lt;/pre&gt;
where rownum&lt;=5 order by sal desc; ENAME             SAL ---------- ---------- JONES            2975 ALLEN            1600 MARTIN           1250 WARD             1250 SMITH             800 

This is the correct one:

 SQL&gt; select *
     from (select ename,sal
           from emp order by sal desc)
     where rownum&lt;=5;
ENAME             SAL
---------- ----------
KING             5000
SCOTT            3000
FORD             3000
JONES            2975
BLAKE            2850

Other pages

Once we realize the way to use inline view for the first page, we may think that this is also the way to get the next 5 employees sorted by their salary:

SQL&gt; select *
     from (select ename,sal
           from emp order by sal desc)
     where rownum between 6 and 10;
no rows selected

The explanation here is that Oracle first applies the “where” clause and only then assign the rownum pseudo-column, meaning; only rows that are returned by the query are numbered by Oracle. Requesting rows starting with rownum=6 (in this example) will actually return nothing, as Oracle didn’t return rows that were numbered 1, 2…5, so it actually can’t have a row numbered 6.
If this is the case, we can use the following method (adding another subquery to assign rownum and then use it in the where clause):

SQL&gt; select *
     from (select ename,sal, rownum r
           from (select ename,sal
                 from emp order by sal desc
                )
          )
     where r between 6 and 10;
ENAME           SAL         R
---------- ---------- ----------
CLARK            2450          6
ALLEN            1600          7
TURNER           1500          8
MILLER           1300          9
WARD             1250         10

This is correct. But now we are facing another issue, which is the performance.

Performance Issues

To check performance, first let’s take a look at the plan of the last query, but we will use a much larger table (I created large_emp with 50000 employees with unique salaries for this):

SQL&gt; select *
     from (select ename,sal, rownum r
           from (select ename,sal
                 from large_emp order by sal desc
                )
          )
     where r between 6 and 10;
Execution Plan
----------------------------------------------------------
Plan hash value: 4064977742
-------------------------------------------------------------------------------------------
| Id  | Operation             | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |           | 50000 |  1611K|       |   323   (1)| 00:00:01 |
|*  1 |  VIEW                 |           | 50000 |  1611K|       |   323   (1)| 00:00:01 |
|   2 |   COUNT               |           |       |       |       |            |          |
|   3 |    VIEW               |           | 50000 |   976K|       |   323   (1)| 00:00:01 |
|   4 |     SORT ORDER BY     |           | 50000 |   537K|   992K|   323   (1)| 00:00:01 |
|   5 |      TABLE ACCESS FULL| LARGE_EMP | 50000 |   537K|       |   103   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(&quot;R&quot;&lt;=10 AND &quot;R&quot;&gt;=6)

As you can see, there is one filter on “r between 6 and 10”, but the optimizer thinks the query returns 50000 rows. That’s because it cannot know better as “r” is calculated during the query execution.
So the first performance issue is the estimated rows from the query. If this is a stand-alone query, it is not a big issue, but if this is a part of a large query with joins, this bad estimation can cause huge optimization problems.
In order to solve this, I will use rownum again:

SQL&gt; select *
     from (select ename,sal, rownum r
           from (select ename,sal
                 from large_emp order by sal desc
                )
           where rownum&lt;=10           )      where r&gt;=6;
Execution Plan
----------------------------------------------------------
Plan hash value: 76032826
----------------------------------------------------------------------------------------------
| Id  | Operation                | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |           |    10 |   330 |       |   323   (1)| 00:00:01 |
|*  1 |  VIEW                    |           |    10 |   330 |       |   323   (1)| 00:00:01 |
|*  2 |   COUNT STOPKEY          |           |       |       |       |            |          |
|   3 |    VIEW                  |           | 50000 |   976K|       |   323   (1)| 00:00:01 |
|*  4 |     SORT ORDER BY STOPKEY|           | 50000 |   537K|   992K|   323   (1)| 00:00:01 |
|   5 |      TABLE ACCESS FULL   | LARGE_EMP | 50000 |   537K|       |   103   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(&quot;R&quot;&gt;=6)
   2 - filter(ROWNUM&lt;=10)
   4 - filter(ROWNUM&lt;=10)

Here, I used “where rownum<=10” in the inline view instead of “r<=10” in the outer query. That way the sort will be first, then the predicate containing the “rownum=6”. Using this syntax, the optimizer knows it will return only 10 rows, even though it doesn’t know about returning only 5 of them. So this technique will work better, at least for the first few pages (after that, the number of rows the optimizer estimate will again be much higher than the actual number and can lead to performance issues). By the way, the first query did 374 buffer gets while the second one 373. This is not a big difference, but there might be a little performance gain in the second, as you can see that the sort operation itself uses “stopkey” and might use a more efficient algorithm. Another thing we need to consider is speeding up the query using indexes. To do that I’ll add an index to the sal column and change my queries not to return null values (which will eliminate the usage of the index).

 SQL&gt; select *
     from (select ename,sal, rownum r
           from (select ename,sal
                 from large_emp
                 where sal is not null
                 order by sal desc
                )
          )
     where r between 6 and 10;
Execution Plan
----------------------------------------------------------
Plan hash value: 4064977742
-------------------------------------------------------------------------------------------
| Id  | Operation             | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |           | 50000 |  1611K|       |   324   (1)| 00:00:01 |
|*  1 |  VIEW                 |           | 50000 |  1611K|       |   324   (1)| 00:00:01 |
|   2 |   COUNT               |           |       |       |       |            |          |
|   3 |    VIEW               |           | 50000 |   976K|       |   324   (1)| 00:00:01 |
|   4 |     SORT ORDER BY     |           | 50000 |   537K|   992K|   324   (1)| 00:00:01 |
|*  5 |      TABLE ACCESS FULL| LARGE_EMP | 50000 |   537K|       |   103   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(&quot;R&quot;&lt;=10 AND &quot;R&quot;&gt;=6)
   5 - filter(&quot;SAL&quot; IS NOT NULL)

We see the same plan, and the same 374 buffer gets. Oracle doesn’t use the index as it thinks it needs to return all 50000 rows.
The second query, on the other hand, looks like this:

SQL&gt; select *
     from (select ename,sal, rownum r
           from (select ename,sal
                 from large_emp
                 where sal is not null
                 order by sal desc
                )
           where rownum&lt;=10           )      where r&gt;=6;
Execution Plan
----------------------------------------------------------
Plan hash value: 3453464421
------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |               |    10 |   330 |     3   (0)| 00:00:01 |
|*  1 |  VIEW                          |               |    10 |   330 |     3   (0)| 00:00:01 |
|*  2 |   COUNT STOPKEY                |               |       |       |            |          |
|   3 |    VIEW                        |               |    10 |   200 |     3   (0)| 00:00:01 |
|   4 |     TABLE ACCESS BY INDEX ROWID| LARGE_EMP     | 50000 |   537K|     3   (0)| 00:00:01 |
|*  5 |      INDEX FULL SCAN DESCENDING| LARGE_EMP_SAL |    10 |       |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(&quot;R&quot;&gt;=6)
   2 - filter(ROWNUM&lt;=10)
   5 - filter(&quot;SAL&quot; IS NOT NULL)

This speaks for itself, Oracle uses the index, returns only 10 rows (and then take the last 5) and does only 5 buffer gets.

Oracle 12c

In 12c, Oracle added a new syntax for TOP-N queries. I’ve already started writing a post about this feature, so expect it in the next few days. Meanwhile, this is the same query with the new syntax:

SQL&gt; select ename,sal
     from large_emp
     where sal is not null
     order by sal desc
     offset 5 rows fetch next 5 rows only;

Summary

Paging in database queries is very important, as the application gets only the rows it needs. However, it is important that the database will understand as much as it can about the data returned from the query. This will affect the cardinality of this part of the query, as well as help the database decide whether or not to use indexes.

Tags: ,

1 thought on “Paging a Query”

Leave a Reply

Your email address will not be published. Required fields are marked *

Related Post