This topic has been sitting in my backlog for a long time and I finally decided to write it. Analytic functions are not so new anymore (they’ve been around since Oracle 8i), but they are still a very powerful tool.
This is based on a real case I had quite a few years ago. A client came to me with a question regarding a graph they had to generate.
But before I start, I wanted to tell something funny. When I decided to write this blog, I sat down to reproduce the data and the queries (from my memory, which is not great at best). I was almost done when I realized that there actually is a better way of doing it (than the one I actually used in the original case). So in this post I’m not going to present the solution I provided back then, but a better and easier-to-understand solution.
And now to the case:
The system had a table of cars. Each row had information of the car, including the make and model. Their need was to present a bar graph of the number of cars by make and model in a certain way:
- First bar is the make with the highest number of cars
- Second bar is the make with the 2nd highest number of cars
- Third bar is the make with the 3nd highest number of cars
- Forth bar is all the rest
But that’s not all, in each bar, they wanted to have 4 colors: the bottom color is the model with the highest number of cars, the one on top of that is the 2nd model, then the 3rd model and on top of that all the rest. Here is an example:
To build this graph, they had 2 options:
- Get a full list from the database: make, model, and number of cars. Then calculate all the information they needed for the graph (this included: the total number of cars, the top 3 makes with number of cars per make, and the top 3 models for each of these makes with number of cars per model).
- Work in iterations: first run a SQL to find the top 3 makes. Then run another SQL for each of these makes to find the top 3 models of each make. And besides that, they also need to get the total number of cars as well as the number of cars per make for the top 3.
They didn’t want to go for option 1, but option 2 required lots of round trips to the database, which is not great. So at that point they came to me asking if I can somehow reduce the number of round trips to the database. After thinking about it a little bit, I told them that using analytic functions I can probably provide all the information they need with one query.
Test case
For this example, I created a simple table with some information about cars. Use this short script to create the table if you’d like to play with it
create table cars (id number, make varchar2(15), model varchar2(15)); declare id number; procedure insert_cars(make varchar2, model varchar2, num number) is begin for i in 1..num loop insert into cars values(id,make,model); id:=id+1; end loop; commit; end; begin id:=1; insert_cars('Toyota','Corolla',100); insert_cars('Toyota','Camry',52); insert_cars('Toyota','Yaris',36); insert_cars('Toyota','Rav4',30); insert_cars('Toyota','Highlander',30); insert_cars('Ford','Focus',85); insert_cars('Ford','Fusion',41); insert_cars('Ford','Edge',37); insert_cars('Ford','Explorer',30); insert_cars('Ford','Fiesta',30); insert_cars('Ford','Expedition',20); insert_cars('Honda','Civic',75); insert_cars('Honda','Accord',69); insert_cars('Honda','CRV',56); insert_cars('Honda','Pilot',20); insert_cars('GMC','Acadia',200); insert_cars('Dodge','Caravan',100); end; /
Phase 1
First, let see again what information we need:
- Number of cars per make (and rank them to get the top 3)
- Number of cars per make and model (and rank them to get the top 3, but we are interested only in the models of the top 3 makes)
- Total number of cars (in order to calculate the “others” bar size)
To get the number of cars per make and model, we can use the query below. The only “interesting” thing I did here is nested the analytic functions in the 4th column to sum the number of cars per make, all the rest is pretty straight forward:
select make, model, count(*) sum_model, sum(count(*)) over (partition by make) sum_make from cars group by make,model order by make,model;
The result set looks like that:
MAKE MODEL SUM_MODEL SUM_MAKE --------------- --------------- ---------- ---------- Dodge Caravan 100 100 Ford Edge 37 243 Ford Expedition 20 243 Ford Explorer 30 243 Ford Fiesta 30 243 Ford Focus 85 243 Ford Fusion 41 243 GMC Acadia 200 200 Honda Accord 69 220 Honda CRV 56 220 Honda Civic 75 220 Honda Pilot 20 220 Toyota Camry 52 248 Toyota Corolla 100 248 Toyota Highlander 30 248 Toyota Rav4 30 248 Toyota Yaris 36 248
Phase 2
The next thing is to add the total number of cars to the query. I used a “grouping sets” trick that I found here to do that. I added another column based on the grouping_id function so it will be clear which row is the grand total.
The last addition in this phase is a “rank” analytic function to find the top 3 models per make. I wanted to add the rank to find the top 3 makes, but that requires nesting another analytic function: rank(sum(count*)) and this is not allowed. I’ll add that rank in the next step.
select make, model, count(*) sum_model, sum(count(*)) over (partition by make) sum_make, grouping_id(make) is_grand_total, rank() over (partition by make order by count(*) desc) model_rank from cars group by grouping sets ((),(make,model)) order by make,model;
The result set:
MAKE MODEL SUM_MODEL SUM_MAKE IS_GRAND_TOTAL MODEL_RANK --------------- --------------- ---------- ---------- -------------- ---------- Dodge Caravan 100 100 0 1 Ford Edge 37 243 0 3 Ford Expedition 20 243 0 5 Ford Explorer 30 243 0 4 Ford Fiesta 30 243 0 4 Ford Focus 85 243 0 1 Ford Fusion 41 243 0 2 GMC Acadia 200 200 0 1 Honda Accord 69 220 0 2 Honda CRV 56 220 0 3 Honda Civic 75 220 0 1 Honda Pilot 20 220 0 4 Toyota Camry 52 248 0 2 Toyota Corolla 100 248 0 1 Toyota Highlander 30 248 0 4 Toyota Rav4 30 248 0 4 Toyota Yaris 36 248 0 3 1011 1011 1 1
Phase 3
The next step will be to add the rank for the makes. As I said I couldn’t do it in this query, so I simply converted it to an inline view and did the ranking in the outer query. Note that I added the grand_total to the rank order by clause, that’s because I don’t want the grand total to be first. The makes themselves should be first, so I can take ranks 1-3.
select make, model, sum_model, sum_make, is_grand_total, model_rank, dense_rank() over (order by is_grand_total,sum_make desc) make_rank from (select make, model, count(*) sum_model, sum(count(*)) over (partition by make) sum_make, grouping_id(make) is_grand_total, rank() over (partition by make order by count(*) desc) model_rank from cars group by grouping sets ((),(make,model)) order by make,model );
Result set:
MAKE MODEL SUM_MODEL SUM_MAKE IS_GRAND_TOTAL MODEL_RANK MAKE_RANK --------------- --------------- ---------- ---------- -------------- ---------- ---------- Toyota Camry 52 248 0 2 1 Toyota Corolla 100 248 0 1 1 Toyota Highlander 30 248 0 4 1 Toyota Yaris 36 248 0 3 1 Toyota Rav4 30 248 0 4 1 Ford Fusion 41 243 0 2 2 Ford Focus 85 243 0 1 2 Ford Fiesta 30 243 0 4 2 Ford Explorer 30 243 0 4 2 Ford Expedition 20 243 0 6 2 Ford Edge 37 243 0 3 2 Honda Pilot 20 220 0 4 3 Honda CRV 56 220 0 3 3 Honda Accord 69 220 0 2 3 Honda Civic 75 220 0 1 3 GMC Acadia 200 200 0 1 4 Dodge Caravan 100 100 0 1 5 1011 1011 1 1 6
Phase 4
Finally, the last step will filter only the rows that are important for us: the grand total and the 3 top models of the 3 top makes:
select * from (select make, model, sum_model, sum_make, is_grand_total, model_rank, dense_rank() over (order by is_grand_total,sum_make desc) make_rank from (select make, model, count(*) sum_model, sum(count(*)) over (partition by make) sum_make, grouping_id(make) is_grand_total, rank() over (partition by make order by count(*) desc) model_rank from cars group by grouping sets ((),(make,model)) ) ) where is_grand_total=1 or (model_rank<=3 and make_rank<=3) order by make_rank,model_rank;
And the final result set:
MAKE MODEL SUM_MODEL SUM_MAKE IS_GRAND_TOTAL MODEL_RANK MAKE_RANK --------------- --------------- ---------- ---------- -------------- ---------- ---------- Toyota Corolla 100 248 0 1 1 Toyota Camry 52 248 0 2 1 Toyota Yaris 36 248 0 3 1 Ford Focus 85 243 0 1 2 Ford Fusion 41 243 0 2 2 Ford Edge 37 243 0 3 2 Honda Civic 75 220 0 1 3 Honda Accord 69 220 0 2 3 Honda CRV 56 220 0 3 3 1011 1011 1 1 6
From this result set it’s quite easy for the application to build the required graph. The first 3 rows are the top make and the top 3 models for this maker. The “others” part of the bar can be easily calculated by subtracting the three SUM_MODEL from the SUM_MAKE. The same should be done for the other 2 makes. The last bar for “others” will be calculated from the grand total (the row where IS_GRAND_TOTAL=1) minus the 3 SUM_MAKE for the top 3 makes.