My previous post that showed a real life troubleshooting was a hit. I guess people like to hear these kind of things, so I decided to write another post, this time a real life SQL tuning.
This happened a while ago, probably about 8-9 years, but it’s an excellent example and it contains many aspects of our role as DBAs. This client was a software company, and the specific department was developing some sort of monitoring system for its appliances. They didn’t have a DBA, so one of the developers took ownership of the database. Because they didn’t have a DBA they’ve made several decisions which I think were wrong (and I’ll explain here), but they were very open-minded when they asked for my assistance. I sat with the developer for about 2-3 days in total, and we really worked as a team to solve this (unlike some other cases when I felt like I’m fighting the developers or the client’s DBAs).
The design
The main issue was a specific query that took so long that the application reached a timeout (the timeout was about 3 or 4 minutes). In order to explain the query I need to explain a little bit about the environment first.
As I said, the product was a monitoring system, and it got status updates from components on the network. They needed to keep all the updates, but also have a “dashboard” with the last 300 statuses. A good solution for that is to use partitioning, but they didn’t have license for that, so they created an application process. They needed the last 300 statuses quickly so they decided to create 2 tables, one for the last status (let’s call it current_status) and one for history (let’s call it status_history). Then they created a process to move rows from the current_status table to the status_history table. In order to keep the current_status table small, they decided to move the rows according to the number of rows per appliance. The limit number of rows was 100, so if there are 100 appliances, there will be 10K rows (100*100) in current_status table. When an appliance sends more status updates, it will have more than 100 rows in current_status table, so the process will move old rows to the status_history table.
Managing this “archive” process by number of rows seemed to me not efficient and problematic, but that was one thing they didn’t allow me to touch.
Regarding the table structures, they had appliance_id and status_date columns and several other columns for data. They had to keep the exact order of the rows, so in order to verify that, they added another column called seq, that got its value from a sequence. Ordering by seq will ensure the correct order of statuses, even if they were received at the same second.
The primary key for the tables was on 3 columns: status_date, seq and appliance_id.
The problem
Now we can get to the problem. For the dashboard, they needed to get the last 300 statuses (according to filters the user might have). so they did the following:
select * from (select * from current_status where <filter> order by seq desc ) where rownum<=300;
Then they looked at the results, if they got 300 results, great. If not, they did the same against the status_history table. And this was the problem. If they needed to query the status_history table, the query took too long and the application reached the timeout.
A bug
A few years ago a client sent me a query that was slow and asked me to tune it. It was about 2 pages long and had almost 20 tables. When I asked for explanation on what the query does, he simply replied “what does it matter what is does? just tune it”. It doesn’t work that way. If we don’t understand what it does, our ability to fix and tune issues is very limited.
It wasn’t different here, because as I talked to them and understood the structure and query, I realized that they have a bug. Can you see it?
Remember that they are archiving rows according to the number of rows per appliance? So what happens if all appliances are quite, but one is sending statuses all the time? For simplicity, let’s say we have 10 appliances (id 1 to 10). Appliances 1 to 9 sent their last status at 10am and since then didn’t send anything. Appliance 10 on the other hand, sent 500 statuses since 10am and it’s now 10:30. So if I run the dashboard without filters, the above query will return 100 rows of appliance 10 and another 200 rows of other appliances. Because the query returned 300 rows, the application displayed the dashboard and that’s it. But that’s not correct. The actual data should have been 300 rows of appliance 10, because all 500 rows of appliance 10 are newer than any other appliance.
Fixing the bug
In order to fix the bug, we had to add the status_history table every time. So I wrote the following query for that
select * from (select * from (select * from current_status union all select * from status_history ) where <filter> order by seq desc ) where rownum<=300;
That’s great, no bugs, but it didn’t finish as well, because it got timeout every time. Now we need to work on the performance.
Performance
I don’t have all the optimizer plans with me, but since it’s a short and simple query, I can explain the process without them.
First, note that I used the “union all” instead of “union”. The “union” operator requires sort to eliminate duplicates, but the application makes sure that there will be no duplicates, so “union all” will give the same result, and won’t require sort.
Before the bug fix I saw that this query didn’t use indexes. If there was a filter, it used the relevant index, but without filters it didn’t. The primary key was status_date, seq and appliance_id, while the query used seq for sorting, which was the 2nd column in the index. When I thought about that, the date here was irrelevant. It might be relevant for filtering, but it wasn’t unique and it wasn’t part of the sort. So I just dropped it from the primary key. Now, seq was the first column in the primary key and the index. Oracle decided to use the index to scan the first 300 index entries which were sorted already instead of reading the entire table and sort it. But again, that was before the bug fix.
Now, after the bug fix, Oracle can’t use the index, and it still takes a long time to execute. So let’s think about the query. Oracle takes the current_status table, add the status_history table to it, filters the entire output, then sorts the entire output and then takes the top 300 rows. The union prevents Oracle from using indexes for the sort (it might use indexes for the filter with predicate pushing). What we need to do is to minimize the sort operation, and we can do that by reducing the number of rows to sort.
In any case the query returns 300 rows, in the theoretically extreme cases we can get all 300 rows from one table or the other. Regardless the data, we will never see the 301st row of any table, so why do we need to query it from the table? How about taking the top 300 rows from each table and only then combine only these 600 rows?
This is how the new (and final) query looked like
select * from (select * from (select * from (select * from current_status where <filter> order by seq desc ) where rownum<=300 union all select * from (select * from status_history where <filter> order by seq desc ) where rownum<=300 ) order by seq desc ) where rownum<=300;
This is a much longer and complex query, it filters and sorts each table separately, then takes only the top 300 rows. Only when we have 300 rows from each table we combine them using the “union all” operator, sort again (only 600 rows this time) and take the top 300 rows. Using this method, Oracle can use the primary key for the inner sorts (or other indexes for the filter), eliminating the need for full table scan.
The result: the query (and it doesn’t matter what were the filters if any), finished in a few seconds.
Summary
I like this example, as it demonstrate how, with understanding of the query and the database structure, we can not only improve performance, but find bugs we well. I hope you enjoyed the example, I will try to keep them coming once in a while.
Your query in the “Fixing the bug” section has
(select * from current_status
union all
select * from current_status
but I think you mean
(select * from current_status
union all
select * from status_history
Of course, thanks for noticing.
I Fixed it.
Wonderful example
Thanx.