I debated quite a lot before writing this post. When I wrote the post about interviewing a DBA, in the “technical questions I do ask” part I just gave a general explanation of what I ask, but didn’t reveal the real questions. Now, more than 3 years later, I decided to give one of the questions as a challenge here.
This is not a difficult question or a complex scenario, but the challenge here is to provide a few ways to solve it using SQL only. So here it is:
You have an employee table with employee name, department, salary, etc. (you can use the hr.employees table). The result set I’m looking for is a list containing the top earning employee per department. For simplicity, let’s assume that for each department there is only one employee with the highest salary and also there are no employees without departments (while sh employee meets the first assumption, there are employees without departments, so you can ignore these rows). Also, don’t worry about the lookups, department_id is good enough as I’m looking for the logic and not the details.
If you use the hr.employee table, the expected result set should be something like this:
DEPARTMENT_ID EMPLOYEE_ID SALARY
------------- ----------- ----------
10 200 4400
20 201 13000
30 114 11000
40 203 6500
50 121 8200
60 103 9000
70 204 10000
80 145 14000
90 100 24000
100 108 12008
110 205 12008
So far I know of 5 ways to write this query. A different way is not similar SQL with changes in the SQL text, but SQL with a really different approach.
Good luck.
Update – the solution post is published, you can see it here.
Using ‘set tab off’ in sqlplus will help in lining up output in your blog.
Here’s one way to do it.
dunno if HTML tags will work…
Yes Jared, this is one option, you used an analytic function here.
There are (at least) 4 other options that don’t use analytic functions, but some other techniques.
Thanks for posting this
Hi Liron, this is my solution to the challenge:
select department_id, employee_id, salary
from hr.employees
where department_id is not null
and (department_id,salary) in (select department_id, max(salary) from hr.employees group by department_id)
order by department_id
DEPARTMENT_ID EMPLOYEE_ID SALARY
10 200 4400
20 201 13000
30 114 11000
40 203 6500
50 121 8200
60 103 9000
70 204 10000
80 145 14000
90 100 24000
100 108 12008
110 205 12008
–sorry for the output not being formatted properly.
Diana
Great Diana, thanks.
That’s another option, using (what I call, I’m not sure this is the right term) a composite predicate.
3 to go!
There are some consequences to using that method. Note that dual full table scans:
This is true, some of the solutions I’ve got perform better than others. I actually have one that performs even worse than this one, but it still works.
I’d have thought the question would have performance aspects as a follow up challenge.
SELECT department_id, employee_id, salary
FROM ( SELECT department_id
,employee_id
,salary
,MAX (salary) OVER (PARTITION BY department_id) max_salary
FROM hr.employees
WHERE department_id IS NOT NULL
ORDER BY 1)
WHERE salary = max_salary;
Thanks for the reply, it’s actually another way to use analytic functions, and Jared already did that.
So still 3 to go!
The aggregate query doesn’t necessarily has to have a subquery:
select department_id,
max(employee_id) keep(dense_rank first ordre by salary desc) as employee_id,
max(salary) as salary
from hr.employees
where department_id is not null
group by department_id
order by department_id
And there are two other possibilities (which are similar to each other) with semijoin/antijoin
select department_id, employee_id, salary
from hr.employees e
where not exists (
select null
from hr.employees ee where e.department_id = ee.department_id and e.salary < ee.salary)
and department_id is not null
order by department_id
select left_e.department_id, left_e.employee_id, left_e.salary
from hr.employees left_e left outer join hr.employees right_e
on (left_e.department_id = right_e.department_id and left_e.salary < right_e.salary)
where right_e.salary is null
and left_e.department_id is not null
order by 1
Regards
Maxim
I have some solutions. I’ll contribute one (for now):
SELECT department_id,
MAX(employee_id) keep(dense_rank LAST ORDER BY salary NULLS FIRST) employee_id,
MAX(salary)
FROM hr.employees
WHERE department_id IS NOT NULL
GROUP BY department_id
ORDER BY department_id;
How about this ?
select department_id,employee_id,salary
from hr.employees emp
where department_id is not null
and not exists
( select 1
from hr.employees
where department_id = emp.department_id
and salary > emp.salary )
order by department_id;
DEPARTMENT_ID EMPLOYEE_ID SALARY
10 200 4400
20 201 13000
30 114 11000
40 203 6500
50 121 8200
60 103 9000
70 204 10000
80 145 14000
90 100 24000
100 108 12008
110 205 12008
11 rows selected.
Oracle Live SQL has used to execute above query.
using Cube, Grouping and Rollup?
Been waiting to see if someone would try those. Grouping Sets anyone?
OK, here’s one that uses something different. Far from perfect, but does use grouping and cube (both available since Oracle 8i I believe)
Hi Liron,
Here are 3 more solutions:
The first one just continues one of the ideas above, using GROUPING_ID:
with data as (
select department_id,
employee_id,
grouping(employee_id) g_empid,
max(salary) max_sal
from hr.employees
where department_id is not null
group by rollup(department_id, employee_id)
)
select department_id,
max(employee_id) employee_id,
max_sal salary
from data
group by department_id, max_sal
having count(distinct g_empid) = 2
order by department_id
/
DEPARTMENT_ID EMPLOYEE_ID SALARY
————————————–
10 200 4400
20 201 13000
30 114 11000
40 203 6500
50 121 8200
60 103 9000
70 204 10000
80 145 14000
90 100 24000
100 108 12008
110 205 12008
11 rows selected.
In the above solution posted by @JKSTILL the idea of using LAG is nice, but the result is not correct,
as the ordering inside the LAG function is by employee_id, not by salary.
Following are two more solutions essentially using aggregates, but in a different way than the above ones:
select department_id,
regexp_substr(
listagg(employee_id,’@’ on overflow truncate) within group(order by salary desc)
, ‘^[^@]+’) employee_id,
regexp_substr( listagg(salary,’@’ on overflow truncate) within group(order by salary desc)
, ‘^[^@]+’) salary
from hr.employees
where department_id is not null
group by department_id
order by department_id
/
DEPARTMENT_ID EMPLOYEE_ID SALARY
————————————–
10 200 4400
20 201 13000
30 114 11000
40 203 6500
50 121 8200
60 103 9000
70 204 10000
80 145 14000
90 100 24000
100 108 12008
110 205 12008
11 rows selected.
select department_id,
json_value(json_arrayagg(employee_id order by salary desc returning clob),’$[0]’) employee_id,
json_value(json_arrayagg(salary order by salary desc returning clob),’$[0]’) salary
from hr.employees
where department_id is not null
group by department_id
order by department_id
/
DEPARTMENT_ID EMPLOYEE_ID SALARY
————————————–
10 200 4400
20 201 13000
30 114 11000
40 203 6500
50 121 8200
60 103 9000
70 204 10000
80 145 14000
90 100 24000
100 108 12008
110 205 12008
11 rows selected.
Chapeau to Oren, for the nice MATCH_RECOGNIZE solution 🙂
Cheers & Best Regards,
Iudith
Nice Catch. In my haste to use something different I only checked the salaries and not the employee_id.
After playing around with it a bit I decided this query was not a good fit for grouping, as it is not as simple as just changing the order by for the lag(). Doing so results in ‘ORA-00979: not a GROUP BY expression’.
Instead I tried something completely different. (and I verified the employee_id values)
At first I tried using the dense_rank() function. That was OK, but didn’t deal with ties.
The following somewhat similar method deals with the tie. There are other ways to accomplish this, but this is enough fun for today.
Here is one that uses Pattern Matching…
select department_id,employee_id,salary
from hr.employees
match_recognize (
partition by department_id
order by salary
all rows per match
pattern (x $)
define x as 1=1
);
That is clever. I guess it is time to learn match_recognize.
Yes, Oren, that is clever.
I was waiting for you to post something like that.
Thanks
/* Formatted on 2018/9/8 8:22:10 (QP5 v5.227.12220.39754) */
SELECT a.*
FROM ( SELECT department_id, employee_id, salary
FROM hr.employees where department_id is not null
ORDER BY 1, 3 DESC) a
CONNECT BY PRIOR department_id < department_id
START WITH department_id = 10
–//I don't know how to stop …
WITH a
AS ( SELECT department_id, employee_id, salary
FROM hr.employees
WHERE department_id IS NOT NULL
ORDER BY 1, 3 DESC)
,b
AS ( SELECT a.*
FROM a
CONNECT BY PRIOR a.department_id < a.department_id
START WITH a.department_id = 10)
SELECT *
FROM b
WHERE ROWNUM <= (SELECT COUNT (DISTINCT department_id) FROM a)
Here is a solution that combines two features that were introduced in Oracle 12.1: Row Limiting and Lateral Inline Views:
Hello Liron,
Here are a few more solutions:
1. using FETCH FIRST
with departments as
( select distinct department_id
from hr.employees
where department_id is not null
)
select d.department_id, e.employee_id, e.salary
from departments d
cross join lateral (select e.employee_id, e.salary
from hr.employees e
where e.department_id = d.department_id
order by e.salary desc
fetch first 1 row only) e
order by d.department_id
/
DEPARTMENT_ID EMPLOYEE_ID SALARY
————————————–
10 200 4400
20 201 13000
30 114 11000
40 203 6500
50 121 8200
60 103 9000
70 204 10000
80 145 14000
90 100 24000
100 108 12008
110 205 12008
11 rows selected.
2. another solution using JSON
select t.department_id, j.employee_id, j.salary
from (
select department_id,
json_arrayagg(
json_object(key ’employee_id’ value employee_id,
key ‘salary’ value salary
) order by salary desc
) arr_emp
from hr.employees
where department_id is not null
group by department_id
) t,
json_table (t.arr_emp, ‘$[0]’
columns (
employee_id number path ‘$.employee_id’,
salary number path ‘$.salary’
)
) j
order by t.department_id
/
DEPARTMENT_ID EMPLOYEE_ID SALARY
————————————–
10 200 4400
20 201 13000
30 114 11000
40 203 6500
50 121 8200
60 103 9000
70 204 10000
80 145 14000
90 100 24000
100 108 12008
110 205 12008
11 rows selected.
These are both easily adaptable to the case when the requirement would be to return the “top n” paid employees for each department.
Cheers & Best Regards,
Iudith
–//use FIRST_VALUE analytic functions :
SELECT distinct
department_id
,FIRST_VALUE (employee_id)
OVER (PARTITION BY department_id ORDER BY salary DESC)
first_employee_id
,MAX (salary) OVER (PARTITION BY department_id) max_salary
FROM hr.employees emp
WHERE department_id IS NOT NULL
ORDER BY department_id
;