Oracle Challenge #3 – SQL

September 5, 2018 | Liron Amitzi | 28 Comments | 10:27 am

Categories:

I debated quite a lot before writing this post. When I wrote the post about interviewing a DBA, in the “technical questions I do ask” part I just gave a general explanation of what I ask, but didn’t reveal the real questions. Now, more than 3 years later, I decided to give one of the questions as a challenge here.

This is not a difficult question or a complex scenario, but the challenge here is to provide a few ways to solve it using SQL only. So here it is:

You have an employee table with employee name, department, salary, etc. (you can use the hr.employees table). The result set I’m looking for is a list containing the top earning employee per department. For simplicity, let’s assume that for each department there is only one employee with the highest salary and also there are no employees without departments (while sh employee meets the first assumption, there are employees without departments, so you can ignore these rows). Also, don’t worry about the lookups, department_id is good enough as I’m looking for the logic and not the details.

If you use the hr.employee table, the expected result set should be something like this:

DEPARTMENT_ID EMPLOYEE_ID     SALARY
------------- ----------- ----------
           10         200       4400
           20         201      13000
           30         114      11000
           40         203       6500
           50         121       8200
           60         103       9000
           70         204      10000
           80         145      14000
           90         100      24000
          100         108      12008
          110         205      12008

So far I know of 5 ways to write this query. A different way is not similar SQL with changes in the SQL text, but SQL with a really different approach.
Good luck.

Update – the solution post is published, you can see it here.

Tags: challenge, interview, SQL

28 thoughts on “Oracle Challenge #3 – SQL”

jkstill says:

September 5, 2018 at 12:48 pm

Using ‘set tab off’ in sqlplus will help in lining up output in your blog.

Reply

Here’s one way to do it.
dunno if HTML tags will work…

JKSTILL@examples > l
  1  with salary_info as (
  2     select
  3     distinct
  4             e.department_id
  5             , e.employee_id
  6             , e.salary
  7             , row_number() over ( partition by e.department_id order by e.salary desc) dept_salary_order
  8     from hr.employees e
  9     where e.department_id is not null
 10  )
 11  select
 12     si.department_id
 13     , si.employee_id
 14     , si.salary
 15  from salary_info si
 16  where dept_salary_order = 1
 17* order by department_id
JKSTILL@examples > /
DEPARTMENT_ID EMPLOYEE_ID     SALARY
------------- ----------- ----------
           10         200       4400
           20         201      13000
           30         114      11000
           40         203       6500
           50         121       8200
           60         103       9000
           70         204      10000
           80         145      14000
           90         100      24000
          100         108      12008
          110         205      12008
11 rows selected.

amitzil says:

September 5, 2018 at 1:34 pm

Yes Jared, this is one option, you used an analytic function here.
There are (at least) 4 other options that don’t use analytic functions, but some other techniques.
Thanks for posting this

Reply

Hi Liron, this is my solution to the challenge:
select department_id, employee_id, salary
from hr.employees
where department_id is not null
and (department_id,salary) in (select department_id, max(salary) from hr.employees group by department_id)
order by department_id
DEPARTMENT_ID EMPLOYEE_ID SALARY
10 200 4400
20 201 13000
30 114 11000
40 203 6500
50 121 8200
60 103 9000
70 204 10000
80 145 14000
90 100 24000
100 108 12008
110 205 12008
–sorry for the output not being formatted properly.
Diana

amitzil says:

September 6, 2018 at 9:34 am

Great Diana, thanks.
That’s another option, using (what I call, I’m not sure this is the right term) a composite predicate.
3 to go!

Reply

There are some consequences to using that method. Note that dual full table scans:

JKSTILL@examples > l
  1  select department_id, employee_id, salary
  2  from hr.employees
  3  where department_id is not null
  4  and (department_id,salary) in (select department_id, max(salary) from hr.employees group by department_id)
  5* order by department_id
JKSTILL@examples > /
DEPARTMENT_ID EMPLOYEE_ID     SALARY
------------- ----------- ----------
           10         200       4400
           20         201      13000
           30         114      11000
           40         203       6500
           50         121       8200
           60         103       9000
           70         204      10000
           80         145      14000
           90         100      24000
          100         108      12008
          110         205      12008
11 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 802276651
----------------------------------------------------------------------------------
| Id  | Operation            | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |           |  3267 | 98010 |     5  (20)| 00:00:01 |
|*  1 |  FILTER              |           |       |       |            |          |
|   2 |   SORT GROUP BY      |           |  3267 | 98010 |     5  (20)| 00:00:01 |
|*  3 |    HASH JOIN         |           |  3267 | 98010 |     4   (0)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| EMPLOYEES |   106 |  2438 |     2   (0)| 00:00:01 |
|   5 |     TABLE ACCESS FULL| EMPLOYEES |   107 |   749 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------

amitzil says:

September 6, 2018 at 11:25 am

This is true, some of the solutions I’ve got perform better than others. I actually have one that performs even worse than this one, but it still works.

Reply
Niall Litchfield says:

September 7, 2018 at 8:04 am

I’d have thought the question would have performance aspects as a follow up challenge.

Reply

zhwsh says:

September 6, 2018 at 5:57 am

SELECT department_id, employee_id, salary
FROM ( SELECT department_id
,employee_id
,salary
,MAX (salary) OVER (PARTITION BY department_id) max_salary
FROM hr.employees
WHERE department_id IS NOT NULL
ORDER BY 1)
WHERE salary = max_salary;

Reply
1. amitzil says:
  
  September 6, 2018 at 9:35 am
  
  Thanks for the reply, it’s actually another way to use analytic functions, and Jared already did that.
  So still 3 to go!
  
  Reply
Maxim Demenko says:

September 7, 2018 at 4:27 am

The aggregate query doesn’t necessarily has to have a subquery:
select department_id,
max(employee_id) keep(dense_rank first ordre by salary desc) as employee_id,
max(salary) as salary
from hr.employees
where department_id is not null
group by department_id
order by department_id
And there are two other possibilities (which are similar to each other) with semijoin/antijoin
select department_id, employee_id, salary
from hr.employees e
where not exists (
select null
from hr.employees ee where e.department_id = ee.department_id and e.salary < ee.salary)
and department_id is not null
order by department_id
select left_e.department_id, left_e.employee_id, left_e.salary
from hr.employees left_e left outer join hr.employees right_e
on (left_e.department_id = right_e.department_id and left_e.salary < right_e.salary)
where right_e.salary is null
and left_e.department_id is not null
order by 1
Regards
Maxim

Reply
Oren Nakdimon (@DBoriented) says:

September 7, 2018 at 7:41 am

I have some solutions. I’ll contribute one (for now):
SELECT department_id,
MAX(employee_id) keep(dense_rank LAST ORDER BY salary NULLS FIRST) employee_id,
MAX(salary)
FROM hr.employees
WHERE department_id IS NOT NULL
GROUP BY department_id
ORDER BY department_id;

Reply
PRASHANT CHEGOOR says:

September 7, 2018 at 8:03 am

How about this ?
select department_id,employee_id,salary
from hr.employees emp
where department_id is not null
and not exists
( select 1
from hr.employees
where department_id = emp.department_id
and salary > emp.salary )
order by department_id;
DEPARTMENT_ID EMPLOYEE_ID SALARY
10 200 4400
20 201 13000
30 114 11000
40 203 6500
50 121 8200
60 103 9000
70 204 10000
80 145 14000
90 100 24000
100 108 12008
110 205 12008
11 rows selected.
Oracle Live SQL has used to execute above query.

Reply
zhwsh says:

September 7, 2018 at 8:27 am

using Cube, Grouping and Rollup?

Reply
1. jkstill says:
  
  September 7, 2018 at 9:53 am
  
  Been waiting to see if someone would try those. Grouping Sets anyone?
  
  Reply

OK, here’s one that uses something different. Far from perfect, but does use grouping and cube (both available since Oracle 8i I believe)

JKSTILL@examples > l
  1  with data as (
  2     select
  3     distinct
  4             e.department_id
  5             , case grouping(e.employee_id)
  6                     when 1 then lag(employee_id,1) over ( partition by e.department_id order by e.employee_id )
  7                     else null
  8             end employee_id
  9             , max(e.salary) salary
 10     from hr.employees e
 11     where e.department_id is not null
 12     group by cube(e.department_id, e.employee_id)
 13  )
 14  select *
 15  from data
 16  where employee_id is not null
 17     and department_id is not null
 18* order by 1
JKSTILL@examples > /
DEPARTMENT_ID EMPLOYEE_ID     SALARY
------------- ----------- ----------
           10         200       4400
           20         202      13000
           30         119      11000
           40         203       6500
           50         199       8200
           60         107       9000
           70         204      10000
           80         179      14000
           90         102      24000
          100         113      12008
          110         206      12008
11 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3993226991
---------------------------------------------------------------------------------------
| Id  | Operation                 | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |           |   106 |  4134 |     3  (34)| 00:00:01 |
|   1 |  SORT ORDER BY            |           |   106 |  4134 |     3  (34)| 00:00:01 |
|*  2 |   VIEW                    |           |   106 |  4134 |     3  (34)| 00:00:01 |
|   3 |    HASH UNIQUE            |           |   106 |  1166 |     3  (34)| 00:00:01 |
|   4 |     WINDOW BUFFER         |           |   106 |  1166 |     3  (34)| 00:00:01 |
|*  5 |      FILTER               |           |       |       |            |          |
|   6 |       SORT GROUP BY ROLLUP|           |   106 |  1166 |     3  (34)| 00:00:01 |
|*  7 |        TABLE ACCESS FULL  | EMPLOYEES |   106 |  1166 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("EMPLOYEE_ID" IS NOT NULL)
   5 - filter("E"."DEPARTMENT_ID" IS NOT NULL)
   7 - filter("E"."DEPARTMENT_ID" IS NOT NULL)

Hi Liron,
Here are 3 more solutions:
The first one just continues one of the ideas above, using GROUPING_ID:
with data as (
select department_id,
employee_id,
grouping(employee_id) g_empid,
max(salary) max_sal
from hr.employees
where department_id is not null
group by rollup(department_id, employee_id)
)
select department_id,
max(employee_id) employee_id,
max_sal salary
from data
group by department_id, max_sal
having count(distinct g_empid) = 2
order by department_id
/
DEPARTMENT_ID EMPLOYEE_ID SALARY
————————————–
10 200 4400
20 201 13000
30 114 11000
40 203 6500
50 121 8200
60 103 9000
70 204 10000
80 145 14000
90 100 24000
100 108 12008
110 205 12008
11 rows selected.
In the above solution posted by @JKSTILL the idea of using LAG is nice, but the result is not correct,
as the ordering inside the LAG function is by employee_id, not by salary.
Following are two more solutions essentially using aggregates, but in a different way than the above ones:
select department_id,
regexp_substr(
listagg(employee_id,’@’ on overflow truncate) within group(order by salary desc)
, ‘^[^@]+’) employee_id,
regexp_substr( listagg(salary,’@’ on overflow truncate) within group(order by salary desc)
, ‘^[^@]+’) salary
from hr.employees
where department_id is not null
group by department_id
order by department_id
/
DEPARTMENT_ID EMPLOYEE_ID SALARY
————————————–
10 200 4400
20 201 13000
30 114 11000
40 203 6500
50 121 8200
60 103 9000
70 204 10000
80 145 14000
90 100 24000
100 108 12008
110 205 12008
11 rows selected.
select department_id,
json_value(json_arrayagg(employee_id order by salary desc returning clob),’$[0]’) employee_id,
json_value(json_arrayagg(salary order by salary desc returning clob),’$[0]’) salary
from hr.employees
where department_id is not null
group by department_id
order by department_id
/
DEPARTMENT_ID EMPLOYEE_ID SALARY
————————————–
10 200 4400
20 201 13000
30 114 11000
40 203 6500
50 121 8200
60 103 9000
70 204 10000
80 145 14000
90 100 24000
100 108 12008
110 205 12008
11 rows selected.
Chapeau to Oren, for the nice MATCH_RECOGNIZE solution 🙂
Cheers & Best Regards,
Iudith

Nice Catch. In my haste to use something different I only checked the salaries and not the employee_id.
After playing around with it a bit I decided this query was not a good fit for grouping, as it is not as simple as just changing the order by for the lag(). Doing so results in ‘ORA-00979: not a GROUP BY expression’.
Instead I tried something completely different. (and I verified the employee_id values)
At first I tried using the dense_rank() function. That was OK, but didn’t deal with ties.

JKSTILL@examples > l
  1  with data as  (
  2     select
  3             e.department_id
  4             , e.employee_id
  5             , e.salary
  6             , dense_rank()  over (partition by e.department_id order by e.salary ) salrank
  7     from hr.employees e
  8     where e.department_id is not null
  9  )
 10  select *
 11  from data
 12  where salrank = 1
 13* order by department_id
JKSTILL@examples > /
DEPARTMENT_ID EMPLOYEE_ID     SALARY    SALRANK
------------- ----------- ---------- ----------
           10         200       4400          1
           20         202       6000          1
           30         119       2500          1
           40         203       6500          1
           50         132       2100          1
           60         107       4200          1
           70         204      10000          1
           80         173       6100          1
           90         101      17000          1
           90         102      17000          1
          100         113       6900          1
          110         206       8300          1
12 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2698234872
--------------------------------------------------------------------------------------
| Id  | Operation                | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |           |   106 |  5512 |     3  (34)| 00:00:01 |
|*  1 |  VIEW                    |           |   106 |  5512 |     3  (34)| 00:00:01 |
|*  2 |   WINDOW SORT PUSHED RANK|           |   106 |  1166 |     3  (34)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL     | EMPLOYEES |   106 |  1166 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("SALRANK"=1)
   2 - filter(DENSE_RANK() OVER ( PARTITION BY "E"."DEPARTMENT_ID" ORDER BY
              "E"."SALARY")<=1)
   3 - filter("E"."DEPARTMENT_ID" IS NOT NULL)

The following somewhat similar method deals with the tie. There are other ways to accomplish this, but this is enough fun for today.

JKSTILL@examples > l
  1  with data as  (
  2     select distinct
  3             e.department_id
  4             --, e.employee_id
  5             --, e.salary
  6             , max(employee_id) keep ( dense_rank first order by e.salary desc )  over (partition by e.department_id  ) empmax
  7             , max(salary) keep ( dense_rank first order by e.salary desc )  over (partition by e.department_id  ) salmax
  8     from hr.employees e
  9     where e.department_id is not null
 10  )
 11  select *
 12  from data
 13  --where salrank = 1
 14* order by department_id
JKSTILL@examples > /
DEPARTMENT_ID     EMPMAX     SALMAX
------------- ---------- ----------
           10        200       4400
           20        201      13000
           30        114      11000
           40        203       6500
           50        121       8200
           60        103       9000
           70        204      10000
           80        145      14000
           90        100      24000
          100        108      12008
          110        205      12008
11 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 961311344
-----------------------------------------------------------------------------------
| Id  | Operation             | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |           |   106 |  4134 |     4  (50)| 00:00:01 |
|   1 |  SORT ORDER BY        |           |   106 |  4134 |     4  (50)| 00:00:01 |
|   2 |   VIEW                |           |   106 |  4134 |     4  (50)| 00:00:01 |
|   3 |    HASH UNIQUE        |           |   106 |  1166 |     4  (50)| 00:00:01 |
|   4 |     WINDOW SORT       |           |   106 |  1166 |     4  (50)| 00:00:01 |
|*  5 |      TABLE ACCESS FULL| EMPLOYEES |   106 |  1166 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   5 - filter("E"."DEPARTMENT_ID" IS NOT NULL)

Oren Nakdimon (@DBoriented) says:

September 7, 2018 at 11:34 am

Here is one that uses Pattern Matching…
select department_id,employee_id,salary
from hr.employees
match_recognize (
partition by department_id
order by salary
all rows per match
pattern (x $)
define x as 1=1
);

Reply
1. jkstill says:
  
  September 7, 2018 at 1:17 pm
  
  That is clever. I guess it is time to learn match_recognize.
  
  Reply
  1. amitzil says:
    
    September 7, 2018 at 1:40 pm
    
    Yes, Oren, that is clever.
    I was waiting for you to post something like that.
    Thanks
    
    Reply
zhwsh says:

September 7, 2018 at 5:37 pm

/* Formatted on 2018/9/8 8:22:10 (QP5 v5.227.12220.39754) */
SELECT a.*
FROM ( SELECT department_id, employee_id, salary
FROM hr.employees where department_id is not null
ORDER BY 1, 3 DESC) a
CONNECT BY PRIOR department_id < department_id
START WITH department_id = 10
–//I don't know how to stop …

Reply
zhwsh says:

September 7, 2018 at 5:42 pm

WITH a
AS ( SELECT department_id, employee_id, salary
FROM hr.employees
WHERE department_id IS NOT NULL
ORDER BY 1, 3 DESC)
,b
AS ( SELECT a.*
FROM a
CONNECT BY PRIOR a.department_id < a.department_id
START WITH a.department_id = 10)
SELECT *
FROM b
WHERE ROWNUM <= (SELECT COUNT (DISTINCT department_id) FROM a)

Reply

Here is a solution that combines two features that were introduced in Oracle 12.1: Row Limiting and Lateral Inline Views:

select d.department_id, e.*
from hr.departments d,
     LATERAL (select employee_id,salary
              from hr.employees e
              where e.department_id = d.department_id
              order by salary desc
              FETCH FIRST 1 ROW ONLY) e;

Iudith Mentzel says:

September 8, 2018 at 5:16 pm

Hello Liron,
Here are a few more solutions:
1. using FETCH FIRST
with departments as
( select distinct department_id
from hr.employees
where department_id is not null
)
select d.department_id, e.employee_id, e.salary
from departments d
cross join lateral (select e.employee_id, e.salary
from hr.employees e
where e.department_id = d.department_id
order by e.salary desc
fetch first 1 row only) e
order by d.department_id
/
DEPARTMENT_ID EMPLOYEE_ID SALARY
————————————–
10 200 4400
20 201 13000
30 114 11000
40 203 6500
50 121 8200
60 103 9000
70 204 10000
80 145 14000
90 100 24000
100 108 12008
110 205 12008
11 rows selected.
2. another solution using JSON
select t.department_id, j.employee_id, j.salary
from (
select department_id,
json_arrayagg(
json_object(key ’employee_id’ value employee_id,
key ‘salary’ value salary
) order by salary desc
) arr_emp
from hr.employees
where department_id is not null
group by department_id
) t,
json_table (t.arr_emp, ‘$[0]’
columns (
employee_id number path ‘$.employee_id’,
salary number path ‘$.salary’
)
) j
order by t.department_id
/
DEPARTMENT_ID EMPLOYEE_ID SALARY
————————————–
10 200 4400
20 201 13000
30 114 11000
40 203 6500
50 121 8200
60 103 9000
70 204 10000
80 145 14000
90 100 24000
100 108 12008
110 205 12008
11 rows selected.
These are both easily adaptable to the case when the requirement would be to return the “top n” paid employees for each department.
Cheers & Best Regards,
Iudith

Reply
Pingback: Oracle Challenge #3 – Solution | GotoDBA
zhwsh says:

September 12, 2018 at 5:40 am

–//use FIRST_VALUE analytic functions :
SELECT distinct
department_id
,FIRST_VALUE (employee_id)
OVER (PARTITION BY department_id ORDER BY salary DESC)
first_employee_id
,MAX (salary) OVER (PARTITION BY department_id) max_salary
FROM hr.employees emp
WHERE department_id IS NOT NULL
ORDER BY department_id
;

Reply
Pingback: ODC Appreciation Day : Pattern Matching in SQL - @DBoriented

Oracle Challenge #3 – SQL

Like this:

28 thoughts on “Oracle Challenge #3 – SQL”

Leave a Reply Cancel reply

Oracle Challenge #3 – SQL

Share this:

Like this:

28 thoughts on “Oracle Challenge #3 – SQL”

Leave a Reply Cancel reply

Related Post

Using SPM to Control A Single SQL Execution PlanUsing SPM to Control A Single SQL Execution Plan

Did You Know #8 – IdentifiersDid You Know #8 – Identifiers

OGB Appreciation Day – Standby DatabaseOGB Appreciation Day – Standby Database