GotoDBA Database Development,Database Thoughts Oracle Challenge #3 – SQL

Oracle Challenge #3 – SQL

I debated quite a lot before writing this post. When I wrote the post about interviewing a DBA, in the “technical questions I do ask” part I just gave a general explanation of what I ask, but didn’t reveal the real questions. Now, more than 3 years later, I decided to give one of the questions as a challenge here.

This is not a difficult question or a complex scenario, but the challenge here is to provide a few ways to solve it using SQL only. So here it is:

You have an employee table with employee name, department, salary, etc. (you can use the hr.employees table). The result set I’m looking for is a list containing the top earning employee per department. For simplicity, let’s assume that for each department there is only one employee with the highest salary and also there are no employees without departments (while sh employee meets the first assumption, there are employees without departments, so you can ignore these rows). Also, don’t worry about the lookups, department_id is good enough as I’m looking for the logic and not the details.

If you use the hr.employee table, the expected result set should be something like this:

DEPARTMENT_ID EMPLOYEE_ID     SALARY
------------- ----------- ----------
           10         200       4400
           20         201      13000
           30         114      11000
           40         203       6500
           50         121       8200
           60         103       9000
           70         204      10000
           80         145      14000
           90         100      24000
          100         108      12008
          110         205      12008

So far I know of 5 ways to write this query. A different way is not similar SQL with changes in the SQL text, but SQL with a really different approach.
Good luck.

Update – the solution post is published, you can see it here.

28 thoughts on “Oracle Challenge #3 – SQL”

  1. Here’s one way to do it.
    dunno if HTML tags will work…

    JKSTILL@examples > l
      1  with salary_info as (
      2     select
      3     distinct
      4             e.department_id
      5             , e.employee_id
      6             , e.salary
      7             , row_number() over ( partition by e.department_id order by e.salary desc) dept_salary_order
      8     from hr.employees e
      9     where e.department_id is not null
     10  )
     11  select
     12     si.department_id
     13     , si.employee_id
     14     , si.salary
     15  from salary_info si
     16  where dept_salary_order = 1
     17* order by department_id
    JKSTILL@examples > /
    DEPARTMENT_ID EMPLOYEE_ID     SALARY
    ------------- ----------- ----------
               10         200       4400
               20         201      13000
               30         114      11000
               40         203       6500
               50         121       8200
               60         103       9000
               70         204      10000
               80         145      14000
               90         100      24000
              100         108      12008
              110         205      12008
    11 rows selected.
    
    1. Yes Jared, this is one option, you used an analytic function here.
      There are (at least) 4 other options that don’t use analytic functions, but some other techniques.
      Thanks for posting this

  2. Hi Liron, this is my solution to the challenge:
    select department_id, employee_id, salary
    from hr.employees
    where department_id is not null
    and (department_id,salary) in (select department_id, max(salary) from hr.employees group by department_id)
    order by department_id
    DEPARTMENT_ID EMPLOYEE_ID SALARY
    10 200 4400
    20 201 13000
    30 114 11000
    40 203 6500
    50 121 8200
    60 103 9000
    70 204 10000
    80 145 14000
    90 100 24000
    100 108 12008
    110 205 12008
    –sorry for the output not being formatted properly.
    Diana

    1. There are some consequences to using that method. Note that dual full table scans:

      JKSTILL@examples > l
        1  select department_id, employee_id, salary
        2  from hr.employees
        3  where department_id is not null
        4  and (department_id,salary) in (select department_id, max(salary) from hr.employees group by department_id)
        5* order by department_id
      JKSTILL@examples > /
      DEPARTMENT_ID EMPLOYEE_ID     SALARY
      ------------- ----------- ----------
                 10         200       4400
                 20         201      13000
                 30         114      11000
                 40         203       6500
                 50         121       8200
                 60         103       9000
                 70         204      10000
                 80         145      14000
                 90         100      24000
                100         108      12008
                110         205      12008
      11 rows selected.
      Execution Plan
      ----------------------------------------------------------
      Plan hash value: 802276651
      ----------------------------------------------------------------------------------
      | Id  | Operation            | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
      ----------------------------------------------------------------------------------
      |   0 | SELECT STATEMENT     |           |  3267 | 98010 |     5  (20)| 00:00:01 |
      |*  1 |  FILTER              |           |       |       |            |          |
      |   2 |   SORT GROUP BY      |           |  3267 | 98010 |     5  (20)| 00:00:01 |
      |*  3 |    HASH JOIN         |           |  3267 | 98010 |     4   (0)| 00:00:01 |
      |*  4 |     TABLE ACCESS FULL| EMPLOYEES |   106 |  2438 |     2   (0)| 00:00:01 |
      |   5 |     TABLE ACCESS FULL| EMPLOYEES |   107 |   749 |     2   (0)| 00:00:01 |
      ----------------------------------------------------------------------------------
      
      1. This is true, some of the solutions I’ve got perform better than others. I actually have one that performs even worse than this one, but it still works.

  3. SELECT department_id, employee_id, salary
    FROM ( SELECT department_id
    ,employee_id
    ,salary
    ,MAX (salary) OVER (PARTITION BY department_id) max_salary
    FROM hr.employees
    WHERE department_id IS NOT NULL
    ORDER BY 1)
    WHERE salary = max_salary;

  4. The aggregate query doesn’t necessarily has to have a subquery:
    select department_id,
    max(employee_id) keep(dense_rank first ordre by salary desc) as employee_id,
    max(salary) as salary
    from hr.employees
    where department_id is not null
    group by department_id
    order by department_id
    And there are two other possibilities (which are similar to each other) with semijoin/antijoin
    select department_id, employee_id, salary
    from hr.employees e
    where not exists (
    select null
    from hr.employees ee where e.department_id = ee.department_id and e.salary < ee.salary)
    and department_id is not null
    order by department_id
    select left_e.department_id, left_e.employee_id, left_e.salary
    from hr.employees left_e left outer join hr.employees right_e
    on (left_e.department_id = right_e.department_id and left_e.salary < right_e.salary)
    where right_e.salary is null
    and left_e.department_id is not null
    order by 1
    Regards
    Maxim

  5. I have some solutions. I’ll contribute one (for now):
    SELECT department_id,
    MAX(employee_id) keep(dense_rank LAST ORDER BY salary NULLS FIRST) employee_id,
    MAX(salary)
    FROM hr.employees
    WHERE department_id IS NOT NULL
    GROUP BY department_id
    ORDER BY department_id;

  6. How about this ?
    select department_id,employee_id,salary
    from hr.employees emp
    where department_id is not null
    and not exists
    ( select 1
    from hr.employees
    where department_id = emp.department_id
    and salary > emp.salary )
    order by department_id;
    DEPARTMENT_ID EMPLOYEE_ID SALARY
    10 200 4400
    20 201 13000
    30 114 11000
    40 203 6500
    50 121 8200
    60 103 9000
    70 204 10000
    80 145 14000
    90 100 24000
    100 108 12008
    110 205 12008
    11 rows selected.
    Oracle Live SQL has used to execute above query.

  7. OK, here’s one that uses something different. Far from perfect, but does use grouping and cube (both available since Oracle 8i I believe)

    JKSTILL@examples > l
      1  with data as (
      2     select
      3     distinct
      4             e.department_id
      5             , case grouping(e.employee_id)
      6                     when 1 then lag(employee_id,1) over ( partition by e.department_id order by e.employee_id )
      7                     else null
      8             end employee_id
      9             , max(e.salary) salary
     10     from hr.employees e
     11     where e.department_id is not null
     12     group by cube(e.department_id, e.employee_id)
     13  )
     14  select *
     15  from data
     16  where employee_id is not null
     17     and department_id is not null
     18* order by 1
    JKSTILL@examples > /
    DEPARTMENT_ID EMPLOYEE_ID     SALARY
    ------------- ----------- ----------
               10         200       4400
               20         202      13000
               30         119      11000
               40         203       6500
               50         199       8200
               60         107       9000
               70         204      10000
               80         179      14000
               90         102      24000
              100         113      12008
              110         206      12008
    11 rows selected.
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 3993226991
    ---------------------------------------------------------------------------------------
    | Id  | Operation                 | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT          |           |   106 |  4134 |     3  (34)| 00:00:01 |
    |   1 |  SORT ORDER BY            |           |   106 |  4134 |     3  (34)| 00:00:01 |
    |*  2 |   VIEW                    |           |   106 |  4134 |     3  (34)| 00:00:01 |
    |   3 |    HASH UNIQUE            |           |   106 |  1166 |     3  (34)| 00:00:01 |
    |   4 |     WINDOW BUFFER         |           |   106 |  1166 |     3  (34)| 00:00:01 |
    |*  5 |      FILTER               |           |       |       |            |          |
    |   6 |       SORT GROUP BY ROLLUP|           |   106 |  1166 |     3  (34)| 00:00:01 |
    |*  7 |        TABLE ACCESS FULL  | EMPLOYEES |   106 |  1166 |     2   (0)| 00:00:01 |
    ---------------------------------------------------------------------------------------
    Predicate Information (identified by operation id):
    ---------------------------------------------------
       2 - filter("EMPLOYEE_ID" IS NOT NULL)
       5 - filter("E"."DEPARTMENT_ID" IS NOT NULL)
       7 - filter("E"."DEPARTMENT_ID" IS NOT NULL)
    
    1. Hi Liron,
      Here are 3 more solutions:
      The first one just continues one of the ideas above, using GROUPING_ID:
      with data as (
      select department_id,
      employee_id,
      grouping(employee_id) g_empid,
      max(salary) max_sal
      from hr.employees
      where department_id is not null
      group by rollup(department_id, employee_id)
      )
      select department_id,
      max(employee_id) employee_id,
      max_sal salary
      from data
      group by department_id, max_sal
      having count(distinct g_empid) = 2
      order by department_id
      /
      DEPARTMENT_ID EMPLOYEE_ID SALARY
      ————————————–
      10 200 4400
      20 201 13000
      30 114 11000
      40 203 6500
      50 121 8200
      60 103 9000
      70 204 10000
      80 145 14000
      90 100 24000
      100 108 12008
      110 205 12008
      11 rows selected.
      In the above solution posted by @JKSTILL the idea of using LAG is nice, but the result is not correct,
      as the ordering inside the LAG function is by employee_id, not by salary.
      Following are two more solutions essentially using aggregates, but in a different way than the above ones:
      select department_id,
      regexp_substr(
      listagg(employee_id,’@’ on overflow truncate) within group(order by salary desc)
      , ‘^[^@]+’) employee_id,
      regexp_substr( listagg(salary,’@’ on overflow truncate) within group(order by salary desc)
      , ‘^[^@]+’) salary
      from hr.employees
      where department_id is not null
      group by department_id
      order by department_id
      /
      DEPARTMENT_ID EMPLOYEE_ID SALARY
      ————————————–
      10 200 4400
      20 201 13000
      30 114 11000
      40 203 6500
      50 121 8200
      60 103 9000
      70 204 10000
      80 145 14000
      90 100 24000
      100 108 12008
      110 205 12008
      11 rows selected.
      select department_id,
      json_value(json_arrayagg(employee_id order by salary desc returning clob),’$[0]’) employee_id,
      json_value(json_arrayagg(salary order by salary desc returning clob),’$[0]’) salary
      from hr.employees
      where department_id is not null
      group by department_id
      order by department_id
      /
      DEPARTMENT_ID EMPLOYEE_ID SALARY
      ————————————–
      10 200 4400
      20 201 13000
      30 114 11000
      40 203 6500
      50 121 8200
      60 103 9000
      70 204 10000
      80 145 14000
      90 100 24000
      100 108 12008
      110 205 12008
      11 rows selected.
      Chapeau to Oren, for the nice MATCH_RECOGNIZE solution 🙂
      Cheers & Best Regards,
      Iudith

      1. Nice Catch. In my haste to use something different I only checked the salaries and not the employee_id.
        After playing around with it a bit I decided this query was not a good fit for grouping, as it is not as simple as just changing the order by for the lag(). Doing so results in ‘ORA-00979: not a GROUP BY expression’.
        Instead I tried something completely different. (and I verified the employee_id values)
        At first I tried using the dense_rank() function. That was OK, but didn’t deal with ties.

        JKSTILL@examples > l
          1  with data as  (
          2     select
          3             e.department_id
          4             , e.employee_id
          5             , e.salary
          6             , dense_rank()  over (partition by e.department_id order by e.salary ) salrank
          7     from hr.employees e
          8     where e.department_id is not null
          9  )
         10  select *
         11  from data
         12  where salrank = 1
         13* order by department_id
        JKSTILL@examples > /
        DEPARTMENT_ID EMPLOYEE_ID     SALARY    SALRANK
        ------------- ----------- ---------- ----------
                   10         200       4400          1
                   20         202       6000          1
                   30         119       2500          1
                   40         203       6500          1
                   50         132       2100          1
                   60         107       4200          1
                   70         204      10000          1
                   80         173       6100          1
                   90         101      17000          1
                   90         102      17000          1
                  100         113       6900          1
                  110         206       8300          1
        12 rows selected.
        Execution Plan
        ----------------------------------------------------------
        Plan hash value: 2698234872
        --------------------------------------------------------------------------------------
        | Id  | Operation                | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
        --------------------------------------------------------------------------------------
        |   0 | SELECT STATEMENT         |           |   106 |  5512 |     3  (34)| 00:00:01 |
        |*  1 |  VIEW                    |           |   106 |  5512 |     3  (34)| 00:00:01 |
        |*  2 |   WINDOW SORT PUSHED RANK|           |   106 |  1166 |     3  (34)| 00:00:01 |
        |*  3 |    TABLE ACCESS FULL     | EMPLOYEES |   106 |  1166 |     2   (0)| 00:00:01 |
        --------------------------------------------------------------------------------------
        Predicate Information (identified by operation id):
        ---------------------------------------------------
           1 - filter("SALRANK"=1)
           2 - filter(DENSE_RANK() OVER ( PARTITION BY "E"."DEPARTMENT_ID" ORDER BY
                      "E"."SALARY")<=1)
           3 - filter("E"."DEPARTMENT_ID" IS NOT NULL)
        

        The following somewhat similar method deals with the tie. There are other ways to accomplish this, but this is enough fun for today.

        JKSTILL@examples > l
          1  with data as  (
          2     select distinct
          3             e.department_id
          4             --, e.employee_id
          5             --, e.salary
          6             , max(employee_id) keep ( dense_rank first order by e.salary desc )  over (partition by e.department_id  ) empmax
          7             , max(salary) keep ( dense_rank first order by e.salary desc )  over (partition by e.department_id  ) salmax
          8     from hr.employees e
          9     where e.department_id is not null
         10  )
         11  select *
         12  from data
         13  --where salrank = 1
         14* order by department_id
        JKSTILL@examples > /
        DEPARTMENT_ID     EMPMAX     SALMAX
        ------------- ---------- ----------
                   10        200       4400
                   20        201      13000
                   30        114      11000
                   40        203       6500
                   50        121       8200
                   60        103       9000
                   70        204      10000
                   80        145      14000
                   90        100      24000
                  100        108      12008
                  110        205      12008
        11 rows selected.
        Execution Plan
        ----------------------------------------------------------
        Plan hash value: 961311344
        -----------------------------------------------------------------------------------
        | Id  | Operation             | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
        -----------------------------------------------------------------------------------
        |   0 | SELECT STATEMENT      |           |   106 |  4134 |     4  (50)| 00:00:01 |
        |   1 |  SORT ORDER BY        |           |   106 |  4134 |     4  (50)| 00:00:01 |
        |   2 |   VIEW                |           |   106 |  4134 |     4  (50)| 00:00:01 |
        |   3 |    HASH UNIQUE        |           |   106 |  1166 |     4  (50)| 00:00:01 |
        |   4 |     WINDOW SORT       |           |   106 |  1166 |     4  (50)| 00:00:01 |
        |*  5 |      TABLE ACCESS FULL| EMPLOYEES |   106 |  1166 |     2   (0)| 00:00:01 |
        -----------------------------------------------------------------------------------
        Predicate Information (identified by operation id):
        ---------------------------------------------------
           5 - filter("E"."DEPARTMENT_ID" IS NOT NULL)
        
  8. /* Formatted on 2018/9/8 8:22:10 (QP5 v5.227.12220.39754) */
    SELECT a.*
    FROM ( SELECT department_id, employee_id, salary
    FROM hr.employees where department_id is not null
    ORDER BY 1, 3 DESC) a
    CONNECT BY PRIOR department_id < department_id
    START WITH department_id = 10
    –//I don't know how to stop …

  9. WITH a
    AS ( SELECT department_id, employee_id, salary
    FROM hr.employees
    WHERE department_id IS NOT NULL
    ORDER BY 1, 3 DESC)
    ,b
    AS ( SELECT a.*
    FROM a
    CONNECT BY PRIOR a.department_id < a.department_id
    START WITH a.department_id = 10)
    SELECT *
    FROM b
    WHERE ROWNUM <= (SELECT COUNT (DISTINCT department_id) FROM a)

  10. Hello Liron,
    Here are a few more solutions:
    1. using FETCH FIRST
    with departments as
    ( select distinct department_id
    from hr.employees
    where department_id is not null
    )
    select d.department_id, e.employee_id, e.salary
    from departments d
    cross join lateral (select e.employee_id, e.salary
    from hr.employees e
    where e.department_id = d.department_id
    order by e.salary desc
    fetch first 1 row only) e
    order by d.department_id
    /
    DEPARTMENT_ID EMPLOYEE_ID SALARY
    ————————————–
    10 200 4400
    20 201 13000
    30 114 11000
    40 203 6500
    50 121 8200
    60 103 9000
    70 204 10000
    80 145 14000
    90 100 24000
    100 108 12008
    110 205 12008
    11 rows selected.
    2. another solution using JSON
    select t.department_id, j.employee_id, j.salary
    from (
    select department_id,
    json_arrayagg(
    json_object(key ’employee_id’ value employee_id,
    key ‘salary’ value salary
    ) order by salary desc
    ) arr_emp
    from hr.employees
    where department_id is not null
    group by department_id
    ) t,
    json_table (t.arr_emp, ‘$[0]’
    columns (
    employee_id number path ‘$.employee_id’,
    salary number path ‘$.salary’
    )
    ) j
    order by t.department_id
    /
    DEPARTMENT_ID EMPLOYEE_ID SALARY
    ————————————–
    10 200 4400
    20 201 13000
    30 114 11000
    40 203 6500
    50 121 8200
    60 103 9000
    70 204 10000
    80 145 14000
    90 100 24000
    100 108 12008
    110 205 12008
    11 rows selected.
    These are both easily adaptable to the case when the requirement would be to return the “top n” paid employees for each department.
    Cheers & Best Regards,
    Iudith

  11. –//use FIRST_VALUE analytic functions :
    SELECT distinct
    department_id
    ,FIRST_VALUE (employee_id)
    OVER (PARTITION BY department_id ORDER BY salary DESC)
    first_employee_id
    ,MAX (salary) OVER (PARTITION BY department_id) max_salary
    FROM hr.employees emp
    WHERE department_id IS NOT NULL
    ORDER BY department_id
    ;

Leave a Reply

Your email address will not be published. Required fields are marked *

Related Post