GotoDBA Database Development,DB Camparison DB Comparison Project: Using NULLs

DB Comparison Project: Using NULLs

I decided to start a comparison between four main databases from a development perspective. If you want to read more about it, I introduced the project here.

And for the topic of this post: NULL value is a unique thing in the database. It means “no value” and it is handled a bit differently in every database, as I have found. In this post I’ll take all four databases I’m testing: SQL Server, PostgreSQL, MariaDB, and Oracle, and will compare how they all handle NULL values in table columns.

In the Standard SQL 1992, NULL is defined like this:

null value (null): A special value, or mark, that is used to
            indicate the absence of any data value.

In order to check if a column contains NULL, we should use the “IS NULL” or “IS NOT NULL” predicate, and not any other predicate type. So let’s start with checking that.

Preparing the table

I’ll start with providing the commands to create the table I’ll use. Note that for simplicity I’m going to use SQL Developer to connect to all four databases, so I’m not going to copy the result from each database native command line utility, but from the output part of SQL Developer. This means that the output might look different if you try it with the database native tool (or a different 3rd party one), but I can assure you that the data I’m posting is for the database I’m mentioning.

Apparently, creating a simple table works the same on all databases, so here is the table I created and will use in this post:

CREATE TABLE test_tab
(id INTEGER,
 f1 INTEGER UNIQUE);

INSERT INTO test_tab VALUES(1,1);
INSERT INTO test_tab VALUES(2,NULL);

IS [NOT] NULL

We’ll start with the very basic test of “IS NULL” and “IS NOT NULL”. I expect all database to behave the same here as this is a very basic SQL usage and I would guess most databases if not all will comply.

I executed two queries:

SELECT * FROM test_tab WHERE f1 IS NULL;

SELECT * FROM test_tab WHERE f1 IS NOT NULL;

The results, as expected, were the same on all databases:

-- SELECT * FROM test_tab WHERE f1 IS NULL;
id         f1         
---------- ---------- 
2                     

-- SELECT * FROM test_tab WHERE f1 IS NOT NULL;
id         f1         
---------- ---------- 
1          1          

Other conditions

Before I start other tests, I wanted to make sure they behave the same with different operators like “>”, “<” and “=”, so I executed these queries:

SELECT * FROM test_tab WHERE f1=NULL;

SELECT * FROM test_tab WHERE f1<NULL;

SELECT * FROM test_tab WHERE NULL=NULL;

I even tried NULL=NULL to make sure that it’s not evaluated to TRUE (it shouldn’t).

All databases returned no rows for all queries. Now we have a baseline that the basic NULL comparison is the same with all databases. Let’s move on.

Unique

Here it’s starting to get interesting. I knew that there is a difference between Oracle and SQL Server, but didn’t know about the other two, so wanted to check. The question: how does the database handle UNIQUE columns that have NULL values.

As you might have noticed, when I created the table above, I defined the “f1” column as UNIQUE. This creates a unique index on this column and prevents duplicate values in it. I well aware that there are deferred constraints which have some special qualities, but I don’t want to get into this here, so let’s stay with the default constraint I specified when I created the table.

I’m sure we all understand that if “f1” is UNIQUE, I won’t be able to insert another row where “f1” is “1” (I already have this value from the first insert command after I created the table). But what happens if I try to insert another NULL (I have one already).

The SQL:

INSERT INTO test_tab VALUES(3,NULL);

And the result? MariaDB, PostgreSQL, and Oracle behaved the same and I got:

1 row inserted.

However, SQL Server returned something completely different:

Error starting at line : 15 in command -
INSERT INTO test_tab VALUES(3,NULL)
Error at Command Line : 15 Column : 1
Error report -
SQL Error: Violation of UNIQUE KEY constraint 'UQ__test_tab__32139E59C76E5D9B'. Cannot insert duplicate key in object 'dbo.test_tab'. The duplicate key value is (<NULL>).

For SQL Server, 2 NULLs are the same and violate the UNIQUE constraint on the column. However, in the other databases, 2 NULLs are NOT the same and the database allows more than a single NULL value for a unique column.

Indexes

The last thing I wanted to check is index usage with NULLs. Again, I’m aware of some difference between Oracle and SQL Server, so I wanted to check the others as well.

This is a little bit more complex because I want to check index usage, so I’ll have to check the execution plan of each database. But first, I’ll prepare the data in the table:

TRUNCATE TABLE test_tab;

INSERT INTO test_tab VALUES(1,1);
INSERT INTO test_tab VALUES(2,2);
INSERT INTO test_tab VALUES(3,3);
INSERT INTO test_tab VALUES(null,4);

CREATE INDEX tst_null_idx ON test_tab(id);

Now let’s check the SQL execution plan of a query that returns the NULL:

MariaDB

-- The Query
EXPLAIN SELECT * FROM test_tab Where id IS NULL;

-- The plan
id                   select_type         table             type       possible_keys    key                      key_len     ref        rows       Extra                       
-------------------- ------------------- ----------------- ---------- ---------------- ------------------------ ----------- ---------- ---------- ----------------------------
1                    SIMPLE              test_tab          ref        tst_null_idx     tst_null_idx             5           const      1          Using index condition       

In the output above you see that the “type” column contains the value “ref”. This means an index traversal in MariaDB, so MariaDB scans the index in order to find the rows where the “id” column is NULL.

PostgreSQL

-- The query
EXPLAIN SELECT * FROM test_tab Where id IS NULL;

-- The plan
QUERY PLAN
-----------------------------------------------------------------------------
Bitmap Heap Scan on test_tab  (cost=4.24..14.88 rows=11 width=8)
  Recheck Cond: (id IS NULL)
  ->  Bitmap Index Scan on tst_null_idx  (cost=0.00..4.24 rows=11 width=0)
        Index Cond: (id IS NULL)

What we see in the plan above is that PostgreSQL is using the index (last line) based on the condition that “id IS NULL” to find the relevant rows.

SQL Server

-- The query
SET showplan_all on;
SELECT * FROM test_tab with (index(tst_null_idx)) Where id IS NULL;

-- The plan
StmtText                                                                                                                         StmtId      NodeId      Parent      PhysicalOp                     LogicalOp                      Argument                                                                                                                DefinedValues                            EstimateRows   EstimateIO     EstimateCPU    AvgRowSize  TotalSubtreeCost OutputList                                                  Warnings Type                                                             Parallel EstimateExecutions 
-------------------------------------------------------------------------------------------------------------------------------- ----------- ----------- ----------- ------------------------------ ------------------------------ ----------------------------------------------------------------------------------------------------------------------- ---------------------------------------- -------------- -------------- -------------- ----------- ---------------- ----------------------------------------------------------- -------- ---------------------------------------------------------------- -------- ------------------ 
SELECT * FROM test_tab with (index(tst_null_idx)) Where id IS NULL                                                               1           1           0                                                                         1                                                                                                                                                                1.0                                                      0.00657038                                                                            SELECT                                                           false                       
  |--Nested Loops(Inner Join, OUTER REFERENCES:([Bmk1000]))                                                                      1           2           1           Nested Loops                   Inner Join                     OUTER REFERENCES:([Bmk1000])                                                                                                                                     1.0            0.0            4.18E-6        15          0.00657038       [test].[dbo].[test_tab].[id], [test].[dbo].[test_tab].[f1]           PLAN_ROW                                                         false    1.0                
       |--Index Seek(OBJECT:([test].[dbo].[test_tab].[tst_null_idx]), SEEK:([test].[dbo].[test_tab].[id]=NULL) ORDERED FORWARD)  1           3           2           Index Seek                     Index Seek                     OBJECT:([test].[dbo].[test_tab].[tst_null_idx]), SEEK:([test].[dbo].[test_tab].[id]=NULL) ORDERED FORWARD, FORCEDINDEX  [Bmk1000], [test].[dbo].[test_tab].[id]  1.0            0.003125       1.581E-4       19          0.0032831        [Bmk1000], [test].[dbo].[test_tab].[id]                              PLAN_ROW                                                         false    1.0                
       |--RID Lookup(OBJECT:([test].[dbo].[test_tab]), SEEK:([Bmk1000]=[Bmk1000]) LOOKUP ORDERED FORWARD)                        1           5           2           RID Lookup                     RID Lookup                     OBJECT:([test].[dbo].[test_tab]), SEEK:([Bmk1000]=[Bmk1000]) LOOKUP ORDERED FORWARD                                     [test].[dbo].[test_tab].[f1]             1.0            0.003125       1.581E-4       11          0.0032831        [test].[dbo].[test_tab].[f1]                                         PLAN_ROW                                                         false    1.0                

This plan is a bit long, but if you look at the 2nd to last line you’ll see “Index Seek” which indicates that SQL Server used the index to find the relevant rows, just like PostgreSQL and MariaDB (there were some differences, but I don’t want to dive into this, I’m only interested in the fact that they used the index).

If you read the query, you might also notice that I used a hint (the “with (index(tst_null_idx))”) to ask SQL Server to use the index. Without it, it didn’t use the index and simply read the entire table. My guess is that this is because of statistics and efficiency, and it makes sense with small tables (Oracle behaves the same). Because I wanted to force it to use the index in this case, I used the hint.

Oracle

-- The queries
SELECT /*+ index(test_tab) */ * FROM test_tab WHERE id IS NULL;
SELECT * FROM TABLE(dbms_xplan.display_cursor());

-- The plan
------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |       |       |     3 (100)|          |
|*  1 |  TABLE ACCESS FULL| TEST_TAB |     1 |    26 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------

And here Oracle is the exception. While the other databases used the index to get the rows where “id” is NULL, Oracle scanned the entire table (the last line in the plan). If you look at the query, I tried to use a hint as well (the /*+ index(test_tab) */ part), but Oracle simply ignored it.

This happens because Oracle does not keep NULLs in indexes (apparently it’s the only one out of the four that doesn’t). So when I have an IS NULL predicate, Oracle cannot use the index to find the rows because the information is simply not there. In these cases Oracle will always perform full table scan. Note that “IS NOT NULL” can use the index as all other values are in the index, only NULLs are not.

If you want to read more about this behavior and a hack to workaround it, read this post by Richard Foote.

Conclusions

So to sum things up:

  • The basic predicate of using NULLs (IS NULL, IS NOT NULL, and other operators with NULL like <, >, =, etc.) behave the same way exactly on all four databases.
  • When using a column with UNIQUE constraint, Oracle, MariaDB, and PostgreSQL allow multiple rows with NULL value in the column, while SQL Server only allows a single row with NULL value.
  • When querying with “IS NULL” predicate on an indexed column, SQL Server, PostgreSQL, and MariaDB can use the index to satisfy the predicate, while Oracle can’t and will perform table scan.

Final Notes

I started this post series because I started getting interested in basic differences that developers might notice when working with different databases. NULL is only one of these things. I have more in mind which I’ll write about soon.

I hope you enjoyed this post. Please let me know if you have any other interesting differences between different databases, or leave a comment below if you have anything to say about this post.

If you’d like to get an email when I publish a new post, fill in your email and click the “subscribe” button below:

3 thoughts on “DB Comparison Project: Using NULLs”

  1. Hi Liron,

    great post – thanks for sharing! I noticed an important difference between Oracle and PostgreSQL when dealing with NULLs: An empty string is not considered as NULL (if i googled correctly Oracle is the exception here…all other should at some point return rows when adding a predicate like WHERE row=”).

    Another funny thing: Try to concat NULLs with other strings 🤔.

    Never thought that small things would surprise me that much when switching from one DB to another that support SQL *standard* :).

    Cheers,
    Jonas

    1. Hi Jonas,
      Thanks for the feedback.
      These are excellent additions! I might add another post based on this and you are right, it’s important and sometimes completely unexpected 🙂
      Liron

Leave a Reply

Your email address will not be published. Required fields are marked *

Related Post