Basic NULL handling in SQL

null-handling-in-sql
null-handling-in-sql

One of my friends who is getting into data analytics, recently asked me what is null and why does it, need special treatment in any data projects? he also asked how do we work with NULL in SQL, this post is consolidation of answer given to him.

what-is-null-&&-null-handling-in-sql
what-is-null-&&-null-handling?

what is null in SQL?

Theoretically null is a value which is unknown, undefined and which is not equal to zero or empty character space.

When any field in the table is optional to fill the system, is going to fill it with null, which does not have any true value to it. Like I said earlier null is neither equal to zero nor it is like empty string (” “) so that implies normal comparison operators (=, <>, >, <, <=, >=) would not work on them. There are various functions and operators which we use to handle null.

Why do we need to handle null?

The normal aggregate functions, string functions and operators will not work normally unless we identify the null and replace them in the dataset before proceeding with the analysis. Below is one small example to see affect of null on functions such as COUNT( ) and AVG( )

test table has username and value against it, as you see user ‘k’ has null against it.

Below are the query results with and without null handling

In case-1 the average of values have been calculated without coalesce function AVG function ignores null and gives the wrong average similarly in case-2 null value will be ignored while counting if not handled.

This is one small example, In case of large datasets with significant number of nulls the analysis results can skew significantly if the data clean up and null handling is not performed.

that brings us to the topic

Methods of handling null in SQL:

COALESCE( ) is a widely available function that returns first non-null value from the list. It generally used when we have 2-3 columns which alternative to each other. For example, if we have ph_number1, ph_number2 and telephone columns which can be used to alternate to each other.

SELECT name, COALESCE(ph_number1,ph_number2,telephone) AS contact_num 
FROM personal_info

here in this example, if ph_number1 is NULL then it will loo for value is column ph_number2 in the same row similarly if ph_number2 is also NULL it will check the column telephone and produce the value.

Syntax:

SELECT  COALESCE(val1, alt_val1,alt_val2,.... alt_valn) FROM table_k;

As a demo, I have fired below SELECT statement over psql shell

COALESCE-example-over-psql-shell
COALESCE example over psql shell

In above example we have COALESCE function, passed with a list of values including nulls. The COALESCE function searches for non-null value and finds string ‘analytics’ first in the list and it returns the same.

NVL( ), IFNULL( ) and ISNULL( ) functions:

NVL is a substitutional function in oracle which takes only two arguments(unlike COALESCE) if first argument is null it substitutes with the second value.

SELECT user_name, NVL(value1 , 0) AS value1 FROM test;

IFNULL() and ISNULL() are equivalent functions in Mysql and Sqlserver respectively

/*Mysql*/

SELECT user_name, IFNULL(value1 , 0) AS value1 FROM test;

/*sql sever*/

SELECT user_name, ISNULL(value1 , 0) AS value1 FROM test;

Syntax:

NVL(expression, alternate) --Oracle

IFNULL(expression, alternate) --Mysql

ISNULL(expression, alternate) --Sqlserver

If the expression is NULL, these functions return the alternate value otherwise it returns output of the expression.

NULL checks in the WHERE clause conditions and CASE statements:

Now we know that, normal comparison operators will not work on NULL how do we identify the null in a condition?

for that we have IS NULL and IS NOT NULL

Example:

to pull IDs where last_name is null the query would look like below

SELECT  ID, first_name FROM employee WHERE last_name IS NULL;

Query to pull other subset where last name is having values look like,

SELECT  ID, first_name FROM employee WHERE last_name IS NOT NULL;

Update statement to replace nulls permanently in a table

UPDATE test SET value1=0 WHERE value1 IS NULL;

Handling NULL with CASE :

CASE statement is a popularly used statement to have defined output in the column for various conditions. This can also be used to handle null along with other conditions.

Below is an example of such, where, if status is ‘a’ the status_full will have ‘accepted’, when it is ‘d’ then ‘declined’ and when the column is null it returns ‘WIP’ indicating the application is still in ‘work in progress’ state.

SELECT application_id, customer_id, 
CASE 
WHEN status='a' THEN 'accepted'
WHEN status='d' THEN 'declined'
WHEN status IS NULL THEN 'WIP'
END AS status_full
FROM ods_application_status;

null handling in practical scenario:

Practically untidy data will have various other values in place of null; data originating from some systems/programs would automatically detect null and have them replaced with various other values; Some will have ‘NA’ some would have gotten replaced with spaces and some with ‘null’ written as text, so understand the data before handling those cases.

Bottomline:

To sum-up the topic, null is neither 0 nor space and it is not a defined value so it needs special handling. SQL has many functions and operators to handle null by using which we can eliminate skewness in the resulting insights.