Coalesce function in Hive with examples

COALESCE function in Hive

Coalesce is a conditional function in Hive which is used to handle the NULL values. It accepts a number of arguments and returns the first non-NULL argument. If all the arguments are NULL, it will return the NULL value.

COALESCE function syntax

  • The arguments can be value, column name or expression which will be evaluated by COALESCE function.

Example

Let’s understand the COALESCE function with examples in Hive. We are giving two fruit names as arguments in the coalesce function. Since there is no null value, it returned the first non-null argument as favourite fruit.

Coalesce function example in Hive
Coalesce function example in Hive

Now the first argument is NULL and the second argument is orange. As mentioned earlier, Coalesce will return the first non-null argument. So the favourite fruit is orange.

Coalesce function with null values in Hive
Coalesce function with null values in Hive

If we give the NULL values in all the arguments, Coalesce function will return the NULL as output.

Coalesce function return null in Hive
Coalesce function return null in Hive

Similarly we can pass n number of arguments in Coalesce function to handle the null values.

N number of argument in Coalesce function
N number of argument in Coalesce function

Replace null value with default value using Coalesce

In this example, we will use the Coalesce function on the table. We have created a table called as user_subscriptions in Hive. This table contains the billing information of customers. It contains the below columns.

Describe table in Hive
Describe table in Hive

The table user_subscriptions has 3 records. As mentioned below, few row has NULL values in the columns card_number, home_address and office_address.

NULL values in Hive table
NULL values in Hive table

Query without Coalesce function

Let’s write a query to extract the first two digits from the card_number. This helps us to find the credit card issuer of each customer. We used substring function to get the card_issuer_id.

Since the user_id #44821 doesn’t have the card number, the query returned the NULL value. For other users, we got the first two digits from card_number.

Get card issuer id from credit card number in Hive
Get card issuer id from credit card number in Hive

Query with Coalesce function

We want to avoid the missing data in card_number field. Because it can affect the downstream system/product if they consumed this table. Example : Bias in Machine learning models

One solution would be replacing the NULL value with some default value. In this example, we are going to replace the NULL value with 0. Let’s add the Coalesce function on top of the substring function in the query.

Coalesce function to replace null values in Hive
Coalesce function to replace null values in Hive

As we shown in the screenshot, the Coalesce function is replaced the NULL value with default value 0 for user_id #44821.

Replace missing data with another column

Consider that we want to get the user’s address from the same table user_subscriptions. There are two address columns in the table such as home_address and office_address.

Our goal is to get the address from the column office_address. If the office address is not present in the table, we need to get the address from the column home_address.

user_subscriptions table in Hive
user_subscriptions table in Hive

If we look at the table user_subscriptions, the user_id #38392 has NULL value in the column office_address. To handle the NULL value and get the result, we can use the COALESCE function with the arguments of office_address & home_address

Coalesce function example with multiple columns in Hive
Coalesce function example with multiple columns in Hive

As we shown above, Coalesce function has returned the address after evaluating the NULL values from the columns. For the user_id #38392, it returned the home_address as mailing address because the office_address value is NULL.

Recommended Articles