Sum function with Case expression in Hive

Sum() function in Hive

Sum is one of the Aggregate function that returns the sum of the values of the column in a group. The CASE expression check the conditions against column values and returns a values from the THEN clause. Hive supports aggregate functions such as count,min,max,avg,sum,..etc with the Case expression. So the aggregate functions can perform the calculations on the values which are returned from the CASE expression.

Syntax for Sum function with Case expression

As mentioned above, the case expression should be mentioned inside the sum function. The Case expression contains the WHEN condition which is validated against the value of the column.

  • If the condition is TRUE, it returns the value from the THEN clause.
  • Otherwise it returns the value from the ELSE clause.

Example : SUM function in Hive

Lets understand the sum function with examples. Consider that we have a table called item_purchase in Hive. It contains the sales details of products. From that table, we need to find the total amount of sales for Electronics Category.

sum function example in Hive
sum function example in Hive

Lets write the Hive query using the SUM function. First we need to filter the product category of Electronics using the WHERE condition. Then we need to sum the values of amount column.

Where condition to filter the values
Where condition to filter the values

As we can see below, the sum function returns the total_amount as 2120 for the product category of Electronics.

Aggregate function without group by in Hive
Aggregate function without group by in Hive

Example : SUM function with CASE expression

For this example, we will take the student marks list table as below in Hive. This table contains the marks for four subjects such as Maths, Physics, Chemistry and Biology.

Sum function with case expression example in Hive
Sum function with case expression example in Hive

From this table, we want to find below details for each subjects.

  • Number of Students who got equal or greater than 90 marks
  • Number of Students who got equal or lesser than 65 marks

Using count() function with group by statement

If we don’t use sum with case expression, we need to write two queries as below

Output

In this query, we are using count() function with group by statement for subject.

Count function with Group by statement in Hive
Count function with Group by statement in Hive

If we see the student_results table, none of the students scored greater than 90 in Chemistry. So it is not included in the output.

Count function example in Hive
Count function example in Hive

Similarly none of the students scored below 65 in the subject of Biology. So we can’t see the count value for the Biology subject. But if we want the count value for all the subjects, we need to use sum function with case expression in Hive.

Using sum() function with case expression

If we use sum function with case expression, we can get the required result in single query as below.

As mentioned in the query, we have used the case expression as below to validate the marks. If the condition is TRUE, it will return 1. Otherwise it will return 0.

In addition, we used the sum function with the group by of subject column. So it will add the values which are returned from the case expression and give the final output as below.

Output

Case when expression with Sum function in Hive
Case when expression with Sum function in Hive

Finally we got the number of students who scored greater than 90 and lesser than 65 in Hive.

Recommended Articles