Trim function in Hive
Trim function removes the extra space from both ends of the string value. In other words, it removes the leading and trailing space from the string. Along with this, Hive providing two more trim function as below.
- Ltrim – It return a string after trimming the space from the beginning (left hand side) of a string.
- Rtrim – It return a string after trimming the space from the end (right hand side) of a string.
Trim function syntax
Trim function accepts only the String,Char and Varchar data types. If we pass the Numeric values in this function, it will throw an error in Hive.
Trim function example
Let’s consider that we have a table package_tracking in Hive. It contains the following columns package_id, tracking_id, current_city and expected_delivery_date as below.
The current_city column has name of the city where the package is located. But there is some extra space in those city names. Example: Seattle has leading space & Miami has both leading and trailing space.
If we query this table based on city name, it will not provide the accurate result. Lets try it without trim function
Since the city name Miami has the space in both beginning and end of the value, the query is not returning the result set. To fetch the correct results, we need to add trim function for the current_city column.
where trim(current_city) = 'Miami';
Now the trim function removed the extra space from the current_city column and it returned the required results.
In order to track the number of packages in each city, Let’s write the group by query with trim function as below.
select trim(current_city) as current_city,
count(*) as package_count
group by trim(current_city);
Ltrim function example
As we mentioned earlier, Ltrim function removes the space from the beginning of the string. To demonstrate this function, we will show some queries with and without ltrim function.
The string Seattle has the leading space in the column current_city. Let’s write a query to fetch the package count of Seattle city.
Query without Ltrim function
As we shown below, we tried to fetch the city Seattle without ltrim function. Since it has the beginning space in the column, it is not matched with the given condition. So zero records returned as output.
Query with Ltrim function
Now we tried the same query with ltrim function for the column current_city. Since it removed the leading spaces from current_city column, the given condition is matched and returned the correct output as below.
Rtrim function example
Rtrim function removes the space from end of the string. In the table package_tracking, Boston string has the trailing space for one of the value in the column current_city. Let’s write the query with and without rtrim function to see the difference in output.
Query without Rtrim function
As we shown below, we are trying to fetch the package count of Boston city. The table package_tracking has two entries for Boston city. But the query returned the count as 1. This is because the other string value has the trailing space in the current_city column.
Query with Rtrim function
Now we tried the same query with rtrim function like rtrim(current_city) = ‘Boston. It removed the trailing space from current_city column and returned the correct result as below.
- Hive String functions with examples
- If condition in Hive with examples
- Case statement in Hive with examples