Substring function in Hive with examples

Hive substring :

Substring is a built-in string function in Hive which is used to extract a part of a string. In the hive sql, we can either specify substring or substr to get the required string from the column/value.

Substring syntax in Hive

  • <input string/column> – Here we need to mention the input string value or the column name in which we want to perform the substring operation.
  • start – This is the starting position of the substring value.
  • length – This is the optional argument which denotes the length/end position of the substring. If we didn’t mention the length, the function takes the remaining characters from the starting position.

Example 1 : Substring with input string in Hive

Lets look at the substring function with simple example.

  • Input string -> Revisit Class
  • Required substring -> Revisit

Here we are interested to take the substring as “Revisit” from the input string. When we compared the substring with the input string, we know that the starting position of the substring is 1 and length/end position of the substring is 7. Lets pass these values to substring function to get the required string in Hive.

hive substr example
hive substr example

Example 2 : Substring with column in Hive

There is a customer transaction table “cust_transaction” in hive as below. Lets write a hive sql to extract the substring from the account_type column.

Hive Substring example
Hive Substring example

As we can see that the account_type column has currency code with the account type values.Here the currency codes are USD,CAD,EUR and AUD. The account types are Saving,Checking and Term.

We want to extract the currency code from the account_type column. The starting position of the currency code is 1 and the length/ending position of the currency code is 3.

Hive substring function
Hive substring function

As we can see the function returned the currency codes successfully. If we didn’t specify the length as 3 in substring, it will return the entire the string which is starting from position 1.

Substring without length in Hive
Substring without length in Hive

Example 3 : Get last few characters using substring

Hive allows to give the negative values as starting position in substring. It will return the characters from right to left. The length argument is optional here.

substring from right in hive
substring from right in hive

Example 4 : Instr function with substring in hive

In some cases, we don’t know the starting position of the substring. Also the starting position will change for each value in hive table. But we might have some common string/character present in that column. In that case, we can use instr function to get the starting position of the string which is used to substring function in hive.

Lets extract the account type alone without currency code from the customer transaction table.

Instr function in substring hive
Instr function in substring