RLIKE function in Hive
RLIKE function is an advanced version of LIKE operator in Hive. It is used to search the advanced Regular expression pattern on the columns. If the given pattern matches with any substring of the column, the function returns TRUE. otherwise FALSE.
Syntax of RLIKE function
SELECT * FROM <Database_Name.Table_Name>
<column_name> RLIKE <pattern_to_match>;
The column_name is the name of column that we want to match with the regular expression pattern. Then we need to mention the keyword RLIKE followed by regular expression pattern.
Example : To fetch email address with specific pattern
Lets look at the RLIKE function with an example. We have a table called emp_info in Hive. It contains three columns such as emp_id,name and email_id.
Here the email_id column may contains the invalid email address. To fetch the valid email address, we need to write the pattern using the regular expression. To evaluate that pattern, we use RLIKE function in the hive query.
The emp_info table contains the below records. From this table, we are going to fetch the employee details who has the email address that starting with alphanumeric characters ( either number or alphabets).
Regular expression for alphanumeric characters
Lets understand few characters/symbols that used in the regular expression.
- ^ – The caret(^) symbol is used to match the certain pattern on the beginning of the string.
- [ ] – A string of characters enclosed in square brackets () matches any one character in that string.
- 0-9 – It is used to check any character that is a digit
- a-z – It is used to check any character that is a alphabet in smaller case.
- A-Z – To check any character that is a alphabet in upper case.
- | – The pipe symbol is used to check the alternate pattern in regex.
Lets write the pattern as below to check the starting character is a alphabet or number.
Hive query to fetch valid email address
Lets write the SELECT query to fetch the employee details who has the valid email address.
where email_id RLIKE '^([0-9]|[a-z]|[A-Z])';
where email_id RLIKE '^([0-9a-zA-Z])';
As we can see above, the query returned the employee details who has the email address that starting with alphanumeric characters.
The RLIKE function condition is FALSE for one employee (Kevin) since the email_id is starting with symbol ampersand(&). For the remaining employees, the condition is TRUE.