Regexp_substr function in Teradata with examples

Posted on 14th August 20215th February 2022 by RevisitClass

Contents

Regexp_substr function in Teradata

Teradata supports substring function to extract a portion of the string. But It doesn’t allow regular expression to get the string with specific pattern. On the other hand, Regexp_substr function extracts a substring based on regular expression.

In this tutorial, we will explain about the Regexp_substr function with examples.

Syntax of Regexp_substr function

regexp_substr(source_string, regexp_string, position_arg, count_arg, match_arg)

source_string – It is a input string, If source_string is NULL, NULL is returned.
regexp_string – It is a regular expression string that we want to match with input string. If regexp_string is NULL, NULL is returned.
position_arg -> Optional. It is a numeric argument. It specifies the position in the source string from which to start searching. (default value is 1).
count_arg -> Optional. It is a numeric argument. It specifies the number of occurrence to be returned. (default value is 1).
match_arg -> Optional. It is a character argument. The valid values are listed below

‘i’ = case-insensitive matching.
‘c’ = case sensitive matching.
‘n’ = To treat the period character (match any character) as new line character
‘m’ = It is used to treat the source_string as multiple lines instead of as a single line.
‘l’ = if source_string exceeds the current maximum allowed source_string size (currently 16 MB), a NULL is returned instead of an error.
‘x’ = It ignores the whitespace.

The match_arg argument allows to specify more than one character. If a character in the argument is not valid, it ignore that character. If match_arg is not specified, is NULL, or is empty:

The match on the input string is case sensitive.
A period does not match the newline character.
The source_string is treated as a single line.

Returns: REGEXP_SUBSTR returns a string value if the given pattern matches with the input string. Otherwise it will return NULL.

Example to extract the substring based on regular expression

In the ecommerce website, customer provide the review on the product which they bought from the website. Lets analyze those customer reviews using REGEXP_SUBSTR function. It is kind of customer sentiment analysis.

Consider that we have a table customer_feed_back as below in Teradata. In that table, we have the customer comment details for the product Graphic Tablet.

The column CUST_COMMENTS have the customer comment details. Lets find the negative comments using the REGEXP_SUBSTR function. To find that, we are using set of negative words with regular expression. These words reflect the quality of the product.

issue
not good
not happy
not user-friendly
low quality
not working
bad

User may give the comments in smaller or upper case letters. To handle that, we are ignoring the case using regular expression pattern. The below regular expression extract the negative comments.

(?i)issue|not\s*good|not\s*happy|not\s*user-friendly|low\s*quality|not\s*working|bad

?i is search the string in case-insensitive mode.
\s* is defines the space between the word
| is denotes OR condition in regex.

Now we are ready to write the select query with the REG_SUBSTR function as below

SELECT 
CUST_ID,
PRODUCT_NAME,
CUST_COMMENTS,
REGEXP_SUBSTR (CUST_COMMENTS ,
'(?i)issue|not\s*good|not\s*happy|not\s*user-friendly|low\s*quality|not\s*working|bad') as Negative_Comments
FROM cust_DB.customer_feed_back;

The select query returns the output as below.

REGEXP_SUBSTR function example in Teradata

The comments are containing negative words as not working and BAD. So the function REGEXP_SUBSTR returns those words for customer id 8048 and 5894. For the not matching comments, it returned the NULL value.

Recommended Articles

Regexp_substr function in Teradata with examples

Regexp_substr function in Teradata

Syntax of Regexp_substr function

Example to extract the substring based on regular expression

Leave a Reply Cancel reply

Tags

Recent Posts