REGEXP_REPLACE function in Teradata
REGEXP_REPLACE is one of the regular expression function in Teradata. It is used to replace the portion of the source string with the another string using the regular expression pattern matching.
Syntax of REGEXP_REPLACE function
- source_string – Input string or column name of the table
- regexp_string – Regular expression pattern to match with source string
- replace_string – Replacement string
position_arg -> Optional.This is the numeric argument. It is the position in the source string from which to start searching (Default = 1)
occurrence_arg -> Optional.It is a numeric argument. It specifies the number of occurrence to be returned. For example, if occurrence_arg is 2, the function matches the first occurrence in source_string and starts searching from the character following the first occurrence in source_string for the second occurrence in source_string.
match_arg -> Optional argument. It specifies the matching behaviour for the REGEXP_REPLACE function. Valid values are listed below
- i’ = case-insensitive matching.
- ‘c’ = case sensitive matching.
- ‘n’ = the period character (match any character) can match the newline character.
- ‘m’ = source string is treated as multiple lines instead of as a single line. With this option, the ‘^’ and ‘$’ characters apply to each line in source string instead of the entire source string.
- ‘l’ = if source string exceeds the current maximum allowed source string size (currently 16 MB), a NULL is returned instead of an error. This is useful for long-running queries where you do not want long strings causing an error that would make the query fail.
- ‘x’ = ignore white space.
Example 1 : Replace all non alphanumeric characters with space
In some cases, input string or column values contains the special characters such as !,$,&,% and so on.But we want to maintain only the alphanumeric(alphabets and numbers) or space values in the string.
Lets see how to replace all those non alphanumeric characters with space using REGEXP_REPLACE function.
REGEXP_REPLACE('Revisit&Class' ,'[^0-9a-zA-Z]' ,' ');
- Revisit&Class is the Source String
- [^ ] is used to specify a non matching list where you are trying to match any character except for ones in the list.
- 0-9 is mentioning the numbers from 0 to 9.
- a-z denotes the small case alphabets from a,b,c.. to z.
- A-Z means the upper case alphabets from A,B,C,. to Z.
- ‘ ‘ (space) is mentioned in the Replacement string.
The input string contains the special character ampersand(&) and it is not a alphanumeric character. So the given regular expression pattern [^0-9a-zA-Z] is matched with that character and it is replaced with space that is given in the replacement string.