Do you have a column where you need to extract a word from a string of letters? Perhaps there is a combination of numbers, characters and letters in the column. REGEX stands for Regular Expression. It is a powerful text searching function. These expressions are used in various languages. I use it in Alteryx and also Tableau!
There are a couple of websites that can help you with your regular expression syntax;
https://regex101.com/ - you enter your expression and it will provide you suggestions
https://regexlearn.com/learn/regex101 - describes REGEX in detail and points you to other sites to help you get started
Let me give you a use case. Sometimes I need to pull a column with a name value from a SharePoint site. When I pull in the name, there are numbers before my name, along with some characters.
Example
1234;# Harrington, Dawn
How do I get just my name from this column?
I would create a calculated field with the formula below;
REGEXP_REPLACE([Name],'([0-9#;\s])','')
My results will appear as below with only my name!
Anything in between the '([])' of the formula above are the characters we do not want to have in our final product. In this case I wanted to remove any numbers expressed as 0-9. I also wanted to remove the semi-colon and the pound sign.
Then I wanted to remove the space after the pound sign and before the H of my last name. The \s is for this space. The single quotes at the end is what I would specify replace this values. I entered nothing since I didn't want to replace those values.
There is another way of extracting the name with the REGEX_EXTRACT formula.
REGEXP_EXTRACT([Name],'([a-zA-Z]+,[a-zA-Z]+)')
This time anything between the '()' is extracting the letters, comma and then the closing letters. [a-zA-Z] is any upper or lower case letters. The plus sign after this signifies that there are one or more of these letters. Without the plus sign, you will only retrieve one letter.
I strongly encourage you to try out your own letter and number combinations to help you learn these formulas!
Comments