Python Regular Expression

aimtocode

Regular Expression Definition

A regular expression (regex) is a string pattern written in a compact syntax that allows us to quickly check whether a given string matches or contains that pattern.

Regular expressions are widely used in natural language processing, in web applications that need to validate string input (such as email addresses), and in most data science projects that involve text mining.

A regex pattern is written in a special mini-language that represents generic text, numbers, or symbols, so it can be used to extract the text that conforms to that pattern.

Regular expressions can be concatenated to form new regular expressions; if A and B are both regular expressions, then AB is also a regular expression.
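To illustrate, the short sketch below joins a pattern for a capitalised word with a pattern for digits. The two patterns are arbitrary examples chosen for this sketch. The combined pattern matches text that matches the first pattern immediately followed by text that matches the second.

```python
import re

# Two regular expressions on their own (arbitrary example patterns)
a = r"[A-Z][a-z]+"   # a capitalised word, e.g. "Python"
b = r"\d+"           # one or more digits, e.g. "3"

# Concatenating them gives a new regular expression: a word followed directly by digits
ab = a + b

print(re.fullmatch(a, "Python"))    # matches
print(re.fullmatch(b, "3"))         # matches
print(re.fullmatch(ab, "Python3"))  # matches, because AB means A followed by B
print(re.fullmatch(ab, "Python"))   # None: the B part is missing
```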

The built-in re module must be imported to use regex functionality in Python:

 import re

Regular Expression Functions

Python's re module provides several functions for working with regular expressions:


SN Function Description
1 match Checks for a match only at the beginning of the string, with optional flags. It returns a match object if the pattern matches there, otherwise it returns None.
2 search Scans the whole string and returns a match object for the first location where the pattern matches, otherwise it returns None.
3 findall Returns a list that contains all non-overlapping matches of the pattern in the string.
4 split Returns a list of the pieces of the string, split at each match.
5 sub Replaces one or many matches in the string with a replacement string.
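A minimal sketch of the five functions from the table applied to one string; the sample text and patterns are assumptions chosen for illustration. Note the difference between match(), which only looks at the start of the string, and search(), which scans all of it.

```python
import re

text = "Python 3 and Java 17"

# match(): only succeeds if the pattern matches at the very start of the string
print(re.match(r"Python", text))   # a Match object
print(re.match(r"Java", text))     # None, because "Java" is not at the start

# search(): scans the whole string and returns the first match
m = re.search(r"Java", text)
print(m.group(), m.span())         # Java (13, 17)

# findall(): list of all non-overlapping matches
print(re.findall(r"\d+", text))    # ['3', '17']

# split(): split the string at each match
print(re.split(r"\s+", text))      # ['Python', '3', 'and', 'Java', '17']

# sub(): replace every match with a replacement string
print(re.sub(r"\d+", "#", text))   # Python # and Java #
```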

How to use regular expressions?

A regular expression is built from a mix of metacharacters, special sequences, and sets.

Metacharacters, as the name suggests, are characters with a special meaning, similar to * in a wildcard.

The metacharacters are given below:

Meta-Characters


Metacharacter Description Example
[] Represents a set of characters. "[a-z]"
\ Signals a special sequence, or escapes a special character. "\d"
. Matches any single character (except a newline) at that position. "py.hon"
^ Matches the pattern only at the beginning of the string. "^aimtocode"
$ Matches the pattern only at the end of the string. "tutorial$"
* Zero or more occurrences of the preceding character or group. "pyt*"
+ One or more occurrences of the preceding character or group. "python+"
{} Exactly the specified number of occurrences of the preceding character or group. "aim{2}"
| Either the pattern on the left or the one on the right. "to|code"
() Captures and groups a part of the pattern. "(code)"
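The sketch below runs each example from the table against small sample strings, which are assumptions made for illustration. It also shows that *, + and {} apply only to the character directly before them, so "aim{2}" matches "aimm", not "aimaim".

```python
import re

# [] set: any lowercase letter
print(re.findall(r"[a-z]", "aB3c"))                          # ['a', 'c']
# \ special sequence: \d is any digit
print(re.findall(r"\d", "a1b22"))                            # ['1', '2', '2']
# . any single character (except newline)
print(bool(re.search(r"py.hon", "python")))                  # True
# ^ start of string, $ end of string
print(bool(re.search(r"^aimtocode", "aimtocode tutorial")))  # True
print(bool(re.search(r"tutorial$", "aimtocode tutorial")))   # True
# * zero or more of the preceding character
print(re.findall(r"pyt*", "py pyt pytt"))                    # ['py', 'pyt', 'pytt']
# + one or more of the preceding character
print(re.findall(r"python+", "pytho python pythonnn"))       # ['python', 'pythonnn']
# {} exact number of repetitions of the preceding character
print(re.findall(r"aim{2}", "aim aimm"))                     # ['aimm']
# | either this or that
print(re.findall(r"to|code", "aimtocode"))                   # ['to', 'code']
# () groups: capture parts of the match
m = re.search(r"aim(to)(code)", "aimtocode")
print(m.groups())                                            # ('to', 'code')
```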


The most commonly used operators

Here are the most commonly used operators that help to build an expression representing the required characters in a string or file.

They are commonly used in web scraping and text mining to extract the required information.

Operators Description
. Matches with any single character except newline ‘\n’.
? Matches 0 or 1 occurrence of the pattern to its left.
+ Matches 1 or more occurrences of the pattern to its left.
* Matches 0 or more occurrences of the pattern to its left.
\w Matches a word character (letter, digit, or underscore), whereas \W (upper case W) matches any non-word character.
\d Matches a digit (for ASCII text, [0-9]), and \D (upper case D) matches any non-digit.
\s Matches a single whitespace character (space, tab, newline, carriage return, form feed, vertical tab), and \S (upper case S) matches any non-whitespace character.
\b Matches the boundary between a word and a non-word character; \B is the opposite of \b.
[..] Matches any single character inside the square brackets, and [^..] matches any single character not inside them.
\ Escapes characters that have a special meaning, e.g. \. matches a period and \+ a plus sign.
^ and $ Match the start and the end of the string, respectively.
{n,m} Matches at least n and at most m occurrences of the preceding expression. Writing {,m} leaves out the lower bound, so it matches anywhere from zero up to m occurrences.
a|b Matches either a or b.
( ) Groups regular expressions and returns the matched text.
\t, \n, \r Match a tab, a newline, and a carriage return.
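A minimal sketch exercising several of these operators; the sample strings are made up for the example. Each result is stored in a variable so it can be checked.

```python
import re

text = "Order 66 shipped on 2024-01-15\tto user_42."

# \d: digits; \d+ grabs whole runs of digits
digits = re.findall(r"\d+", text)
# \w: word characters (letters, digits, underscore)
words = re.findall(r"\w+", text)
# \s: single whitespace characters, here spaces and one tab
spaces = re.findall(r"\s", text)
# \b: word boundary, so the "to" inside "tomato" is skipped
whole_to = re.findall(r"\bto\b", "to tomato")
# {n,m}: between 2 and 4 digits, matched greedily
runs = re.findall(r"\d{2,4}", "1 12 123456")
# [^..]: any character that is not a lowercase letter or a space
others = re.findall(r"[^a-z ]", "ab-c d!")
# ?: the 'u' is optional
spellings = re.findall(r"colou?r", "color colour")
# \.: an escaped period matches only a literal '.'
periods = re.findall(r"\.", text)

print(digits)     # ['66', '2024', '01', '15', '42']
print(words)      # ['Order', '66', 'shipped', 'on', '2024', '01', '15', 'to', 'user_42']
print(spaces)     # [' ', ' ', ' ', ' ', '\t', ' ']
print(whole_to)   # ['to']
print(runs)       # ['12', '1234', '56']
print(others)     # ['-', '!']
print(spellings)  # ['color', 'colour']
print(periods)    # ['.']
```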

The match object

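The match function, re.match(), matches a regex pattern against the beginning of a string, with optional flags.

Because re.match() only looks at the start of the string, a pattern made of a starting letter followed by "\w+" matches only the words that begin with that letter; anything that does not start with it is not identified. In the output below, only the words beginning with 'P' are found.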
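The optional flags argument changes how the pattern is applied. The sketch below, using an assumed sample string, shows re.IGNORECASE, the fact that re.match() ignores text later in the string, and how the returned match object exposes the matched text through group() and span().

```python
import re

s = "Python at Aimtocode"

# re.match(pattern, string, flags) only checks the start of the string
print(re.match(r"python", s))                 # None: the case differs
print(re.match(r"python", s, re.IGNORECASE))  # matches 'Python'
print(re.match(r"Aimtocode", s))              # None: not at the start

# The match object gives access to the whole match and to each group
m = re.match(r"(\w+) at (\w+)", s)
print(m.group(0))  # Python at Aimtocode
print(m.group(1))  # Python
print(m.group(2))  # Aimtocode
print(m.span())    # (0, 19)
```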


Example:
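A minimal sketch consistent with the output below, assuming the pattern is "P\w+" (an uppercase 'P' followed by one or more word characters) and that the words are checked one at a time with re.match().

```python
import re

words = ["Aimtocode", "Python", "Java", "C", "C++", "Php"]

results = []
for word in words:
    # re.match() only checks for a match at the start of the string
    if re.match(r"P\w+", word):
        results.append(f"{word} FOUND")
    else:
        results.append(f"{word} NOT FOUND")

print("\n".join(results))
```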



Output:

 Aimtocode NOT FOUND
 Python FOUND
 Java NOT FOUND
 C NOT FOUND
 C++ NOT FOUND
 Php FOUND

Example 2:



Output:

 <class 're.Match'>
 <re.Match object; span=(17, 26), match='Aimtocode'>	
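Output of this shape comes from calling type() on, and then printing, the match object returned by re.search(). A minimal sketch, assuming a hypothetical sample sentence (so the span differs from the one above) and Python 3.7 or later, where the class is named re.Match:

```python
import re

# Hypothetical sample sentence; the span depends on where the word occurs
text = "Learning Python at Aimtocode"

# re.search() returns a re.Match object for the first match (or None)
m = re.search(r"Aimtocode", text)

print(type(m))  # <class 're.Match'>
print(m)        # <re.Match object; span=(19, 28), match='Aimtocode'>
```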

Example 3: re.findall for text
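A minimal sketch that produces output of this form. Assumptions: re.search() is used for the "Looking for" checks, "Java" is a hypothetical pattern that accounts for the "no match" line, and the contact text holding the email addresses is made up. The email pattern accepts letters, digits, underscores, dots and hyphens on both sides of '@'.

```python
import re

text = "Learning Python at Aimtocode"

# re.search() scans the whole string, so it finds a pattern anywhere in it.
# "Java" is a hypothetical pattern that does not occur in the text.
lines = []
for pattern in ["Aimtocode", "Python", "Java"]:
    if re.search(pattern, text):
        lines.append(f'Looking for "{pattern}" in "{text}" -> found a match!')
    else:
        lines.append("no match")

# Hypothetical sample text containing the addresses shown in the output
contacts = "Write to support@aimtocode.com, admin@aimtocode.com or vermaprayag@aimtocode.com"

# re.findall() returns every non-overlapping match as a list of strings
emails = re.findall(r"[\w.-]+@[\w.-]+", contacts)

print("\n".join(lines))
for email in emails:
    print(email)
```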




Output:

 Looking for "Aimtocode" in "Learning Python at Aimtocode" -> found a match!
 Looking for "Python" in "Learning Python at Aimtocode" -> found a match!
 no match
 support@aimtocode.com
 admin@aimtocode.com
 vermaprayag@aimtocode.com