Python Regexes - match objects

In Python, functions such as re.match return a successful match in the form of a match object. In this guide, I'll cover the basics of how to make use of a match object.

1

A group is a parenthesized part of a pattern whose matched text you want to capture. Let's use re.match to capture the first and last name in a string.

import re
m = re.match(r"(\w+) (\w+)", "Adam Smith")

m is a match object, and this object gives us access to a method called group.

m.group()
# 'Adam Smith'

Calling m.group() returns the entire matched text. Calling it this way is the same as calling m.group(0). To get a single group instead, pass its number. Our pattern has two groups, so we can call m.group(1) and m.group(2).

m.group(1)
# 'Adam'
m.group(2)
# 'Smith'
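
Two details are worth knowing before relying on group in real code: re.match returns None when the pattern does not match at the start of the string, and group accepts several numbers at once and returns them as a tuple. Here is a minimal sketch of both; the helper name split_name is hypothetical and exists only for illustration.

```python
import re

def split_name(text):
    """Return (first, last) for 'First Last', or None if text doesn't match.

    split_name is a hypothetical helper used only for this illustration.
    """
    m = re.match(r"(\w+) (\w+)", text)
    if m is None:
        # re.match returns None when the pattern does not match
        return None
    # Passing several group numbers returns a tuple of those groups
    return m.group(1, 2)

print(split_name("Adam Smith"))  # ('Adam', 'Smith')
print(split_name("Adam"))        # None
```

Checking for None first avoids the AttributeError you would otherwise get from calling group on a failed match.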

Using the (?P<name>...) syntax, you can even access the group by name:

import re
m = re.match(r"(?P<first>\w+) (?P<last>\w+)", "Adam Smith")
m.group('first')
# 'Adam'
m.group('last')
# 'Smith'
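
Named groups are still numbered, so a name and its number are interchangeable. The short sketch below shows that, plus two related conveniences: subscript access on the match object (available in Python 3.6 and later) and the groupindex mapping on the compiled pattern. The variable names same, last and index are just for illustration.

```python
import re

m = re.match(r"(?P<first>\w+) (?P<last>\w+)", "Adam Smith")

# Named groups keep their numbers, so both lookups return the same text
same = m.group("first") == m.group(1)
print(same)   # True

# Subscript access is shorthand for m.group(...) (Python 3.6 and later)
last = m["last"]
print(last)   # Smith

# The compiled pattern maps each group name to its group number
index = dict(m.re.groupindex)
print(index)  # {'first': 1, 'last': 2}
```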

2

The groups method returns a tuple of all the captured groups. Suppose we wanted to parse a telephone number.

import re
m = re.match(r"(\d{3})[.\-]?(\d{3})[.\-]?(\d{4})", "925.783.3005")
m.groups()
# ('925', '783', '3005')

A neat thing about the groups method is that you can pass in a default value for groups that did not participate in the match. Let's make the area code optional. If you don't pass a default, a missing group comes back as None.

import re
m = re.match(r"(\d{3})?[.\-]?(\d{3})[.\-]?(\d{4})", "783.3005")
m.groups()
# (None, '783', '3005')

Let's pass in a default value now:

import re
m = re.match(r"(\d{3})?[.\-]?(\d{3})[.\-]?(\d{4})", "783.3005")
m.groups('xxx')
# ('xxx', '783', '3005')
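
One place the default comes in handy is when you want to join the groups back into a single string, since joining would fail on None. The sketch below assumes a placeholder area code of "000"; normalize_phone and default_area are hypothetical names chosen for this example, and the separator class is written [.\-] so that only a dot or a hyphen is accepted between the parts.

```python
import re

# Optional area code, then two digit blocks, each optionally separated by . or -
PHONE = re.compile(r"(\d{3})?[.\-]?(\d{3})[.\-]?(\d{4})")

def normalize_phone(text, default_area="000"):
    """Return the number as 'AAA-BBB-CCCC', or None if text doesn't match.

    normalize_phone and default_area are hypothetical, for illustration only.
    """
    m = PHONE.match(text)
    if m is None:
        return None
    # groups(default) fills in any group that did not participate in the match
    return "-".join(m.groups(default_area))

print(normalize_phone("925.783.3005"))  # 925-783-3005
print(normalize_phone("783.3005"))      # 000-783-3005
print(normalize_phone("no digits"))     # None
```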

3

Now let's look at groupdict. This returns our named groups in the form of a dictionary, keyed by group name. Groups without a name are not included.

import re
m = re.match(r"(?P<first>\w+) (?P<last>\w+)", "Adam Smith")
m.groupdict()
# {'first': 'Adam', 'last': 'Smith'}

Like groups, groupdict accepts a default value for groups that did not participate in the match. Make the space and the last name optional, and you'll see the default value take effect:

import re
m = re.match(r"(?P<first>\w+) ?(?P<last>\w+)?", "Adam")
m.groupdict(False)
# {'first': 'Adam', 'last': False}
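
Because groupdict returns an ordinary dictionary, it can be unpacked straight into str.format. The sketch below uses a string default instead of False so the missing last name can be printed as is; the placeholder text "(unknown)" and the variable names parts and line are assumptions made for this example.

```python
import re

m = re.match(r"(?P<first>\w+) ?(?P<last>\w+)?", "Adam")

# Use a string default so the missing last name can be formatted directly
parts = m.groupdict("(unknown)")

# Unpack the dictionary so each group name fills the matching placeholder
line = "First: {first}, Last: {last}".format(**parts)
print(line)  # First: Adam, Last: (unknown)
```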