Python regexes - match objects

In Python, functions such as re.match return a match object when the pattern matches, and None when it does not. In this guide, I'll cover the basics of how to make use of a match object.

1

A group is a parenthesized part of a pattern whose matched text you want to capture. Let's use re.match to capture the first and last name in a string.

import re
m = re.match(r"(\w+) (\w+)", "Adam Smith")
m is a match object, and this object gives us access to a method called group.
m.group()

'Adam Smith'

Calling m.group() returns the entire matched text, which is the same as calling m.group(0). To get a single group, pass its number. Our pattern has two groups, so we can call m.group(1) and m.group(2).

m.group(1)

'Adam'

m.group(2)

'Smith'
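Here is a small sketch that puts the calls above in one place and adds a guard for the case where nothing matches, since re.match returns None then. The helper name split_name is made up for this example and is not part of the re module.

```python
import re

def split_name(text):
    """Return (first, last) for a "First Last" string, or None if it does not match.

    split_name is a hypothetical helper used only for illustration.
    """
    m = re.match(r"(\w+) (\w+)", text)
    if m is None:  # re.match returns None when the pattern does not match
        return None
    # Passing several group numbers to group() returns a tuple of those groups.
    return m.group(1, 2)

print(split_name("Adam Smith"))  # ('Adam', 'Smith')
print(split_name("Adam"))        # None
```

Checking for None first matters because calling .group on None raises an AttributeError.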

Using the (?P<name>...) syntax, you can give each group a name and then access it by that name:

import re
m = re.match(r"(?P<first>\w+) (?P<last>\w+)", "Adam Smith")
m.group('first')

'Adam'

m.group('last')

'Smith'
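Named groups keep their numbers too, so the two styles can be mixed. The sketch below compares the different ways to reach the same captured text. Subscript access (m["last"]) assumes Python 3.6 or later.

```python
import re

m = re.match(r"(?P<first>\w+) (?P<last>\w+)", "Adam Smith")

# Named groups are still numbered, so both styles reach the same text.
by_name = m.group("first")
by_index = m.group(1)

# Match objects support subscripting in Python 3.6 and later.
by_subscript = m["last"]

# Several names can be requested at once; the result is a tuple.
both = m.group("first", "last")

print(by_name, by_index, by_subscript, both)
```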

2

Using groups, we can get a tuple of all the captured groups. Suppose we wanted to parse a telephone number.

import re
m = re.match(r"(\d{3})[.-]?(\d{3})[.-]?(\d{4})", "925.783.3005")
m.groups()

('925', '783', '3005')

A neat thing about the groups method is that it accepts a default value for groups that did not participate in the match. Let's make the area code optional. Without an explicit default, a missing group is reported as None.

import re
m = re.match(r"(\d{3})?[.-]?(\d{3})[.-]?(\d{4})", "783.3005")
m.groups()

(None, '783', '3005')

Let's pass in a default value now:

import re
m = re.match(r"(\d{3})?[.-]?(\d{3})[.-]?(\d{4})", "783.3005")
m.groups('xxx')

('xxx', '783', '3005')
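To show where a default is handy, here is a minimal sketch that unpacks groups() straight into variables and formats the number. The names PHONE, normalize_phone and default_area are my own, and "000" is only a placeholder, not a real area code. The pattern uses [.-] as the separator class, which accepts only "." or "-" (inside square brackets a | is a literal character, not an "or").

```python
import re

# Optional area code, then two digit groups, with optional "." or "-" separators.
PHONE = re.compile(r"(\d{3})?[.-]?(\d{3})[.-]?(\d{4})")

def normalize_phone(text, default_area="000"):
    """Hypothetical helper: format a number as AAA-PPP-LLLL.

    default_area is an illustrative placeholder for a missing area code.
    """
    m = PHONE.match(text)
    if m is None:
        return None
    # groups(default) substitutes the default for any group that did not match.
    area, prefix, line = m.groups(default_area)
    return f"{area}-{prefix}-{line}"

print(normalize_phone("925.783.3005"))  # 925-783-3005
print(normalize_phone("783.3005"))      # 000-783-3005
```

Because the default fills the gap, the tuple always unpacks into three strings and the formatting code never has to deal with None.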

3

Now let's look at groupdict. This returns our groups in the form of a dictionary.

import re
m = re.match(r"(?P<first>\w+) (?P<last>\w+)", "Adam Smith")
m.groupdict()

{'first': 'Adam', 'last': 'Smith'}

As with groups, you can pass in a default value for groups that did not participate in the match. Make the space and the last name optional, and you'll see the default value take effect:

import re
m = re.match(r"(?P<first>\w+) ?(?P<last>\w+)?", "Adam")
m.groupdict(False)

{'first': 'Adam', 'last': False}
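Since groupdict returns a plain dictionary, it works well with anything that takes keyword arguments. The sketch below uses an empty string as the default so the result can go straight into str.format. NAME, parse_name and display_name are hypothetical names for this example.

```python
import re

NAME = re.compile(r"(?P<first>\w+) ?(?P<last>\w+)?")

def parse_name(text):
    """Hypothetical helper: return a dict with 'first' and 'last' keys, or None."""
    m = NAME.match(text)
    if m is None:
        return None
    # An empty-string default keeps every value a str, even when 'last' is missing.
    return m.groupdict("")

def display_name(text):
    """Hypothetical helper: format the parsed name as 'Last, First' when possible."""
    parts = parse_name(text)
    if parts is None:
        return None
    if parts["last"]:
        return "{last}, {first}".format(**parts)
    return parts["first"]

print(display_name("Adam Smith"))  # Smith, Adam
print(display_name("Adam"))        # Adam
```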