Python Regexes - How to Use Match Objects in Python
In Python, regular expression matches can be returned in the form of a match object. In this guide, I'll cover the basics of how to make use of a match object.
A group is a part of a pattern, wrapped in parentheses, whose matched text you want to capture. Let's use re.match to capture the first and last name in a string.
import re
m = re.match(r"(\w+) (\w+)", "Adam Smith")
m is a match object, and this object gives us access to a method called group.
m.group()
# 'Adam Smith'
Calling m.group() with no arguments returns the entire matched text, which is the same as calling m.group(0). To get a single group, pass its number. Our pattern has two groups, so we can call m.group(1) and m.group(2).
m.group(1)
# 'Adam'
m.group(2)
# 'Smith'
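One thing worth guarding against: re.match returns None when the pattern does not match at the start of the string, so calling group on the result would raise an AttributeError. Here is a minimal sketch of that check. The helper name split_name is my own invention for illustration, not part of the re module. It also shows that group accepts several group numbers at once and returns them as a tuple.

```python
import re

def split_name(text):
    """Return (first, last), or None if text does not start with two
    space-separated words. split_name is a hypothetical helper."""
    m = re.match(r"(\w+) (\w+)", text)
    if m is None:
        # No match object was returned, so there is nothing to call group on.
        return None
    # Passing several group numbers returns a tuple of those groups.
    return m.group(1, 2)

result = split_name("Adam Smith")
missing = split_name("Adam")
```

Note that re.match only anchors at the start of the string, so trailing text after the second word is simply ignored.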
Using the (?P<name>...) syntax, you can even access the group by name:
import re
m = re.match(r"(?P<first>\w+) (?P<last>\w+)", "Adam Smith")
m.group('first')
# 'Adam'
m.group('last')
# 'Smith'
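Named groups are still numbered, so the name is an extra handle rather than a replacement. The short sketch below reuses the same pattern to show three equivalent ways of reading a group. The variable names are only for illustration, and the subscript form assumes Python 3.6 or later.

```python
import re

m = re.match(r"(?P<first>\w+) (?P<last>\w+)", "Adam Smith")

by_name = m.group("first")   # look the group up by name
by_index = m.group(1)        # the same group, by position
by_subscript = m["last"]     # subscript access, Python 3.6+

# The compiled pattern records which number each name maps to.
name_to_index = dict(m.re.groupindex)
```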
Using the groups method, we can get a tuple of all the captured groups. Suppose we wanted to parse a telephone number.
import re
m = re.match(r"(\d{3})[.\-]?(\d{3})[.\-]?(\d{4})", "925.783.3005")
m.groups()
# ('925', '783', '3005')
A neat thing about the groups method is that it accepts a default value for groups that did not participate in the match. Let's make the area code optional. If you don't pass a default, the missing group comes back as None.
import re
m = re.match(r"(\d{3})?[.\-]?(\d{3})[.\-]?(\d{4})", "783.3005")
m.groups()
# (None, '783', '3005')
Let's pass in a default value now:
import re
m = re.match(r"(\d{3})?[.\-]?(\d{3})[.\-]?(\d{4})", "783.3005")
m.groups('xxx')
# ('xxx', '783', '3005')
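To see where the default becomes useful, here is a rough sketch that unpacks groups straight into variables and formats the number. The helper normalize_phone, its default_area parameter, and the "000" placeholder are all assumptions of mine for illustration. The separator class is written as [.\-] here, because a | inside a character class is a literal pipe rather than alternation.

```python
import re

# Area code is optional; separators may be a dot, a hyphen, or absent.
PHONE = re.compile(r"(\d{3})?[.\-]?(\d{3})[.\-]?(\d{4})")

def normalize_phone(text, default_area="000"):
    """Hypothetical helper: format a phone number, filling in a missing area code."""
    m = PHONE.match(text)
    if m is None:
        return None
    # groups(default) substitutes the default for any group that did not match.
    area, prefix, line = m.groups(default_area)
    return f"({area}) {prefix}-{line}"

full = normalize_phone("925.783.3005")
short = normalize_phone("783.3005")
```

Because the default is filled in by groups itself, the formatting code never has to check for None.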
Now let's look at groupdict. This returns our groups in the form of a dictionary.
import re
m = re.match(r"(?P<first>\w+) (?P<last>\w+)", "Adam Smith")
m.groupdict()
# {'first': 'Adam', 'last': 'Smith'}
As with groups, you can pass in a default value for groups that did not participate in the match. Make the space and the last name optional, and you'll see the default value take effect:
import re
m = re.match(r"(?P<first>\w+) ?(?P<last>\w+)?", "Adam")
m.groupdict(False)
# {'first': 'Adam', 'last': False}
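A subtle point about "did not participate": the default only replaces groups that were skipped entirely, not groups that matched an empty string. The sketch below contrasts the two cases. The second pattern, with \w* in place of an optional \w+, and the "n/a" default are my own variations for illustration.

```python
import re

# "last" is optional here, so it may not participate in the match at all.
optional = re.match(r"(?P<first>\w+) ?(?P<last>\w+)?", "Adam")

# Here "last" always participates, but it is allowed to match nothing.
empty = re.match(r"(?P<first>\w+) ?(?P<last>\w*)", "Adam")

optional_result = optional.groupdict("n/a")  # default fills the skipped group
empty_result = empty.groupdict("n/a")        # group matched '', so no default
```

In the first case "last" comes back as the default, while in the second it comes back as an empty string.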