Python Regexes – findall, search, and match
This guide covers the basics of three common regex functions in Python – findall, search, and match. The three are similar, but each has a different purpose. This guide does not cover how to compose a regular expression, so it assumes you are already somewhat familiar with regex syntax.
1 – re.match
The match function is used for finding matches at the beginning of a string only.
import re
re.match(r'hello', 'hello world')
# <re.Match object; span=(0, 5), match='hello'>
But keep in mind this only looks for matches at the beginning of the string.
re.match(r'world', 'hello world')
# None
Even if you’re dealing with a multiline string and include a “^” to try to search at the beginning and use the re.MULTILINE flag, it will still only search the beginning of the string.
re.match(r'^hello', 'good morning\nhello world\nhello mom', re.MULTILINE)
# None
A good use case for re.match is checking that a string starts with an expected format, such as a phone number or zip code. Here is a quick example that tests a string against a phone number format.
if re.match(r'(\d{3})-(\d{3})-(\d{4})', '925-783-3005'):
    print("phone number is good")
If the string matches, a match object will be returned; otherwise it will return None.
You can read more about Python match objects if necessary.
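As a minimal sketch of what a match object gives you, the snippet below reuses the phone number pattern from above. The name PHONE is only an illustrative choice. It also shows one caveat worth keeping in mind: re.match anchors the start of the string but not the end, so trailing text is accepted unless you use re.fullmatch (available in Python 3.4 and later) or end the pattern with $.

```python
import re

# PHONE is a hypothetical name; the pattern is the one used in the example above.
PHONE = re.compile(r'(\d{3})-(\d{3})-(\d{4})')

m = PHONE.match('925-783-3005')
print(m.group(0))   # the whole match: '925-783-3005'
print(m.groups())   # the captured groups: ('925', '783', '3005')
print(m.span())     # start and end offsets: (0, 12)

# match only anchors the beginning, so extra text after the number still passes.
loose = PHONE.match('925-783-3005 ext. 9')
print(loose is not None)   # True

# fullmatch requires the entire string to fit the pattern.
strict = PHONE.fullmatch('925-783-3005 ext. 9')
print(strict is not None)  # False
```

If the goal is strict validation of the whole string, fullmatch is usually the safer choice of the two.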
2 – re.search
This function is very much like match except it looks throughout the entire string and returns the first match. Taking our example from above:
import re
re.search(r'world', 'hello world')
# <re.Match object; span=(6, 11), match='world'>
When using match this would return None, but using search we get our match object. This function is especially useful for determining if a pattern exists in a string. For instance, you might want to see if a line contains the word sandwich.
line = "I love to eat sandwiches for lunch."
if re.search(r'sandwich', line):
    # search returned a match object, which is truthy
    print("Found a sandwich")
Or maybe you want to take a block of text and find out if any of the lines begin with a number:
text = """
1. ricochet robots
2. settlers of catan
3. acquire
"""
match = re.search(r'^\d+\.', text, re.MULTILINE)
match.group()
# '1.'
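To make the role of re.MULTILINE concrete, here is a small sketch that runs the same anchored pattern with and without the flag. It assumes the pattern starts with ^, since the flag only changes how ^ and $ behave. The variable names are illustrative.

```python
import re

text = """
1. ricochet robots
2. settlers of catan
3. acquire
"""

# Without the flag, ^ matches only at the very start of the string.
# The text begins with a newline, so no digit follows and nothing is found.
no_flag = re.search(r'^\d+\.', text)
print(no_flag)  # None

# With re.MULTILINE, ^ also matches right after each newline.
with_flag = re.search(r'^\d+\.', text, re.MULTILINE)
print(with_flag.group())  # '1.'
```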
Again, this is very valuable for searching through an entire block of text to look for a match. If you want to find multiple occurrences of a pattern in a string, see section 3 – re.findall.
3 – re.findall
Findall does what you would expect – it finds all occurrences of a pattern in a string. This is different from the previous two functions in that it doesn’t return a match object. It simply returns a list of matches.
Using our board game example from above:
text = """
1. ricochet robots
2. settlers of catan
3. acquire
"""
re.findall(r'^\d+\.', text, re.MULTILINE)
# ['1.', '2.', '3.']
As you can see, this returns a list of matches. If you don’t use parentheses to capture any groups or if you only capture one group, the result will be a list of strings. If you capture more than one group, the result will be a list of tuples.
text = """
1. ricochet robots
2. settlers of catan
3. acquire
"""
re.findall(r'^(\d+)\.(.*)$', text, re.MULTILINE)
# [('1', ' ricochet robots'), ('2', ' settlers of catan'), ('3', ' acquire')]
In this case, we’re capturing the number and the name of the game in two different groups.
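The three result shapes described above (no groups, one group, several groups) can be compared side by side. The sketch below reuses the board game text; the variable names and the final dictionary are only one illustrative way to consume the tuples.

```python
import re

text = """
1. ricochet robots
2. settlers of catan
3. acquire
"""

# No capture groups: a list of whole-match strings.
whole = re.findall(r'^\d+\.', text, re.MULTILINE)
print(whole)    # ['1.', '2.', '3.']

# One capture group: a list of that group's strings.
numbers = re.findall(r'^(\d+)\.', text, re.MULTILINE)
print(numbers)  # ['1', '2', '3']

# Two capture groups: a list of tuples, which unpack cleanly in a loop.
pairs = re.findall(r'^(\d+)\.(.*)$', text, re.MULTILINE)
games = {int(number): name.strip() for number, name in pairs}
print(games)    # {1: 'ricochet robots', 2: 'settlers of catan', 3: 'acquire'}
```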