Python regexes - findall, search, and match

This guide covers the basics of three common regex functions in Python: findall, search, and match. The three are similar, but each has a different purpose. This guide does not cover how to compose a regular expression, so it assumes you are already somewhat familiar with regex syntax.

1 - match

The match function is used for finding matches at the beginning of a string only.

import re
re.match(r'hello', 'hello world')
# <re.Match object; span=(0, 5), match='hello'>

But keep in mind this only looks for matches at the beginning of the string.

re.match(r'world', 'hello world')
# None

Even if you're dealing with a multiline string, include a "^" in the pattern, and pass the re.MULTILINE flag, match still only checks the very beginning of the string.

re.match(r'^hello', 'good morning\nhello world\nhello mom', re.MULTILINE)
# None
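As a minimal sketch of the difference, the same string can be passed to both functions. re.search honors re.MULTILINE, so "^" can match right after a newline, while re.match stays pinned to index 0. The variable names here are only for illustration.

```python
import re

text = 'good morning\nhello world\nhello mom'

# re.match is anchored at index 0, so the flag does not help here.
anchored = re.match(r'^hello', text, re.MULTILINE)

# re.search scans the whole string; with re.MULTILINE, "^" also
# matches immediately after each newline.
scanned = re.search(r'^hello', text, re.MULTILINE)

print(anchored)         # None
print(scanned.start())  # 13
```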

A great use case for re.match is testing a single pattern like a phone number or zip code. It's a good way to tell if your test string matches a desired pattern. This is a quick example of testing to make sure a string matches a desired phone number format.

if re.match(r'(\d{3})-(\d{3})-(\d{4})', '925-783-3005'):
    print("phone number is good")

If the string matches, a match object will be returned; otherwise it will return None.
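One caveat is worth a quick sketch: re.match only anchors the start of the string, so extra characters after a valid number still pass. Assuming the goal is to validate the whole string, re.fullmatch (available since Python 3.4) or a trailing "$" in the pattern is stricter. The helper name is_phone_number and the sample inputs are hypothetical.

```python
import re

PHONE = r'(\d{3})-(\d{3})-(\d{4})'

def is_phone_number(candidate):
    """Return True only if the entire string is in NNN-NNN-NNNN form."""
    return re.fullmatch(PHONE, candidate) is not None

# re.match accepts trailing text because it only anchors the start.
print(bool(re.match(PHONE, '925-783-3005 ext. 12')))  # True
print(is_phone_number('925-783-3005 ext. 12'))        # False
print(is_phone_number('925-783-3005'))                # True
```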

You can read more about Python match objects if necessary.
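For a quick orientation, here is a short sketch of the match object methods that tend to come up most often, using the phone number pattern from above. It is not a full tour of the match object API.

```python
import re

m = re.match(r'(\d{3})-(\d{3})-(\d{4})', '925-783-3005')

print(m.group())   # 925-783-3005 (the whole match)
print(m.group(1))  # 925 (the first captured group)
print(m.groups())  # ('925', '783', '3005')
print(m.span())    # (0, 12) (start and end index of the match)
```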

2 - search

The search function is very much like match, except it looks through the entire string and returns the first match it finds. Taking our example from above:

import re
re.search(r'world', 'hello world')
# <re.Match object; span=(6, 11), match='world'>

When using match this would return None, but using search we get our match object.

This function is especially useful for determining if a pattern exists in a string. For instance, you might want to see if a line contains the word sandwich.

line = "I love to eat sandwiches for lunch."
if re.search(r'sandwich', line):
    # re.search returned a match object, which is truthy
    print("Found a sandwich")

Or maybe you want to take a block of text and find out if any of the lines begin with a number:

text = """
1. ricochet robots
2. settlers of catan
3. acquire
"""
# "^" together with re.MULTILINE anchors the pattern at the start of each line
match = re.search(r'^\d+\.', text, re.MULTILINE)
match.group()
# '1.'
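Because re.search returns None when nothing matches, calling .group() directly on the result raises AttributeError on a miss. A small sketch of guarding against that follows; first_numbered_item is a hypothetical helper, not something from the re module.

```python
import re

def first_numbered_item(text):
    """Return the first 'N.' marker that starts a line, or None."""
    match = re.search(r'^\d+\.', text, re.MULTILINE)
    return match.group() if match else None

print(first_numbered_item("\n1. ricochet robots\n2. acquire\n"))  # 1.
print(first_numbered_item("no list here"))                        # None
```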

Again, this is very valuable for searching through an entire block of text to look for a match. If you're looking to find multiple occurrences of a pattern in a string, you should look at step 3 - findall.

3 - findall

Findall does what you would expect - it finds all occurrences of a pattern in a string. This is different from the previous two functions in that it doesn't return a match object. It simply returns a list of matches.

Using our board game example from above:

text = """
1. ricochet robots
2. settlers of catan
3. acquire
"""
re.findall(r'^\d+\.', text, re.MULTILINE)
# ['1.', '2.', '3.']

As you can see, this returns a list of matches. If you don't use parentheses to capture any groups or if you only capture one group, the result will be a list of strings. If you capture more than one group, the result will be a list of tuples.

text = """
1. ricochet robots
2. settlers of catan
3. acquire
"""
re.findall(r'^(\d+)\.(.*)$', text, re.MULTILINE)
# [('1', ' ricochet robots'), ('2', ' settlers of catan'), ('3', ' acquire')]

In this case we're capturing the number and the name of the game in two different groups.
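To round out the cases described above, here is a hedged sketch of the single-group form, plus one way to unpack the two-group tuples into a dictionary. The names numbers and games are only for illustration.

```python
import re

text = "1. ricochet robots\n2. settlers of catan\n3. acquire"

# One group: findall returns a list of strings holding just that group.
numbers = re.findall(r'^(\d+)\.', text, re.MULTILINE)
print(numbers)  # ['1', '2', '3']

# Two groups: each tuple can be unpacked; strip() drops the leading space.
games = {int(num): name.strip()
         for num, name in re.findall(r'^(\d+)\.(.*)$', text, re.MULTILINE)}
print(games)  # {1: 'ricochet robots', 2: 'settlers of catan', 3: 'acquire'}
```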