A generator function looks very similar to a regular function, but there is one major difference: yield. Including the yield keyword anywhere in a function's body automatically turns that function into a generator function.
This means that calling a generator function does not begin executing its body at all; it immediately returns a generator object. Only when we call next() on that object does execution begin, and it runs until it reaches a yield statement.
Here's a basic, useless example:
def useless():
    yield "King Arthur"
    yield "Brave Sir Robin"
    yield "Sir Galahad the Chaste"
u = useless()
print(next(u))
# "King Arthur"
print(next(u))
# "Brave Sir Robin"
print(next(u))
# "Sir Galahad the Chaste"
You can see that each call to next() on the generator object resumes execution until another yield statement is reached. So what happens if you call next() again after the last yield?
Well, it raises a StopIteration exception.
$ python useless.py
King Arthur
Brave Sir Robin
Sir Galahad the Chaste
Traceback (most recent call last):
  File "useless.py", line 10, in <module>
    print(next(u))
StopIteration
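In practice you rarely call next() by hand. A for loop calls it for you and treats StopIteration as the signal to stop, so the exception never escapes. A minimal sketch using the same useless() generator:

# A for loop calls next() behind the scenes and stops
# cleanly when StopIteration is raised.
for name in useless():
    print(name)
# King Arthur
# Brave Sir Robin
# Sir Galahad the Chaste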
What is interesting about a generator is that when control passes back to the caller at a yield, the function's state is frozen in place. Calling next() simply resumes execution from that point until another yield statement is reached.
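To see that frozen state concretely, here is a small sketch (countdown is a hypothetical example, not part of the code above) in which an ordinary local variable survives between calls to next():

def countdown(n):
    # n is plain local state; it is preserved while the
    # generator is suspended at the yield below.
    while n > 0:
        yield n
        n -= 1  # execution resumes here on the next call to next()

c = countdown(3)
print(next(c))  # 3
print(next(c))  # 2 (n was remembered, then decremented)
print(next(c))  # 1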
The value of using a generator for our purpose is now clear. We can write a generator function that yields one match at a time rather than loading all of the matches into memory.
def find_matches(filenames, pattern):
    for fname in filenames:
        with open(fname) as f:  # ensure each file is closed
            for line in f:
                if pattern in line:
                    yield line
We can call it the same way and get the same apparent results.
files = ['t1.txt', 't2.txt', 't3.txt']
for match in find_matches(files, 'the'):
    print(match)
The difference is that our generator holds only the current line in memory, so it can handle extremely large files, and a great many of them. Without a generator, this would be extremely messy to write.
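Laziness also means the caller can stop early without paying for the rest of the input. As a sketch (itertools.islice and the cutoff of 100 are illustrative choices, not from the original example), we can take just the first 100 matches and the generator will read only as many lines as it needs to produce them:

from itertools import islice

files = ['t1.txt', 't2.txt', 't3.txt']
# Take only the first 100 matches; once islice stops
# requesting values, find_matches is never resumed again.
for match in islice(find_matches(files, 'the'), 100):
    print(match)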