Python Collections: Why and When to Use Collections (With Examples)

John John (304)
10 minutes

Much of what you need to do with Python can be done using built-in containers like dict, list, set, and tuple. But these aren't always the best choice. In this guide, I'll cover why and when to use the containers in the collections module and provide interesting examples of each. This is designed to supplement the documentation with examples and explanation, not replace it.

from collections import Counter

A Counter is a dictionary-like object designed to keep tallies. In a Counter, the key is the item being counted and the value is its count. You could certainly use a regular dictionary to keep a count, but a Counter handles missing keys for you and adds methods and arithmetic operators built specifically for counting.

Counter is a subclass of dict, so a Counter object looks just like a dictionary and supports the full dictionary interface.

ctr = Counter({'birds': 200, 'lizards': 340, 'hamsters': 120})
ctr['hamsters'] # 120
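For comparison, here is a rough sketch of tallying the same kind of data by hand with a regular dict, and then with a Counter built straight from an iterable. The pets list is made-up sample data.

```python
from collections import Counter

pets = ['bird', 'lizard', 'bird', 'hamster', 'bird', 'lizard']

# Counting by hand with a regular dict: the first sighting needs special handling.
manual = {}
for pet in pets:
    if pet in manual:
        manual[pet] += 1
    else:
        manual[pet] = 1

# Counter does the same tally directly from any iterable.
ctr = Counter(pets)

print(manual)  # {'bird': 3, 'lizard': 2, 'hamster': 1}
print(ctr)     # Counter({'bird': 3, 'lizard': 2, 'hamster': 1})
```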

One thing to note is that if you try to access a key that doesn't exist, the counter will return 0 rather than raising a KeyError as a standard dictionary would.
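A quick check of that behavior, using the same counts as above ('snakes' is just a hypothetical missing key). Note that the lookup also doesn't add the key to the counter:

```python
from collections import Counter

ctr = Counter({'birds': 200, 'lizards': 340, 'hamsters': 120})
print(ctr['snakes'])    # 0, no KeyError
print('snakes' in ctr)  # False: reading a missing key does not insert it

plain = dict(ctr)
try:
    plain['snakes']
except KeyError:
    print('a plain dict raises KeyError')
```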

Counters come with a brilliant set of methods that will make your life easier if you learn how to use them.
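Here are a few of those methods in action, as a small sketch on the same made-up pet counts. update() and subtract() adjust counts in place, and elements() expands a counter back into repeated items:

```python
from collections import Counter

ctr = Counter({'birds': 200, 'lizards': 340, 'hamsters': 120})

# update() adds counts (unlike dict.update, which replaces values).
ctr.update({'birds': 5, 'snakes': 2})
# subtract() removes counts; results may drop to zero or below.
ctr.subtract({'lizards': 40, 'hamsters': 120})

print(ctr['birds'], ctr['lizards'], ctr['hamsters'])  # 205 300 0

# elements() repeats each key as many times as its count, skipping counts below 1.
small = Counter(a=2, b=1, c=0)
print(sorted(small.elements()))  # ['a', 'a', 'b']
```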

Get the most common word in a text file

import re
from collections import Counter

with open('ipencil.txt') as f:
    words = re.findall(r'\w+', f.read().lower())
Counter(words).most_common(1) # [('the', 148)]

Get the count of each number in a long string of numbers

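numbers = """..."""  # a long, multi-line string of digits (not shown)
numbers = re.sub("\n", "", numbers)  # remove the line breaks
Counter(numbers).most_common()
# [('2', 112),
#  ('5', 107),
#  ('4', 107),
#  ('6', 103),
#  ('9', 100),
#  ('8', 100),
#  ('1', 99),
#  ('0', 97),
#  ('7', 91),
#  ('3', 84)]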

most_common is a very valuable method. If you pass an integer n as the first argument, it returns the n most common elements. If you call it without any arguments, it returns every element, ordered from most to least common. As you can see, it returns a list of tuples, each structured as (value, frequency).
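To see both forms on something small, here is a sketch running most_common on the letters of an arbitrary word. The comparison at the end shows that the no-argument form gives the same ordering as sorting the items by count, highest first.

```python
from collections import Counter

letters = Counter('abracadabra')

print(letters.most_common(3))  # [('a', 5), ('b', 2), ('r', 2)]

# With no argument, most_common() returns every element, most frequent first,
# which matches sorting the items by count in descending order.
by_count = sorted(letters.items(), key=lambda item: item[1], reverse=True)
print(letters.most_common() == by_count)  # True
```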

When dealing with multiple Counter objects, you can combine them with operators. Adding two counters (+) adds the counts for each key, and subtracting (-) subtracts them, keeping only positive results. You can also take the intersection (&) or the union (|), which keep the minimum or the maximum count for each key, respectively.

For example, a student has taken 4 quizzes two times each. She is allowed to keep the highest score for each quiz.

first_attempt = Counter({1: 90, 2: 65, 3: 78, 4: 88})
second_attempt = Counter({1: 88, 2: 84, 3: 95, 4: 92})
final = first_attempt | second_attempt
final # Counter({3: 95, 4: 92, 1: 90, 2: 84})
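The other operators work the same way. Here is a small sketch with two arbitrary counters, c and d:

```python
from collections import Counter

c = Counter(a=3, b=1)
d = Counter(a=1, b=2)

print(c + d)  # Counter({'a': 4, 'b': 3}): counts added
print(c - d)  # Counter({'a': 2}): b would be -1, so it is dropped
print(c & d)  # Counter({'a': 1, 'b': 1}): minimum of each count
print(c | d)  # Counter({'a': 3, 'b': 2}): maximum of each count
```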
from collections import deque

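deque stands for "double-ended queue" and can be used as a stack or a queue. Although lists offer many of the same operations, they are optimized for fast fixed-length operations. Inserting or popping at the front of a list (insert(0, v) or pop(0)) costs O(n), because every other element has to shift. A deque supports appends and pops from either end in approximately O(1) time.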

How do you know when to use a deque versus a list?

Basically, if you need to quickly append to or remove from either end of your data, you want a deque. For instance, if you're creating a queue of objects that need to be processed in the order they arrived, you would append new objects to one end and pop objects off the other end for processing.

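def process(item):  # hypothetical stand-in for the real work
    print("processing", item)

queue = deque()
# append values to wait for processing
queue.appendleft("first")
queue.appendleft("second")
queue.appendleft("third")
# pop values when ready
process(queue.pop())  # processes "first"
# add values while processing
queue.appendleft("fourth")
# what does the queue look like now?
queue  # deque(['fourth', 'third', 'second'])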

As you can see, we're adding items to the left and popping them from the right. Deque provides four commonly used methods for appending and popping from either side of the queue: append, appendleft, pop, and popleft.
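Here is a small sketch of all four methods on an arbitrary deque, plus a stack that only ever touches the right end:

```python
from collections import deque

d = deque([2, 3])
d.append(4)            # add to the right end
d.appendleft(1)        # add to the left end
print(d)               # deque([1, 2, 3, 4])

right = d.pop()        # remove from the right end
left = d.popleft()     # remove from the left end
print(left, right, d)  # 1 4 deque([2, 3])

# Used as a stack (last in, first out), append() and pop() work on the same end.
stack = deque()
stack.append('a')
stack.append('b')
print(stack.pop())     # b
```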

In the above example we started with an empty deque, but we can also create a deque from another iterable.

>>> numbers = [0, 1, 2, 3, 5, 7, 11, 13]
>>> queue = deque(numbers)
>>> print(queue)
deque([0, 1, 2, 3, 5, 7, 11, 13])

Or how about from a range:

>>> queue = deque(range(0, 10))
>>> print(queue)
deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
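Once a deque exists, you can also add a whole iterable to either end with extend() and extendleft(). Here is a minimal sketch. Note that extendleft() adds items one at a time on the left, so they end up in reverse order:

```python
from collections import deque

queue = deque([0, 1, 2])
queue.extend([3, 4])        # appends each item on the right, in order
queue.extendleft([-1, -2])  # appendleft() for each item, so they end up reversed
print(queue)                # deque([-2, -1, 0, 1, 2, 3, 4])
```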
from collections import defaultdict

Suppose you have a sequence of key-value pairs. Perhaps you are keeping track of how many miles you run each day, and you want to know which day of the week you are most active.

days = [('monday', 2.5), ('wednesday', 2), ('friday', 1.5), ('monday', 3), ('tuesday', 3.5), ('thursday', 2), ('friday', 2.5)]
active_days = defaultdict(float)
for k, v in days:
    active_days[k] += v
# defaultdict(<class 'float'>, {'monday': 5.5, 'wednesday': 2.0, 'friday': 4.0, 'tuesday': 3.5, 'thursday': 2.0})
max(active_days, key=active_days.get)  # 'monday'

This can be accomplished using other data types, but defaultdict lets us specify a default_factory: a callable (here float, which returns 0.0) that supplies the initial value for any missing key. This is simpler and faster than using a regular dict with dict.setdefault.

You pass in the default factory upon instantiation. Then you can immediately begin updating values, even for keys that have not been set yet. With a normal dictionary, an update like active_days[k] += v on a missing key would raise a KeyError.
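To make the contrast concrete, here is a sketch using a shortened version of the running data. A plain dict fails on the first +=, the usual workaround passes a starting value to dict.get(), and defaultdict(float) produces the same totals with no special-casing.

```python
from collections import defaultdict

days = [('monday', 2.5), ('friday', 1.5), ('monday', 3)]

# With a plain dict, += on a key that isn't there yet raises KeyError.
plain = {}
try:
    for k, v in days:
        plain[k] += v
except KeyError as err:
    print('KeyError:', err)  # KeyError: 'monday'

# The usual plain-dict workaround supplies the starting value explicitly.
totals = {}
for k, v in days:
    totals[k] = totals.get(k, 0.0) + v

# defaultdict(float) calls float() (which returns 0.0) for each missing key.
active = defaultdict(float)
for k, v in days:
    active[k] += v

print(totals == active)  # True
```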

Here is an example using list as the default factory. Here we have a list of tuples. Each tuple has a letter and a number, and the letters are both uppercase and lowercase. Suppose we want to group the values by letter, ignoring case.

letters = [('A', 10), ('B', 3), ('C', 4), ('a', 36), ('b', 8), ('c', 10)]
grouped_letters = defaultdict(list)
for k, v in letters:
    grouped_letters[k.lower()].append(v)  # a new key starts as an empty list
# defaultdict(<class 'list'>, {'a': [10, 36], 'b': [3, 8], 'c': [4, 10]})
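For comparison, here is the same grouping written with a plain dict and dict.setdefault(). The second half shows one side effect worth knowing about: reading a missing key from a defaultdict inserts it. The name lookup is hypothetical.

```python
from collections import defaultdict

letters = [('A', 10), ('B', 3), ('C', 4), ('a', 36), ('b', 8), ('c', 10)]

# Equivalent grouping with a plain dict and dict.setdefault().
grouped = {}
for k, v in letters:
    grouped.setdefault(k.lower(), []).append(v)
print(grouped)  # {'a': [10, 36], 'b': [3, 8], 'c': [4, 10]}

# Merely reading a missing key of a defaultdict creates it with the default value.
lookup = defaultdict(list)
print(lookup['z'])    # []
print('z' in lookup)  # True
```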
from collections import namedtuple

A namedtuple is a ... named tuple. When you use a standard tuple, it's difficult to convey the meaning of each position. A named tuple is just like a normal tuple, but it lets you give a name to each position, making the code more readable and self-documenting. With a namedtuple, you can access the positions by name as well as by index.

namedtuple is a factory function. To create a new named tuple type, we pass in the name of the type we want to create, then a list of field names.

coordinate = namedtuple('Coordinate', ['x', 'y'])

Now we can create instances of our named tuple type, coordinate, by position, just like a tuple:

c = coordinate(10, 20)

Or we can instantiate it with keyword arguments:

c = coordinate(x=10, y=20)

And just like a normal tuple, we can still access values by index and unpack them, but our namedtuple also lets us access values by name.

>>> x, y = c
>>> x, y
(10, 20)
>>> c.x
10
>>> c.y
20
>>> c[0]
10
>>> c[1]
20
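Named tuples also come with a few helper attributes and methods. Here is a short sketch that reuses the coordinate type. _fields lists the field names and _asdict() converts an instance to a dictionary. Since namedtuples are immutable, _replace() returns a new instance instead of changing the existing one.

```python
from collections import namedtuple

coordinate = namedtuple('Coordinate', ['x', 'y'])
c = coordinate(10, 20)

print(coordinate._fields)  # ('x', 'y')
print(c._asdict())         # {'x': 10, 'y': 20} on Python 3.8+

# Like any tuple, a namedtuple is immutable; _replace() returns a new instance.
moved = c._replace(x=15)
print(moved, c)            # Coordinate(x=15, y=20) Coordinate(x=10, y=20)

try:
    c.x = 99
except AttributeError:
    print('fields cannot be reassigned')
```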

A great example comes straight from the documentation. If we want to read rows from a CSV file and give the positions useful names rather than just indices, we can use a named tuple:

import csv
User = namedtuple('User', 'name, email, username, staff')
with open("users.csv", newline="") as f:
    for user in map(User._make, csv.reader(f)):
        print(user.name, user.email)

In the above example, we're using the _make class method, which accepts an iterable and produces a namedtuple instance from its values.
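Because users.csv isn't included here, this is a self-contained variant that reads the same kind of data from an in-memory string via io.StringIO. The two rows are hypothetical sample data. Keep in mind that csv.reader yields every field as a string, so staff comes back as 'True', not True.

```python
import csv
import io
from collections import namedtuple

User = namedtuple('User', 'name, email, username, staff')

# Hypothetical sample rows standing in for users.csv.
sample = io.StringIO(
    "Ada Lovelace,ada@example.com,ada,True\n"
    "Alan Turing,alan@example.com,alan,False\n"
)

users = [User._make(row) for row in csv.reader(sample)]
for user in users:
    print(user.username, user.email)
# ada ada@example.com
# alan alan@example.com
```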

Using our coordinate example, we can create a coordinate from a list using _make.

>>> c = [30, 45]
>>> coordinate._make(c)
Coordinate(x=30, y=45)

You can convert a dictionary to a namedtuple using the double-star operator (**), which unpacks the dictionary into keyword arguments.

>>> c = {'x': 30, 'y': 45}
>>> coordinate(**c)
Coordinate(x=30, y=45)
from collections import OrderedDict

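OrderedDicts act just like regular dictionaries except they remember the order in which items were added. This matters primarily when you iterate over an OrderedDict, since the order will reflect the order in which the keys were added. (Since Python 3.7, regular dicts also preserve insertion order, but OrderedDict still offers order-aware methods and order-sensitive equality.)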

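In older versions of Python, a regular dictionary made no guarantee about order. Python 2, for example, could print the keys of this dict in an arbitrary, hash-based order:

d = {}
d['a'] = 1
d['b'] = 10
d['c'] = 8
for letter in d:
    print(letter)
# a
# c
# b

On Python 3.7 and later, dicts are guaranteed to keep insertion order, so the same loop prints a, b, c.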

You can imagine what an OrderedDict would do:

d = OrderedDict()
d['a'] = 1
d['b'] = 10
d['c'] = 8
for letter in d:
    print(letter)
# a
# b
# c

It simply maintains the order. As a subclass of dict, OrderedDict has all of the same methods, and because it tracks order, it adds a few of its own. For example, OrderedDict.popitem pops the most recently added item (LIFO) by default, or the first item added (FIFO) if you pass last=False.

d = OrderedDict()
d['a'] = 1
d['b'] = 10
d['c'] = 8
d.popitem()            # ('c', 8)
list(d.items())        # [('a', 1), ('b', 10)]
d.popitem(last=False)  # ('a', 1)
list(d.items())        # [('b', 10)]
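A common practical use for these order-aware methods is a least-recently-used (LRU) cache. The sketch below is a hypothetical minimal version: the LRUCache class and its get/put methods are made up for illustration. It also uses move_to_end(), another OrderedDict method, which moves an existing key to the end.

```python
from collections import OrderedDict

class LRUCache:
    """A tiny least-recently-used cache (hypothetical example class)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used (first) item

cache = LRUCache(2)
cache.put('a', 1)
cache.put('b', 2)
cache.get('a')           # 'a' is now the most recently used
cache.put('c', 3)        # evicts 'b'
print(list(cache.data))  # ['a', 'c']
```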

Since order matters in iteration, you can iterate over an OrderedDict backwards using reversed().

d = OrderedDict()
d['a'] = 1
d['b'] = 10
d['c'] = 8
for letter in reversed(d):
    print(letter)
# c
# b
# a
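Equality is one more place where OrderedDict differs from a regular dict. Comparing two OrderedDicts takes order into account, while comparisons involving a regular dict do not. Here is a minimal sketch with two arbitrary dictionaries:

```python
from collections import OrderedDict

a = OrderedDict([('x', 1), ('y', 2)])
b = OrderedDict([('y', 2), ('x', 1)])

print(a == b)              # False: OrderedDict-to-OrderedDict comparison is order-sensitive
print(dict(a) == dict(b))  # True: regular dicts ignore order when comparing
print(a == dict(b))        # True: mixed comparisons ignore order too
```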

Check out the Python documentation for the collections module for more details.
