As you might know, Python’s slogan is batteries included, which means that many of the common tasks that one would want to do are possible with the standard library. Since Python is open source, I thought that it might be fun to take a look at these batteries to see how Python implements these convenient functions and classes.

One of the things that I find quite useful in the standard library is the Counter class from the collections module. As the name might have given away already, it allows you to easily count things.

Say you want to count count how often a specific letter appears in a word. You could write some code like this:

word = "montypython"
counter = {}
for letter in word:
  if letter not in counter:
    counter[letter] = 0
  counter[letter] += 1
print(counter)

Output:

{'m': 1, 'o': 2, 'n': 2, 't': 2, 'y': 2, 'p': 1, 'h': 1}

Or - following the Easier to ask forgiveness than permission motto - one could write the code like this:

word = "montypython"
counter = {}
for letter in word:
  try:
    counter[letter] += 1
  except KeyError:
    counter[letter] = 1
print(counter)

But since this is quite a common task, Python provides a class for this that you can use. When using the Counter class, your code could simply look like this:

from collections import Counter

word = "montypython"
counter = Counter()
for letter in word:
  counter[letter] += 1
print(counter)

Strictly speaking, you could also just write Counter(word) to get to the same result, but I want to look at the manual way today. The question is: Why does this work? Shouldn’t it also raise a KeyError since the keys that we are trying to increment are not in counter yet?

To understand what is happening here, we should take a look at the source code of Counter. 1

All we need is the following lines:

class Counter(dict):
    def __missing__(self, key):
        'The count of elements not in the Counter is zero.'
        # Needed so that self[missing_item] does not raise KeyError
    return 0

In the first line, we can see that Counter is actually a subclass of dict. dict is Python’s dictionary type which is implemented in C.

The actual implementation of Counter has many more functions than just __missing__, but it is this one that is interesting to us. When we try to access the item of a dict (e.g. by saying give me the value for a in the dict counter: counter['a']), the C code makes a call to the dict_subscript function. It looks (in part) like this:

    ix = (mp->ma_keys->dk_lookup)(mp, key, hash, &value);
    if (ix == DKIX_ERROR)
        return NULL;
    if (ix == DKIX_EMPTY || value == NULL) {
        if (!PyDict_CheckExact(mp)) {
            /* Look up __missing__ method if we're a subclass. */
            PyObject *missing, *res;
            _Py_IDENTIFIER(__missing__);
            missing = _PyObject_LookupSpecial((PyObject *)mp, &PyId___missing__);
            if (missing != NULL) {
                res = PyObject_CallFunctionObjArgs(missing,
                                                   key, NULL);
                Py_DECREF(missing);
                return res;
            }
            else if (PyErr_Occurred())
                return NULL;
        }

The function tries to get the value for the given key. If it can find a value, it returns it (further down) but if it cannot find a value (ix == DKIX_EMPTY || value == NULL), it makes a check to see if the current object is a subclass of dict (!PyDict_CheckExact(mp)). If that is the case the function looks up the special __missing__ function in this object and calls it to get the value to be returned instead. As we can see above the Counter class actually implements the __missing__ function. It always returns 0. So in our above example what happens is this:

In our example above, we iterate through "montypython". In the first iteration letter is m. Using += is syntactic sugar around __getitem__ and __setitem__. Meaning that the call counter['m'] += 1 is (almost) shorthand for:

_count = counter.__getitem__('m')
_count += 1
counter.__setitem__('m', _count)

When this code is executed, __getitem__ will make a call internally to dict_subscript, which will return DKIX_EMPTY because no value for m is stored yet. Because it returns DKIX_EMPTY, it will check if a __missing__ functions is defined instead and if so it will use it’s return value. It will find that we have a __missing__ function defined on the Counter object and use the return value of 0. This means that the second line will look like _count = 0 + 1 and the third line will now actually add the value to the dictionary by calling counter.__setitem('m', 1). That is all the magic needed.

So all in all, the convenient Counter module (which also offers a lot of other features like creating a counter directly from an iterable, adding and substracting counters, getting most common values etc.) is not magic at all. Most of the code is written in Python and the parts that interface with the C parts are not that difficult to understand 🚀.

If you found any mistakes or inconsistencies in this or have an idea for the next module from the python standard library that we should take a look at feel free to reach out to me via email or twitter

PS: Yes, for what I was showing here I could have just used a defaultdict(int) but as far as I can tell that is completely implemented in C so I would not have been able to show this C/Python intersection so nicely :)

PPS: Thanks a lot to Lisa for proofreading this!


1. Note that I'm not a python core developer though so the following should probably be taken with a grain of salt.