A look at the batteries 2: Python bytecode

I recently read an article about Python bytecode. The article explains that the CPython interpreter (the standard implementation of Python, written in C) compiles your Python source code into bytecode, which is then executed in a virtual machine. This was quite easy to observe in Python 2, as CPython would create a .pyc file next to every imported .py file to store the compiled bytecode. In Python 3, these compiled files are stored in a subdirectory called __pycache__ instead, but they serve the same purpose.
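
If you are curious where CPython would put the cached bytecode for a given source file, the standard library can tell you. This is only a small sketch: hello.py is a made-up file name for illustration, and the exact result depends on your interpreter version and platform.

```python
import importlib.util

# "hello.py" is a hypothetical file name; the file does not need to exist.
pyc_path = importlib.util.cache_from_source("hello.py")

# On a default CPython 3 setup this points into a __pycache__ directory
# and carries a version tag in the file name.
print(pyc_path)
```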

Because bytecode is a binary format meant for the virtual machine rather than for humans, we can’t simply open it in a text editor and read it. Luckily, Python comes with a standard library module called dis. Its dis function prints a human-readable disassembly of a compiled function: dis.dis(function_name).
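
To get a feeling for why a disassembler is useful, here is a minimal sketch that prints the raw bytecode of a function next to the dis output. The greet function is a placeholder I made up for this illustration, and the bytes you see will differ between CPython versions.

```python
import dis


def greet():  # hypothetical function, used only for this illustration
    return "hi"


# Every function carries a code object; co_code holds the raw bytecode.
raw = greet.__code__.co_code
print(type(raw), len(raw))
print(raw)      # raw bytes, not meant for human eyes

dis.dis(greet)  # readable listing of the same instructions
```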

The original article uses the following code snippet as an example:

def hello():
    print("Hello, World!")

We can follow along by using a REPL to define and inspect the function:

Python 3.7.3 (default, Mar 27 2019, 09:23:15)
[Clang 10.0.1 (clang-1001.0.46.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> def hello():
...     print('Hello World!')
...
>>> dis.dis(hello)
  2           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('Hello World!')
              4 CALL_FUNCTION            1
              6 POP_TOP
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
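
A note on reading this listing: the leftmost number is the source line, followed by the byte offset of each instruction, the opcode name, its numeric argument and, in parentheses, what that argument refers to. The sketch below shows where those values in parentheses come from. Treat it as an illustration rather than a reference, since the exact opcodes and the contents of these tuples vary between CPython versions.

```python
import dis


def hello():
    print('Hello World!')


code = hello.__code__
print(code.co_names)   # names the function refers to, such as 'print'
print(code.co_consts)  # constants used by the function, such as 'Hello World!'

# Iterate over the same instructions that dis.dis() prints.
for instr in dis.get_instructions(hello):
    print(instr.offset, instr.opname, instr.arg, repr(instr.argval))
```

The argument of LOAD_GLOBAL is looked up in co_names and the argument of LOAD_CONST in co_consts, which is how dis knows to print (print) and ('Hello World!') next to them.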

As we can see in the disassembled output, CPython loads the global print function, loads the constant value 'Hello World!' and calls the function. It also pops an item off the evaluation stack (the unused return value of print) and returns None, but this is not our primary interest right now. What is interesting is the way print works, or rather the fact that it is a function at all.
Before Python 3, print was not a function but a statement. We can observe this if we run the same code in a Python 2 REPL:

Python 2.7.16 (default, Apr 12 2019, 15:32:40)
[GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.46.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> def hello(): print('Hello World!')
...
>>> hello()
Hello World!
>>> dis.dis(hello)
  1           0 LOAD_CONST               1 ('Hello World!')
              3 PRINT_ITEM
              4 PRINT_NEWLINE
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE

As you can see here, no print function needs to be loaded. Instead, specialised PRINT_ITEM and PRINT_NEWLINE instructions are executed. Since there is no function call, there is no return value to pop off the evaluation stack, but the function still returns None as before.
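
If you only have Python 3 at hand, you can still see the other side of this change: the statement form of print, without parentheses, no longer compiles at all. A small sketch, assuming it is run on a Python 3 interpreter:

```python
# The statement form of print (no parentheses) is not valid Python 3 syntax,
# so compiling it fails before any bytecode is produced.
try:
    compile("print 'Hello World!'", "<example>", "exec")
    statement_form_compiles = True
except SyntaxError as exc:
    statement_form_compiles = False
    print("SyntaxError:", exc.msg)
```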

We can also see how this behaviour can change in Python 2 if we explicitly import print_function from __future__. This is the previous Python 2 REPL continued:

>>> from __future__ import print_function
>>> def hi():
...     print('Hi World!')
...
>>> dis.dis(hi)
  2           0 LOAD_GLOBAL              0 (print)
              3 LOAD_CONST               1 ('Hi World!')
              6 CALL_FUNCTION            1
              9 POP_TOP
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE

As you can see, the instructions are now the same as in the Python 3 example, with no more PRINT_ instructions. Only the byte offsets differ, because Python 2 instructions vary in length.

I thought this was pretty cool to see. I don’t have a specific use case for looking at Python bytecode yet, but I find it interesting to be able to understand things one level deeper, and I think I will probably find a use for it at some point. Do you have a use case for this? Let me know on Twitter.