I recently read an article about Python bytecode. The article explains that the CPython interpreter (the standard implementation of Python, written in C) compiles your Python source code into bytecode, which is then executed in a virtual machine. This was quite easy to observe in Python 2, as CPython would create a .pyc file next to every imported .py file to store the compiled bytecode. In Python 3, these compiled files are stored in a subdirectory called __pycache__ instead, but their purpose remains the same.
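To see where the compiled file ends up, here is a minimal sketch for Python 3. The module name hello_module.py and the temporary directory are made up for illustration. Normally CPython writes the cache itself on first import, so the explicit call to py_compile is only there to make the step visible.

```python
import importlib.util
import os
import py_compile
import tempfile

# Write a tiny module into a temporary directory (hypothetical file name).
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "hello_module.py")
with open(source_path, "w") as f:
    f.write('print("Hello, World!")\n')

# Compile it explicitly; CPython normally does this on first import.
pyc_path = py_compile.compile(source_path)

# The location where the import system expects the cached bytecode,
# usually a __pycache__ directory next to the source file.
expected_path = importlib.util.cache_from_source(source_path)

print(pyc_path)
print(expected_path)
```

The file name of the cached bytecode also includes a tag for the interpreter version, so several Python versions can keep their compiled files side by side.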
Since this bytecode is a binary format intended for the virtual machine rather than for humans, we can’t just open it in a text editor and read it. Luckily though, Python comes with a standard library module called dis. The dis module provides a function that lets us take a look at the bytecode of a compiled function like this: dis.dis(function_name).
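To get a feeling for why a text editor is not much help, we can look at the raw bytes ourselves. This is a small sketch with a hypothetical function called greet. It reads the bytecode from the function's code object, which is what dis decodes for us.

```python
def greet():  # hypothetical example function
    print("Hello, World!")

# Every function carries a code object; co_code holds the raw bytecode.
raw_bytecode = greet.__code__.co_code

print(type(raw_bytecode))
print(raw_bytecode)
```

The exact bytes differ between Python versions, which is one more reason to let dis do the decoding.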
The original article uses the following code snippet as an example:
def hello():
    print("Hello, World!")
We can follow along by using a REPL to define and inspect the function:
Python 3.7.3 (default, Mar 27 2019, 09:23:15)
[Clang 10.0.1 (clang-1001.0.46.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> def hello():
...     print('Hello World!')
...
>>> dis.dis(hello)
  2           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('Hello World!')
              4 CALL_FUNCTION            1
              6 POP_TOP
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
As we can see in the disassembled output, CPython loads the global print function, loads the constant value 'Hello World!' and calls the function with one argument. It also pops an item off the evaluation stack (the unused return value of print) and returns None, but this is not our primary interest right now. What is interesting is taking a closer look at how print works, namely the fact that it is a function at all.
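If you would rather work with the instructions than read the printed listing, the dis module can also hand them over as objects. The following is a minimal sketch for Python 3 using dis.get_instructions. The variable names are my own, and the exact opcodes you get depend on your Python version.

```python
import dis

def hello():
    print("Hello World!")

# dis.get_instructions yields one Instruction object per bytecode instruction.
instructions = list(dis.get_instructions(hello))

for instr in instructions:
    # offset: position in the bytecode, opname: instruction name,
    # argrepr: human-readable form of the argument (may be empty).
    print(instr.offset, instr.opname, instr.argrepr)

# Collect just the instruction names for further inspection.
opnames = [instr.opname for instr in instructions]
```

Newer interpreters use different call instructions than the CALL_FUNCTION shown above, but the lookup of the name print and the loading of the string constant can still be found in the list.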
Before Python 3, print was not a function but a statement. We can actually observe this if we run the same code in a Python 2 REPL:
Python 2.7.16 (default, Apr 12 2019, 15:32:40)
[GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.46.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> def hello(): print('Hello World!')
...
>>> hello()
Hello World!
>>> dis.dis(hello)
  1           0 LOAD_CONST               1 ('Hello World!')
              3 PRINT_ITEM
              4 PRINT_NEWLINE
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE
As you can see here, no print function needs to be loaded. Instead, the specialised PRINT_ITEM and PRINT_NEWLINE instructions get executed. Since there is no function call, there is no return value to pop off the evaluation stack, but the function still returns None as before.
We can also see how this behaviour changes in Python 2 if we explicitly import print_function from __future__. This is the previous Python 2 REPL continued:
>>> from __future__ import print_function
>>> def hi():
...     print('Hi World!')
...
>>> dis.dis(hi)
  2           0 LOAD_GLOBAL              0 (print)
              3 LOAD_CONST               1 ('Hi World!')
              6 CALL_FUNCTION            1
              9 POP_TOP
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE
As you can see, the instructions are now the same as in the Python 3 example, with no more PRINT_ instructions. Only the byte offsets differ.
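The __future__ module itself records when this switch happened. Here is a small sketch, meant to be run under Python 3, that reads the metadata for print_function and also checks that the old statement form no longer compiles. The variable names are my own.

```python
import __future__

feature = __future__.print_function

# The first release that accepted the import, and the release in which
# the behaviour became the default. Both are version tuples.
optional_in = feature.getOptionalRelease()
mandatory_in = feature.getMandatoryRelease()
print(optional_in[:2], mandatory_in[:2])

# In Python 3 the old statement form is rejected by the compiler.
try:
    compile('print "Hello World!"', "<example>", "exec")
    statement_form_compiles = True
except SyntaxError:
    statement_form_compiles = False
print(statement_form_compiles)
```

So the import is a message to the compiler rather than a regular import, which is why the generated bytecode changes.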
I thought this was pretty cool to see. I don’t have a specific use case for looking at Python bytecode yet, but I find it interesting to be able to understand things one level deeper, and I think I will probably find a use for it at some point. Do you have a use case for this? Let me know on Twitter.