I wanted to share some pretty cool stuff I’ve been learning about Python bytecode with you, including how I added support for nested functions, but my guy at the printing press said I needed to keep it under 500 words.
It’s a holiday week, he shrugged. What do you expect me to do?
Excluding code snippets, I bargained.
Fine, he conceded.
Do you know why we use bytecode in the first place?
I just operate the printing press, I trust you though.
Fair enough. Let’s begin.
Memphis, my Python interpreter written in Rust, has two execution engines. Neither can run all code but both can run some code.
My treewalk interpreter is what you would build if you didn’t know what you were doing. 🤷‍♂️ You tokenize the input Python code, generate an abstract syntax tree (AST), and then walk the tree and evaluate each node. Expressions return values and statements modify the symbol table, which is implemented as a series of scopes which respect Python scoping rules. Just remember the easy mnemonic LEGB: local, enclosing, global, builtin.
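To make that concrete, here’s a minimal sketch of how such a scope chain might be resolved. The names (SymbolTable, resolve) and the toy Value type are mine for illustration, not Memphis internals.

use std::collections::HashMap;

#[derive(Clone, Debug)]
enum Value {
    Int(i64),
    Str(String),
}

// A hypothetical symbol table: a stack of scopes with builtins as a final
// fallback, searched in LEGB order (local, enclosing, global, builtin).
struct SymbolTable {
    scopes: Vec<HashMap<String, Value>>, // [global, ...enclosing, local]
    builtins: HashMap<String, Value>,
}

impl SymbolTable {
    fn resolve(&self, name: &str) -> Option<&Value> {
        // Walk from the innermost (local) scope outward to global...
        self.scopes
            .iter()
            .rev()
            .find_map(|scope| scope.get(name))
            // ...then fall back to builtins.
            .or_else(|| self.builtins.get(name))
    }
}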
My bytecode VM is what you would build if you didn’t know what you were doing but wanted to act like you did. Also 🤷‍♂️. For this engine, the tokens and AST work the same, but rather than walking we take off sprinting. We compile the AST into an intermediate representation (IR) hereafter known as bytecode. We then create a stack-based virtual machine (VM), which conceptually acts like a CPU, executing bytecode instructions in sequence, but it’s implemented entirely in software.
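For a flavor of what “executing bytecode instructions in sequence” means, here’s a minimal sketch of a stack-based dispatch loop. The three opcodes are invented for illustration and are nothing like a real instruction set.

// A toy instruction set, invented for illustration.
enum Opcode {
    PushConst(i64),
    Add,
    Print,
}

fn run(bytecode: &[Opcode]) {
    let mut stack: Vec<i64> = Vec::new();
    // The whole VM is a loop: fetch the next instruction, execute it, repeat.
    for op in bytecode {
        match op {
            Opcode::PushConst(n) => stack.push(*n),
            Opcode::Add => {
                let rhs = stack.pop().expect("stack underflow");
                let lhs = stack.pop().expect("stack underflow");
                stack.push(lhs + rhs);
            }
            Opcode::Print => println!("{}", stack.pop().expect("stack underflow")),
        }
    }
}

fn main() {
    // Roughly the Python expression: print(1 + 2)
    run(&[Opcode::PushConst(1), Opcode::PushConst(2), Opcode::Add, Opcode::Print]);
}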
(For a complete guide of both approaches without the ramblings, Crafting Interpreters is excellent.)
Why do we do this in the first place? Just remember the two Ps: portability and performance. Remember how in the early 2000s nobody would shut up about how Java bytecode was portable? All you need is a JVM and you can run a Java program compiled on any machine! Python chose not to go with this approach for both technical and marketing reasons, but in theory the same principles apply. (In practice, the compilation steps are different and I regret opening this can of worms.)
Performance is the biggie though. Rather than traversing an AST multiple times over the lifetime of a program, we compile it once into an IR, which is a more efficient representation: we avoid the overhead of repeated tree traversal, and the IR’s flat structure often results in better branch prediction and cache locality at runtime.
(I don’t blame you for not thinking about caching if you don’t have a background in computer architecture—heck, I began my career in that industry and I think about caching far less than I think about how to avoid writing the same line of code twice. So just trust me on the performance piece. That’s my leadership style: blind trust.)
Hey buddy, that’s 500 words. We need to load up the frame and let ‘er rip.
Already?! You excluded code snippets?
There are no code snippets, my man.
Okay okay. Just 500 more. I promise.
I got kinda far before tabling my bytecode VM implementation about a year ago: I could define Python functions and classes and call those functions and instantiate those classes. I clamped down this behavior with some tests. But I knew my implementation was messy and that I’d need to revisit the fundamentals before adding more fun stuff. Now it’s Christmas week and I want to add fun stuff.
Consider this snippet for calling a function, keeping an eye on the TODO.
fn compile_function_call(
    &mut self,
    name: &str,
    args: &ParsedArguments,
) -> Result<Bytecode, CompileError> {
    let mut opcodes = vec![];
    // We push the args onto the stack in reverse call order so that we will pop
    // them off in call order.
    for arg in args.args.iter().rev() {
        opcodes.extend(self.compile_expr(arg)?);
    }
    let (_, index) = self.get_local_index(name);
    // TODO how does this know if it is a global or local index? this may not be
    // the right approach for calling a function
    opcodes.push(Opcode::Call(index));
    Ok(opcodes)
}
Are you done considering? We load the function arguments onto the stack and “call the function”. In bytecode, all names are converted into indices (because index access is faster during the VM runtime), but we don’t really have a way to know whether we are dealing with a local index or a global index here.
Now consider the improved version.
fn compile_function_call(
    &mut self,
    name: &str,
    args: &ParsedArguments,
) -> Result<Bytecode, CompileError> {
    let mut opcodes = vec![self.compile_load(name)];
    // We push the args onto the stack in reverse call order so that we will pop
    // them off in call order.
    for arg in args.args.iter().rev() {
        opcodes.extend(self.compile_expr(arg)?);
    }
    // Count the arguments themselves, not the opcodes they compiled to.
    let argc = args.args.len();
    opcodes.push(Opcode::Call(argc));
    Ok(opcodes)
}
Thank you for considering that code.
We now support nested function calls! What changed?
Let’s take a look at what compile_load is doing.
fn compile_load(&mut self, name: &str) -> Opcode {
    match self.ensure_context() {
        Context::Global => Opcode::LoadGlobal(self.get_or_set_nonlocal_index(name)),
        Context::Local => {
            // Check locals first
            if let Some(index) = self.get_local_index(name) {
                return Opcode::LoadFast(index);
            }
            // If not found locally, fall back to globals
            Opcode::LoadGlobal(self.get_or_set_nonlocal_index(name))
        }
    }
}
There are several key principles in action here:

- The compiler knows which context it is compiling in. At module level every name is loaded with LoadGlobal; inside a function we get a choice, and that choice is made at compile time rather than at runtime.
- In a local context, we check the local scope first and fall back to globals only if the name isn’t found, mirroring Python’s scoping rules.
- Either way, the name is resolved to an index, so the VM never has to compare strings to find a variable.
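If you’re wondering where ensure_context gets its answer, here’s a minimal sketch of one way to track it. The Context enum matches the code above, but the context-stack mechanics are my assumption about how a compiler like this could work, not necessarily what Memphis does.

#[derive(Clone, Copy)]
enum Context {
    Global,
    Local,
}

struct Compiler {
    // Hypothetical: contexts pushed and popped as we enter and leave function bodies.
    context_stack: Vec<Context>,
}

impl Compiler {
    fn ensure_context(&self) -> Context {
        // Module-level code has no enclosing function, so default to Global.
        self.context_stack.last().copied().unwrap_or(Context::Global)
    }

    fn compile_function_body(&mut self) {
        // Everything compiled inside a function body resolves names locally first.
        self.context_stack.push(Context::Local);
        // ... compile the body's statements here ...
        self.context_stack.pop();
    }
}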
The last thing I’ll leave you with today is a peek into how these variable names are mapped. In the code snippet below, you’ll notice that local indices are found in code.varnames and nonlocal indices are found in code.names. Both live on a CodeObject, which contains the metadata for a block of Python bytecode, including its variable and name mappings.
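This is a minimal sketch of those two lookups, with signatures inferred from the calls in compile_load above; treat the accessor for reaching the current CodeObject as a placeholder.

fn get_local_index(&self, name: &str) -> Option<usize> {
    // Local variables live in the CodeObject's varnames table; a miss here
    // means the name isn't local to this scope.
    let code = self.current_code_object(); // placeholder accessor
    code.varnames.iter().position(|n| n == name)
}

fn get_or_set_nonlocal_index(&mut self, name: &str) -> usize {
    // Nonlocal names live in the names table; unlike locals, we add the name
    // if we haven't seen it yet, since the VM resolves it at runtime.
    let code = self.current_code_object_mut(); // placeholder accessor
    match code.names.iter().position(|n| n == name) {
        Some(index) => index,
        None => {
            code.names.push(name.to_string());
            code.names.len() - 1
        }
    }
}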
The difference between varnames and names tormented me for weeks (CPython calls these co_varnames and co_names), but it’s actually fairly straightforward: varnames holds the names of all local variables in a given scope, and names does the same for all nonlocal names.
Once we properly track this, everything else just works. At runtime, the VM sees a LOAD_GLOBAL or a LOAD_FAST and knows to look in the global namespace dictionary or the local stack, respectively.
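As a sketch of that runtime dispatch, assuming a simplified frame layout (the real one surely carries more state):

use std::collections::HashMap;

#[derive(Clone)]
enum Value {
    Int(i64),
}

enum Opcode {
    LoadFast(usize),
    LoadGlobal(usize),
}

// A simplified frame, assumed for illustration: locals are index-addressed
// slots, while globals go through the names table into a dictionary.
struct Frame {
    locals: Vec<Value>,
    names: Vec<String>,
    globals: HashMap<String, Value>,
}

impl Frame {
    fn execute(&mut self, op: &Opcode, stack: &mut Vec<Value>) {
        match op {
            // LOAD_FAST: direct index into the local slots; no string lookup.
            Opcode::LoadFast(index) => stack.push(self.locals[*index].clone()),
            // LOAD_GLOBAL: resolve the index to a name, then consult the
            // global namespace dictionary.
            Opcode::LoadGlobal(index) => {
                let name = &self.names[*index];
                let value = self.globals.get(name).expect("NameError").clone();
                stack.push(value);
            }
        }
    }
}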
Buddy! Mr Gutenberg is on the phone and says we can hold the presses no longer.
Okay! Fine! I get it! Let’s ship it.
Shh! The printing press man doesn’t know I’m writing a conclusion, so I will be brief.
With variable scoping and function calls in a solid place, I’m gradually turning my attention to features like stack traces and async support. If you’ve enjoyed this dive into bytecode or have questions about building your own interpreter, I’d love to hear from you—drop a comment!
Subscribe & Save [on nothing]
If you’d like to get more posts like this directly to your inbox, you can subscribe here!
Work With Me
I mentor software engineers to navigate technical challenges and career growth in a supportive, sometimes silly environment. If you’re interested, you can book a session here.
Elsewhere
In addition to mentoring, I also write about my experience navigating self-employment and late-diagnosed autism. Less code and the same number of jokes.