Understanding Python execution from inside: A Python assembly tracer
Written by Hakril
2015-05-07 13:58:40
Lately, I have been looking at Python's execution model. I was curious about the implementation of some opcodes like YIELD_VALUE and YIELD_FROM, about how list comprehensions, generator expressions and other fun Python features are compiled, and about what happens at the bytecode level when exceptions are raised. Reading the CPython code is really instructive, but I felt that something was missing to fully understand bytecode execution and the evolution of the stack. Yeah, GDB is cool, but I'm a lazy guy who wanted a high-level interface and to code some Python.
So my goal was to create a bytecode-level tracing API like the one offered by sys.settrace, but with a finer granularity. This exercise was perfect for practising my C-into-Python coding.
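For comparison, here is a minimal sketch (not part of the final tool) of what the standard sys.settrace hook offers: one callback per call/line/return/exception event, which is exactly the line-level granularity I wanted to go below.

import sys

def tracer(frame, event, arg):
    # event is "call", "line", "return" or "exception": line-level granularity at best
    print(event, frame.f_code.co_name, frame.f_lineno)
    return tracer  # keep tracing inside this frame

sys.settrace(tracer)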
What we are going to need is:
- A new opcode in the CPython interpreter
- A way to inject the opcode into Python bytecode
- Some Python code to handle the opcode on the Python side
In this article everything is based on Python3.5.
A new opcode for CPython
Our new opcode: DEBUG_OP
This new opcode, which I will call DEBUG_OP, is my first try at writing C code for CPython. I tried to keep it as simple as I could.
What we want is a way to call some Python code whenever our opcode is executed. We also want to be able to retrieve some data about our execution context; our opcode will pass it as parameters to our callback. The useful information I identified is:
- The content of the stack
- The frame object that executed DEBUG_OP
So all our DEBUG_OP needs to do is:
- Find the callback
- Create a list with the content of the stack
- Call the callback with that list and the current frame as parameters
Sounds easy! So let's go!
Disclaimer: the following explanations and code are the result of a LOT of segfaults.
The first thing to do is to give a name and a value to our opcode. For that, we need to go into Include/opcode.h:
/** My own comments begin by '**' **/
/** From: Includes/opcode.h **/

/* Instruction opcodes for compiled code */
/** We just have to define our opcode with a free value
    0 was the first one I found **/
#define DEBUG_OP                0
#define POP_TOP                 1
#define ROT_TWO                 2
#define ROT_THREE               3
The easy part is done. Now we have to actually code our opcode behaviour.
Implementing DEBUG_OP
The first question we need to ask ourselves, before even thinking about the implementation of DEBUG_OP, is "What is my interface going to be?".
It's cool to have a shiny new opcode that calls some code, but what code is it going to call exactly? How will it retrieve the callback function? I chose what looked like the simplest solution: a function with a fixed name in the frame globals.
The question is: "how do I look for a fixed C-string in a dict?"
To answer this question, we can look at some fixed identifiers used in the Python main loop: the ones associated with the context manager methods __enter__ and __exit__.
We can see that they are used in the SETUP_WITH opcode:
/** From: Python/ceval.c **/
TARGET(SETUP_WITH) {
    _Py_IDENTIFIER(__exit__);
    _Py_IDENTIFIER(__enter__);
    PyObject *mgr = TOP();
    PyObject *exit = special_lookup(mgr, &PyId___exit__), *enter;
    PyObject *res;
Now, a look at the _Py_IDENTIFIER macro:
/** From: Include/object.h **/

/********************* String Literals ****************************************/
/* This structure helps managing static strings. The basic usage goes like this:
   Instead of doing

       r = PyObject_CallMethod(o, "foo", "args", ...);

   do

       _Py_IDENTIFIER(foo);
       ...
       r = _PyObject_CallMethodId(o, &PyId_foo, "args", ...);

   PyId_foo is a static variable, either on block level or file level. On first
   usage, the string "foo" is interned, and the structures are linked. On interpreter
   shutdown, all strings are released (through _PyUnicode_ClearStaticStrings).

   Alternatively, _Py_static_string allows to choose the variable name.
   _PyUnicode_FromId returns a borrowed reference to the interned string.
   _PyObject_{Get,Set,Has}AttrId are __getattr__ versions using _Py_Identifier*.
*/
typedef struct _Py_Identifier {
    struct _Py_Identifier *next;
    const char* string;
    PyObject *object;
} _Py_Identifier;

#define _Py_static_string_init(value) { 0, value, 0 }
#define _Py_static_string(varname, value)  static _Py_Identifier varname = _Py_static_string_init(value)
#define _Py_IDENTIFIER(varname) _Py_static_string(PyId_##varname, #varname)
Well, at least the documentation is explicit! With a little more research, we can find the dict function we were looking for: _PyDict_GetItemId.
So the lookup part of our opcode will look like this:
/** Our callback function will be named op_target **/
PyObject *target = NULL;
_Py_IDENTIFIER(op_target);
target = _PyDict_GetItemId(f->f_globals, &PyId_op_target);
if (target == NULL && _PyErr_OCCURRED()) {
    if (!PyErr_ExceptionMatches(PyExc_KeyError))
        goto error;
    PyErr_Clear();
    DISPATCH();
}
To be completely explicit, this code needs a few explanations:
- f is our current frame and f->f_globals is its globals
- If we don't find op_target, we check whether the exception is a KeyError
- goto error; is the main loop's way of raising the current exception
- PyErr_Clear() suppresses the current exception and DISPATCH() launches the evaluation of the next opcode
The next step is to gather the information we want (the stack):
/** This code creates a list with all the values on the current stack **/
PyObject *value = PyList_New(0);
for (i = 1; i <= STACK_LEVEL(); i++) {
    tmp = PEEK(i);
    if (tmp == NULL) {
        tmp = Py_None;
    }
    PyList_Append(value, tmp);
}
The last step is actually calling our callback! For that, we will use call_function and learn how to use it by looking at the CALL_FUNCTION opcode:
/** From: Python/ceval.c **/
TARGET(CALL_FUNCTION) {
    PyObject **sp, *res;
    /** stack_pointer is a local of the main loop.
        It's the pointer to the stack top of our frame **/
    sp = stack_pointer;
    res = call_function(&sp, oparg);
    /** call_function handles the args it consumed on the stack for us **/
    stack_pointer = sp;
    PUSH(res);
    /** Standard exception handling **/
    if (res == NULL)
        goto error;
    DISPATCH();
}
With all that information, we are able to craft our DEBUG_OP:
TARGET(DEBUG_OP) {
    PyObject *value = NULL;
    PyObject *target = NULL;
    PyObject *res = NULL;
    PyObject **sp = NULL;
    PyObject *tmp;
    int i;

    _Py_IDENTIFIER(op_target);

    target = _PyDict_GetItemId(f->f_globals, &PyId_op_target);
    if (target == NULL && _PyErr_OCCURRED()) {
        if (!PyErr_ExceptionMatches(PyExc_KeyError))
            goto error;
        PyErr_Clear();
        DISPATCH();
    }
    value = PyList_New(0);
    Py_INCREF(target);
    for (i = 1; i <= STACK_LEVEL(); i++) {
        tmp = PEEK(i);
        if (tmp == NULL)
            tmp = Py_None;
        PyList_Append(value, tmp);
    }

    PUSH(target);
    PUSH(value);
    Py_INCREF(f);
    PUSH(f);
    sp = stack_pointer;
    res = call_function(&sp, 2);
    stack_pointer = sp;
    if (res == NULL)
        goto error;
    Py_DECREF(res);
    DISPATCH();
}
As I didn't have that much experience with CPython's C code, I might have missed something (I am looking at you, refcounting). Feel free to correct me if that's the case. ;)
It compiles! So it works!
Well, not really... It might seem good, but this code will fail when we try to execute our first DEBUG_OP. Since 2008, Python has used computed gotos (you can read more about computed gotos in Python here). So we need to update the goto jump table: we just need to go into Python/opcode_targets.h and make the following change:
/** From: Python/opcode_targets.h **/
/** Easy change since DEBUG_OP is the opcode number 1 **/
static void *opcode_targets[256] = {
    //&&_unknown_opcode,
    &&TARGET_DEBUG_OP,
    &&TARGET_POP_TOP,
    /** ... **/
And that's all! We now have a fully working new opcode. The only problem is that our opcode is never called, as it is nonexistent in compiled bytecode. Now we need to inject DEBUG_OP into the bytecode of some functions.
Injecting opcode DEBUG_OP into Python bytecode
There are many ways to insert a new opcode into Python bytecode:
- We can use the peephole optimizer just like Quarkslab did
- We can do some changes in the bytecode generation code
- We can (and we will) just modify the bytecode of some functions at runtime!
Yep, coding that new opcode was enough C for today; let's get back to the source of Understanding Python: some hacky, strange (somewhat magical) Python!
So, what we are going to do is:
- Take the code object of the function we want to trace
- Rewrite the bytecode to inject some DEBUG_OP
- Put the new code object in place
Reminder about code objects
If you have never heard of code objects, there is a little introduction somewhere in my first article. There is also some good documentation on the net, and the doc page as always (Ctrl+F "code objects").
One thing to note in the context of this article is that code objects are not mutable:
Python 3.4.2 (default, Oct  8 2014, 10:45:20)
[GCC 4.9.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> x = lambda y : 2
>>> x.__code__
<code object <lambda> at 0x7f481fd88390, file "<stdin>", line 1>
>>> x.__code__.co_name
'<lambda>'
>>> x.__code__.co_name = 'truc'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: readonly attribute
>>> x.__code__.co_consts = ('truc',)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: readonly attribute
But don't worry, we will find a way to get around that.
Our tools
To do these bytecode modifications, we are going to need a few tools:
- The dis module, used to disassemble and analyse bytecode
- dis.Bytecode, a new feature from Python 3.4 that is super useful for disassembly and bytecode analysis!
- A way to easily modify code objects
dis.Bytecode disassembles a code object and gives us useful information about each opcode, its argument and its context:
# Python3.4
>>> import dis
>>> f = lambda x: x + 3
>>> for i in dis.Bytecode(f.__code__): print (i)
...
Instruction(opname='LOAD_FAST', opcode=124, arg=0, argval='x', argrepr='x', offset=0, starts_line=1, is_jump_target=False)
Instruction(opname='LOAD_CONST', opcode=100, arg=1, argval=3, argrepr='3', offset=3, starts_line=None, is_jump_target=False)
Instruction(opname='BINARY_ADD', opcode=23, arg=None, argval=None, argrepr='', offset=6, starts_line=None, is_jump_target=False)
Instruction(opname='RETURN_VALUE', opcode=83, arg=None, argval=None, argrepr='', offset=7, starts_line=None, is_jump_target=False)
To be able to modify code objects, I just created a small class that clones a code object, allows us to modify the values we want and generates a new code object.
class MutableCodeObject(object):
    args_name = ("co_argcount", "co_kwonlyargcount", "co_nlocals", "co_stacksize", "co_flags", "co_code",
                 "co_consts", "co_names", "co_varnames", "co_filename", "co_name", "co_firstlineno",
                 "co_lnotab", "co_freevars", "co_cellvars")

    def __init__(self, initial_code):
        self.initial_code = initial_code
        for attr_name in self.args_name:
            attr = getattr(self.initial_code, attr_name)
            if isinstance(attr, tuple):
                attr = list(attr)
            setattr(self, attr_name, attr)

    def get_code(self):
        args = []
        for attr_name in self.args_name:
            attr = getattr(self, attr_name)
            if isinstance(attr, list):
                attr = tuple(attr)
            args.append(attr)
        return self.initial_code.__class__(*args)
Easy to use, that's one problem solved!
>>> x = lambda y : 2
>>> m = MutableCodeObject(x.__code__)
>>> m
<new_code.MutableCodeObject object at 0x7f3f0ea546a0>
>>> m.co_consts
[None, 2]
>>> m.co_consts[1] = '3'
>>> m.co_name = 'truc'
>>> m.get_code()
<code object truc at 0x7f3f0ea2bc90, file "<stdin>", line 1>
Testing our new opcode
Now that we have the basic tools to inject some DEBUG_OP, we should be able to verify that our implementation is usable.
For that, we are just going to add our opcode to the simplest function ever.
from new_code import MutableCodeObject

def op_target(*args):
    print("WOOT")
    print("op_target called with args <{0}>".format(args))

def nop():
    pass

new_nop_code = MutableCodeObject(nop.__code__)
new_nop_code.co_code = b"\x00" + new_nop_code.co_code[0:3] + b"\x00" + new_nop_code.co_code[-1:]
new_nop_code.co_stacksize += 3

nop.__code__ = new_nop_code.get_code()

import dis
dis.dis(nop)
nop()

# Don't forget that ./python is our custom Python implementing DEBUG_OP
hakril@computer ~/python/CPython3.5 % ./python proof.py
  8           0 <0>
              1 LOAD_CONST               0 (None)
              4 <0>
              5 RETURN_VALUE
WOOT
op_target called with args <([], <frame object at 0x7fde9eaebdb0>)>
WOOT
op_target called with args <([None], <frame object at 0x7fde9eaebdb0>)>
Sounds like it works! One line might need some explanation: new_nop_code.co_stacksize += 3
- co_stacksize represents the stack size needed by the code object
- Our DEBUG_OP pushes 3 values onto the stack, so we need to reserve space for them
Now we need to be able to inject our opcode into every Python function! Be brave!
Rewriting bytecode
As we have seen in the last example, rewriting Python bytecode sounds easy! To inject our DEBUG_OP between every opcode, all we have to do is get the offset of every opcode (injecting our opcode into an argument would be harmful) and inject our opcode at these offsets. The offsets are easy to get using dis.Bytecode.
Something like this:
def add_debug_op_everywhere(code_obj):
    # We get every instruction offset in the code object
    offsets = [instr.offset for instr in dis.Bytecode(code_obj)]
    # And insert a DEBUG_OP at every offset
    return insert_op_debug_list(code_obj, offsets)

def insert_op_debug_list(code, offsets):
    # We insert the DEBUG_OP one by one
    for nb, off in enumerate(sorted(offsets)):
        # Need to adjust the offsets by the number of opcodes already inserted before
        # That's why we sort our offsets!
        code = insert_op_debug(code, off + nb)
    return code

# Last problem: what does insert_op_debug look like?
One might think (based on the previous example) that our insert_op_debug will just add a "\x00" at the specified offset, but there is a TRAP! Our first example of DEBUG_OP insertion was simple code without any branches. To have a fully functioning insert_op_debug, we need to take care of branching opcodes.
Python branches are really simple; there are two types of branches:
- Absolute branches: the branch looks like Instruction_Pointer = argument(instruction)
- Relative branches: the branch looks like Instruction_Pointer += argument(instruction)
- Relative branches are always forward
As we want those branches to still be valid after our DEBUG_OP insertions, we will need to rewrite the arguments of those instructions. So here is the logic I used:
- For every relative branch before our insertion offset:
  - If the destination is strictly greater than our insertion offset, add 1 to the instruction argument
  - If it is equal, there is no need to add 1; this allows us to execute our DEBUG_OP between the jump and its target
  - If it is less, our DEBUG_OP won't change the distance between the JUMP and its destination
- For every absolute branch in the code object:
  - If the destination is strictly greater than our insertion offset, add 1 to the instruction argument
  - No modification if it is equal, for the same reason as the relative branches
  - If it is less, our DEBUG_OP insertion won't change the address of the destination
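To make these rules concrete, here is a small worked example with made-up offsets. In pre-3.6 bytecode an opcode with an argument is 3 bytes long, so a relative jump at offset 10 with argument 6 targets offset 10 + 3 + 6 = 19. Inserting a one-byte DEBUG_OP at offset 16 shifts the target to 20 while the jump itself does not move, so the argument must become 7. Inserting exactly at offset 19 needs no fix-up: the jump then lands on the DEBUG_OP, which runs just before the original target. Likewise, a JUMP_ABSOLUTE 19 becomes JUMP_ABSOLUTE 20 when the insertion offset is strictly below 19, and stays 19 when the insertion happens exactly at offset 19.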
Here is the implementation:
# Helper
def bytecode_to_string(bytecode):
    if bytecode.arg is not None:
        return struct.pack("<Bh", bytecode.opcode, bytecode.arg)
    return struct.pack("<B", bytecode.opcode)

# Dummy class for bytecode_to_string
class DummyInstr:
    def __init__(self, opcode, arg):
        self.opcode = opcode
        self.arg = arg

def insert_op_debug(code, offset):
    opcode_jump_rel = ['FOR_ITER', 'JUMP_FORWARD', 'SETUP_LOOP', 'SETUP_WITH', 'SETUP_EXCEPT', 'SETUP_FINALLY']
    opcode_jump_abs = ['POP_JUMP_IF_TRUE', 'POP_JUMP_IF_FALSE', 'JUMP_ABSOLUTE']
    res_codestring = b""
    inserted = False
    for instr in dis.Bytecode(code):
        if instr.offset == offset:
            res_codestring += b"\x00"
            inserted = True
        if instr.opname in opcode_jump_rel and not inserted:  # relative jumps are always forward
            if offset < instr.offset + 3 + instr.arg:
                # inserted between jump and dest: add 1 to dest (3 for size)
                # If equal: jump on DEBUG_OP to get info before exec instr
                res_codestring += bytecode_to_string(DummyInstr(instr.opcode, instr.arg + 1))
                continue
        if instr.opname in opcode_jump_abs:
            if instr.arg > offset:
                res_codestring += bytecode_to_string(DummyInstr(instr.opcode, instr.arg + 1))
                continue
        res_codestring += bytecode_to_string(instr)
    # replace_bytecode just replaces the original code co_code
    return replace_bytecode(code, res_codestring)
We can look at the result:
>>> def lol(x):
...     for i in range(10):
...         if x == i:
...             break

>>> dis.dis(lol)
101           0 SETUP_LOOP              36 (to 39)
              3 LOAD_GLOBAL              0 (range)
              6 LOAD_CONST               1 (10)
              9 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             12 GET_ITER
        >>   13 FOR_ITER                22 (to 38)
             16 STORE_FAST               1 (i)

102          19 LOAD_FAST                0 (x)
             22 LOAD_FAST                1 (i)
             25 COMPARE_OP               2 (==)
             28 POP_JUMP_IF_FALSE       13

103          31 BREAK_LOOP
             32 JUMP_ABSOLUTE           13
             35 JUMP_ABSOLUTE           13
        >>   38 POP_BLOCK
        >>   39 LOAD_CONST               0 (None)
             42 RETURN_VALUE

>>> lol.__code__ = transform_code(lol.__code__, add_debug_op_everywhere, add_stacksize=3)
>>> dis.dis(lol)
101           0 <0>
              1 SETUP_LOOP              50 (to 54)
              4 <0>
              5 LOAD_GLOBAL              0 (range)
              8 <0>
              9 LOAD_CONST               1 (10)
             12 <0>
             13 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             16 <0>
             17 GET_ITER
        >>   18 <0>

102          19 FOR_ITER                30 (to 52)
             22 <0>
             23 STORE_FAST               1 (i)
             26 <0>
             27 LOAD_FAST                0 (x)
             30 <0>

103          31 LOAD_FAST                1 (i)
             34 <0>
             35 COMPARE_OP               2 (==)
             38 <0>
             39 POP_JUMP_IF_FALSE       18
             42 <0>
             43 BREAK_LOOP
             44 <0>
             45 JUMP_ABSOLUTE           18
             48 <0>
             49 JUMP_ABSOLUTE           18
        >>   52 <0>
             53 POP_BLOCK
        >>   54 <0>
             55 LOAD_CONST               0 (None)
             58 <0>
             59 RETURN_VALUE

# Setup the simplest handler EVER
>>> def op_target(stack, frame):
...     print (stack)

# GO
>>> lol(2)
[]
[]
[<class 'range'>]
[10, <class 'range'>]
[range(0, 10)]
[<range_iterator object at 0x7f1349afab80>]
[0, <range_iterator object at 0x7f1349afab80>]
[<range_iterator object at 0x7f1349afab80>]
[2, <range_iterator object at 0x7f1349afab80>]
[0, 2, <range_iterator object at 0x7f1349afab80>]
[False, <range_iterator object at 0x7f1349afab80>]
[<range_iterator object at 0x7f1349afab80>]
[1, <range_iterator object at 0x7f1349afab80>]
[<range_iterator object at 0x7f1349afab80>]
[2, <range_iterator object at 0x7f1349afab80>]
[1, 2, <range_iterator object at 0x7f1349afab80>]
[False, <range_iterator object at 0x7f1349afab80>]
[<range_iterator object at 0x7f1349afab80>]
[2, <range_iterator object at 0x7f1349afab80>]
[<range_iterator object at 0x7f1349afab80>]
[2, <range_iterator object at 0x7f1349afab80>]
[2, 2, <range_iterator object at 0x7f1349afab80>]
[True, <range_iterator object at 0x7f1349afab80>]
[<range_iterator object at 0x7f1349afab80>]
[]
[None]
Wonderful! We now have a way to get the state of our stack and our frame at every Python instruction. The rendering of the results is not quite usable in its current state, though. Let's add some wrapping in the last section!
Adding some Python wrapping
As you can see, all of the low-level interface works. Our last mission is to make our op_target useful. (This part might be a little empty: it's not the funniest part of this project in my eyes.)
The first thing we want to do is exploit the information given by the frame parameter. If we look at the information stored in a frame, we can see this:
- f_code: the code object being executed in this frame
- f_lasti: gives the current instruction (this is an index into the bytecode string of the code object); a small sketch of how to use it follows below
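For illustration, here is a minimal sketch of how f_lasti and dis.Bytecode can be combined to find the instruction that will run right after our DEBUG_OP (a hypothetical helper; the article's own FrameAnalyser wrapper appears later and may differ):

import dis

def next_instr(frame):
    # f_lasti is the offset of the DEBUG_OP being executed; the next instruction
    # is simply the first one in the code object with a greater offset.
    for instr in dis.Bytecode(frame.f_code):
        if instr.offset > frame.f_lasti:
            return instr
    return None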
Now our handler is able to know which opcode will be executed just after our DEBUG_OP. This will be useful for aggregating the data and doing some nice display.
We can create a class that will set up the tracing mechanism for a function:
- Change its co_code
- Set up a callback as the target function op_debug
As we know the next instruction, we can analyse it and modify its arguments. For example, we can add an auto-follow-called-functions feature:
def op_target(l, f, exc=None):
    if op_target.callback is not None:
        op_target.callback(l, f, exc)

class Trace:
    def __init__(self, func):
        self.func = func

    def call(self, *args, **kwargs):
        self.add_func_to_trace(self.func)
        # Activate Trace callback for the func call
        op_target.callback = self.callback
        try:
            res = self.func(*args, **kwargs)
        except Exception as e:
            res = e
        op_target.callback = None
        return res

    def add_func_to_trace(self, f):
        # Is it code? is it already transformed?
        if not hasattr(f, "op_debug") and hasattr(f, "__code__"):
            f.__code__ = transform_code(f.__code__, transform=add_everywhere, add_stacksize=ADD_STACK)
            f.__globals__['op_target'] = op_target
            f.op_debug = True

    def do_auto_follow(self, stack, frame):
        # Nothing fancy: FrameAnalyser is just the wrapper that gives the next executed instruction
        next_instr = FrameAnalyser(frame).next_instr()
        if "CALL" in next_instr.opname:
            arg = next_instr.arg
            f_index = (arg & 0xff) + (2 * (arg >> 8))
            called_func = stack[f_index]
            # If call target is not traced yet: do it
            if not hasattr(called_func, "op_debug"):
                self.add_func_to_trace(called_func)
Now, all we have to do is implement subclasses with a callback method, which will be called at every instruction, and a do_report method, which will print the gathered information.
Here is an example of a dummy tracer that follows function calls:
class DummyTrace(Trace):
    def __init__(self, func):
        self.func = func
        self.data = collections.OrderedDict()
        self.last_frame = None
        self.known_frame = []
        self.report = []

    def callback(self, stack, frame, exc):
        if frame not in self.known_frame:
            self.known_frame.append(frame)
            self.report.append(" === Entering New Frame {0} ({1}) ===".format(frame.f_code.co_name, id(frame)))
            self.last_frame = frame
        if frame != self.last_frame:
            self.report.append(" === Returning to Frame {0} {1}===".format(frame.f_code.co_name, id(frame)))
            self.last_frame = frame

        self.report.append(str(stack))

        instr = FrameAnalyser(frame).next_instr()
        offset = str(instr.offset).rjust(8)
        opname = str(instr.opname).ljust(20)
        arg = str(instr.arg).ljust(10)
        self.report.append("{0} {1} {2} {3}".format(offset, opname, arg, instr.argval))
        self.do_auto_follow(stack, frame)

    def do_report(self):
        print("\n".join(self.report))
Here are some examples of implementations and uses. The format may be hard to read; I am not good at user-friendly reporting...
Example 1: auto-tracing with dummy dump of stack and executed instructions
Example 2: context manager at work
And, at last, a demo of how list comprehensions work:
Example 3: output of dummy tracer
Example 4: output of Stack aggregation tracer
Conclusion
This little project was a good way to have an introduction to some low-level Python, the interpreter's main loop, Python C-coding and Python bytecode. Also, the resulting tool is a good way to have a quick look at the bytecode-behavior of some fun Python constructions like generators, context managers or list comprehensions.
Here is the complete code of this experimentation.
Another thing that could be done is adding a way to modify the stack of the function we are tracing. I am not sure what it could be used for, but that would be fun!
Understanding Python by breaking it - Lvl 2
Written by Hakril
2014-05-15 17:51:12
After my first article, I kept playing with ctypes and tried to see what fun things could be possible with little modifications to Python's objects at runtime. The goal of this article is to explain how to get reliable native code execution as a simple Python object. By reliable I mean that, if our payload respects certain rules, we can call it multiple times from the same Python code.
If you have not read my first article, here is what you need to know:
- the ctypes module allows you to manipulate C structures from Python,
- we can map a ctypes structure on a given address using from_address,
- in CPython, id(obj) returns the address of obj.
Using ctypes.Structure
Last time, I only used id(obj) to access the internal variables of an object. This time we need to do more complex things, so I am going to use the ctypes.Structure type for more readability.
Here is an example:
import ctypes

>>> class PyObj(ctypes.Structure):
...     _fields_ = [("ob_refcnt", ctypes.c_size_t),
...                 ("ob_type", ctypes.c_void_p)]  # Must be casted

# Inheritance of ctypes.Structure: PyVarObj will also have the fields of PyObj
>>> class PyVarObj(PyObj):
...     _fields_ = [("ob_size", ctypes.c_size_t)]

>>> l = [1,2,3,4]
>>> lm = PyVarObj.from_address(id(l))
>>> lm.ob_refcnt
1L
>>> lm.ob_size
4L
>>> lm.ob_size = 2
>>> l
[1, 2]
Now we have a nice way to access C-only data with readable and explicit names!
So, let's get to work!
Dummy code execution
Instruction Pointer Control
Before getting nice and reliable native code execution, the first step is getting code execution at all. The most obvious type to look at is the built-in function: a Python type that represents native code.
Let's have a look at the structure of a built-in function:
/** My own comments begin by '**' **/
/** From: Include/methodobject.h **/
typedef PyObject *(*PyCFunction)(PyObject *, PyObject *);
/** A standard function pointer **/

struct PyMethodDef {
    const char  *ml_name;   /* The name of the built-in function/method */
    PyCFunction ml_meth;    /* The C function that implements it */
    int         ml_flags;   /* Combination of METH_xxx flags, which mostly
                               describe the args expected by the C func */
    const char  *ml_doc;    /* The __doc__ attribute, or NULL */
};

typedef struct {
    PyObject_HEAD
    PyMethodDef *m_ml;      /* Description of the C function to call */
    PyObject    *m_self;    /* Passed as 'self' arg to the C func, can be NULL */
    PyObject    *m_module;  /* The __module__ attribute, can be anything */
} PyCFunctionObject;
We can see that a PyCFunctionObject is just a C function pointer with some metadata.
Here are the ctypes structures that represent those two structs:
class PyMethodDef(ctypes.Structure):
    _fields_ = [("ml_name", ctypes.POINTER(ctypes.c_char)),
                ("ml_meth", ctypes.c_void_p),
                ("ml_flags", ctypes.c_int),
                ("ml_doc", ctypes.POINTER(ctypes.c_char))]

# Reuse of PyObj from the first example
class PyCFunctionObject(PyObj):
    _fields_ = [("m_ml", ctypes.POINTER(PyMethodDef)),
                ("m_self", ctypes.c_void_p),
                ("m_module", ctypes.c_void_p)]

# The only new thing is ctypes.POINTER.
# This represents a pointer to another type, the
# content can be accessed by using `.contents`
>>> mabs = PyCFunctionObject.from_address(id(abs))
>>> mabs.m_ml.contents.ml_name[0:3]
'abs'
It seems to work. Now we need to see if code execution works too.
# Launching Python into GDB to nicely handle the crash
$ gdb -q -args `which python` -i intro_type.py
...
# Get the address of the builtin `abs`
(gdb) x/i 'builtin_abs.32990'
   0x51c8f0 <builtin_abs.32990>:    sub    $0x8,%rsp
(gdb) r
...
>>> # repeat code of last example
# See if the `abs` C code matches with what we found in gdb
>>> mabs.m_ml.contents.ml_meth
'0x51c8f0'
>>> mabs.m_ml.contents.ml_meth = 0x42424242
>>> abs(-1)  # Yeah!

Program received signal SIGSEGV, Segmentation fault.
0x0000000042424242 in ?? ()
(gdb)
That was easy! But this was the simple part. If we want to get reliable code execution, many steps remain.
Executing our code
The next step is to actually execute some of our code. For that, I chose the mmap module to create executable pages and stuff them with code and data.
There are only two problems with the mmap module:
- the mmap type doesn't expose the actual address of the page to Python (as usual, we will use ctypes to figure that out),
- the mmap "API" is not the same for Linux and Windows (I have simply created my own platform-independent wrapper for that).
How do we get the real address of the mmaped page? The same procedure as usual:
typedef struct {
    PyObject_HEAD
    char *  data;
    size_t  size;
    size_t  pos;    /* relative to offset */
    ...             /** Other fields that are non-relevant to our goal **/
} mmap_object;
All we need is the first field, which points to our page. In the following examples, all the PyObj structures that I describe can be found in the file intro_type.py.
import intro_type
import mmap

# Ctypes object for introspection
class PyMmap(intro_type.PyObj):
    _fields_ = [("ob_addr", ctypes.c_size_t)]

# Specific mmap class for code injection
class MyMap(mmap.mmap):
    """ A mmap that is never unmapped and that contains the page address """
    def __init__(self, *args, **kwarg):
        # Get the page address by introspection of the C struct
        m = PyMmap.from_address(id(self))
        self.addr = m.ob_addr
        # Prevent garbage collection (and so the unmapping) of the page
        m.ob_refcnt += 1

    @classmethod
    def get_map(cls, size):
        """ Returns a page of size 'size': the implementation is not important
            for our example. See the repository if you want the actual code. ;) """
Now, to be sure that our first step (simple code execution) works, we will replace the abs builtin with an identity one, equivalent to lambda x : x.
I run a 64-bit Python, so the example of native code here is 64-bit; if you want the 32-bit version, look at the repository. ;)
The payload used for identity is:
# All ctypes structs and MyMap classes are in the intro_type.py file.
import intro_type

# RSI is the first argument we pass and RAX is the return value.
# So we just return our first argument.
# Normally, the first argument of a function is RDI but builtins receive a 'self' as the actual first argument.
payload = "4889F0" + "C3"   # mov rax, rsi ; ret
payload = payload.decode('hex')

# Get a page.
page = intro_type.MyMap.get_map(0x1000)
# Put our payload at the beginning.
page[0:len(payload)] = payload

mabs = intro_type.PyCFunctionObject.from_address(id(abs))
# Change the function pointer to the address of our payload.
mabs.m_ml.contents.ml_meth = page.addr

>>> abs(4)
4
# Well it doesn't segfault: good news!
>>> abs(-4)
-4
# Well this is not the real abs anymore: another good news!
>>> x = (1,2,3)
>>> abs(x)
(1,2,3)
>>> abs(x)
<refcnt 0 at 0x7f48b121eaa0>
# Yep, do not forget that our function doesn't increment the refcount of the returned object.
# This version can lead to crash!
Perfect, our dummy code execution works!
Object hand-crafting
The next step of our journey is the hand-crafting of some Python objects.
Why? Because it's fun and it will be useful for our final goal!
Writing raw objects in memory
The first step of object crafting is to write the object's data in memory. For that, we are going to use the struct module (another really useful module). In this example, we write the data for an int object (in 64-bit Python):
/** here is the struct that represents an int **/
typedef struct {
    PyObject_HEAD
    long ob_ival;
} PyIntObject;
So:
import struct

>>> help(struct.pack_into)
pack_into(...)
    Pack the values v1, v2, ... according to fmt.
    Write the packed bytes into the writable buffer buf starting at offset.

# This module heavily uses the struct format.
# Doc is here: https://docs.python.org/2/library/struct.html.

# Write three unsigned long long in little endian at offset 0 of 'page'.
>>> struct.pack_into("<QQQ", page, 0,
                     2,        # ob_refcnt
                     id(int),  # ob_type
                     42)       # ob_ival
# That's all!
# Problem is: I have no way to prove you that it works before the next sub-step. :(
# Trust me!
The problem is: even if we have crafted a valid int in memory, it is not usable, since the Python interpreter has no reference to it. The next step is to create a reference to our crafted object.
Giving life to the crafted object
Here is the problem:
- we have a valid object in memory,
- we have its address,
- the Python interpreter has no reference to it,
- we need to create the reference to give life to our object.
All we have to do is call a function that can return arbitrary valid references. The easiest way is to return the object at the address given by the value of the int we receive as a parameter. Think of it as the reverse of the id function:
# How our function should work:
>>> x = (1,2,3)
>>> id(x)
140147953449472
>>> my_func(140147953449472)
(1,2,3)
If this description is not enough, here is what the C code should be:
PyObject* my_func(PyIntObject* i)
{
    return (PyObject *) i->ob_ival;
}
/** Here we can see how the function can be harmful if wrongly used **/
So, here are the payloads for this function:
# int-to-object payloads:

# 64bits
# mov rax, [rsi + 0x10]   # rsi is the address of the int; [rsi + 0x10] the ob_ival of the int
# ret
bootstrap_64 = "48 8b 46 10 C3".replace(" ", "").decode("hex")

# 32bits
# mov eax, [esp + 8]      # [esp + 8] -> builtin argument
# mov eax, [eax + 0x8]    # [eax + 0x8] ob_ival
# ret
bootstrap_32 = "8B 44 24 08 8B 40 08 C3".replace(" ", "").decode("hex")
Putting everything together, we get:
import struct
import intro_type

# Write our object into a page.
>>> page = intro_type.MyMap.get_map(0x1000)
>>> struct.pack_into("<QQQ", page, 0, 2, id(int), 42)

# Put the payload just after our int object in the page.
>>> bootstrap_64 = "48 8b 46 10 C3".replace(" ", "").decode("hex")
>>> page[24:24 + len(bootstrap_64)] = bootstrap_64

# Replace abs by our function.
>>> mabs = intro_type.PyCFunctionObject.from_address(id(abs))
>>> mabs.m_ml.contents.ml_meth = page.addr + 24

# Give life to our int!
>>> x = abs(int(page.addr))  # IT'S ALIVE!!
>>> x
42
# Proof that our int is not the original 42 (address comparison).
>>> x is 42
False
# Proof that we have an object at the beginning of our page.
>>> hex(id(x))
'0x7ffff7ff8000L'
>>> hex(page.addr)
'0x7ffff7ff8000L'
We are now able to create Python objects out of the great void! We are near our goal.
Reliable native code execution
The problems with the previous dummy code execution are:
- we have a limited number of different functions (limited by the number of existing builtins),
- each time we create a new dummy-execution we lose a builtin,
- we put the interpreter in an unstable state, as any part of the code could potentially use a replaced builtin without knowing it.
With the tools we acquired previously, the obvious solution to these problems is: we are going to craft our own builtin objects!
By doing so, we can have any number of different native functions without putting the interpreter in an unstable state by replacing builtins.
So here are the steps to create a complete builtin object:
- Write our payload on a page.
- Write a PyMethodDef on the page with the following values:
  - ml_name: address of a C string (using ctypes.addressof(ctypes.create_string_buffer(name))),
  - ml_meth: address of the payload,
  - ml_flags: 8 for METH_O (method with a single object argument; can be changed depending on our needs),
  - ml_doc: 0 (no need to document, so None is cool) or the address of a C string.
- Write a PyCFunctionObject on the page with the following values:
  - ob_refcnt: 0x42 (we really don't want it to be collected, since it was not allocated by Python),
  - ob_type: <type 'builtin_function_or_method'>, so id(type(abs)) is fine,
  - m_ml: address of the PyMethodDef,
  - m_self: 0,
  - m_module: 0 (or any PyObj address).
- Give life to our object (see previous section).
Here is a demo for x86_64:
import struct
import intro_type

def create_builtin(code, name="crafted builtin"):
    """ Dummy implementation of 'create_builtins' for x86_64 """
    # Create page.
    m = intro_type.MyMap.get_map(0x1000)
    # Get a C string with the name.
    cname = ctypes.create_string_buffer(name)
    name_addr = ctypes.addressof(cname)
    # Write the payload on the page.
    m[0:len(code)] = code
    # Write the PyMethodDef on the page.
    struct.pack_into("<QQQQ", m, 0x100, name_addr, m.addr, 8, name_addr)
    # Write the PyCFunctionObject on the page.
    struct.pack_into("<QQQQQ", m, 0x200, 42, id(type(min)), m.addr + 0x100, 0, 0)
    # Give life to our object.
    return give_life_to_obj(int(m.addr + 0x200))

# Create page with our `give_life_to_obj` payload.
code_page = intro_type.MyMap.get_map(0x1000)
bootstrap_64 = "48 8b 46 10 C3".replace(" ", "").decode("hex")
code_page[0:len(bootstrap_64)] = bootstrap_64

# Replace abs by our function.
mabs = intro_type.PyCFunctionObject.from_address(id(abs))
mabs.m_ml.contents.ml_meth = code_page.addr
give_life_to_obj = abs

# A builtin that trigger a breakpoint when executed.
breakpoint = create_builtin("\xcc", "breakpoint")

>>> breakpoint
<built-in function breakpoint>
>>> breakpoint.__name__
'breakpoint'
>>> breakpoint(None)
[1]    4870 trace trap (core dumped)  python -i demo.py
# woot! breakpoint triggered \o/
The only problem now is the bootstrap of give_life_to_obj: to give life to a new builtin object, we would need to already have the dummy execution. So we are going to:
- replace abs with our give_life_to_object payload,
- create our first builtin: real_give_life_to_object,
- put back the original code of abs,
- use our new builtin to create the builtin objects.
Bootstrap code:
def bootstrap():
    payload_64 = "48 8b 46 24 C3".replace(" ", "").decode("hex")
    # 'replace_builtin' replaces the code of its first argument and returns the old 'ml_meth' for restoration.
    old_abs = replace_builtin(abs, payload_64)
    # In my final code 'give_life_to_object' is named 'get_obj_at'.
    get_obj_at = abs
    new_obj_at = self.create_builtin(payload_64, "get_obj_at")
    # Just restore old code of builtin abs.
    restore_builtins(abs, old_abs)
    get_obj_at = new_obj_at
With that, we are able to create all the builtins we want!
Windows/Linux and 32/64 bits compatibility
Before releasing the code, I worked on getting a working Linux and Windows version for both the 32-bit and 64-bit architectures. Here is what I had to do:
Windows/Linux:
- No change except the MyMap class (mmap wrapper).
- Get the OS using platform.system().
- Call mmap with correct arguments depending on the system.
32/64 bits:
- Get the architecture using platform.architecture()[0].
- Adapt (a rough sketch of this dispatch follows below):
  - the payload for obj_at (described as give_life_to_obj in the article),
  - the format strings for struct.pack_into.
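As a rough illustration of that dispatch, here is a minimal sketch with illustrative names (not the repository's actual code); it only reuses the two payloads shown earlier in the article:

import platform

IS_64 = platform.architecture()[0] == "64bit"
OS = platform.system()  # "Windows" or "Linux": used by the MyMap wrapper to pick the right mmap arguments

# The payload that returns the ob_ival of its int argument, in the flavour matching the interpreter.
OBJ_AT_PAYLOAD = ("48 8b 46 10 C3" if IS_64 else
                  "8B 44 24 08 8B 40 08 C3").replace(" ", "").decode("hex")
# One pointer-sized field for struct.pack_into: 8 bytes on 64 bits, 4 bytes on 32 bits.
PTR_FMT = "<Q" if IS_64 else "<I"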
The last problem I had was with the way I gave life to objects. It doesn't work very well on 32-bit systems, because int(addr) may return a long, which is not compatible with our payload.
So in the final version, I use the str type and a string such as "\x41\x42\x43\x44" to pass the address of the object I want to give life to. This only changes the way the payload is called and the offset at which the payload looks inside the object.
Final result
The final result is pretty simple: all the magic is put into the CodeInjector class in the file code_injection.py. Just use it like this:
>>> import code_injection
>>> code_injection.CodeInjector
<class 'code_injection.CodeInjector'>
>>> ci = code_injection.CodeInjector()
>>> b = ci.create_builtin("\xcc", "my_breakpoint_builtin")
>>> b
<built-in function my_breakpoint_builtin>
Conclusion
I had a LOT of fun creating new objects and having Python interface so nicely with my own native payloads. Of course, this kind of horrible stuff (Pythonically speaking) should not be used in real projects...
I hope you enjoyed this article and learned something (I did!).
Also, struct is a super-useful module for low-level stuff! I discovered it during my first CTF and now use it at almost every CTF.
The complete implementation is available at https://bitbucket.org/Hakril/python-native-execution/.
Understanding Python by breaking it
Written by Hakril
2014-03-27 16:57:42
I have recently discovered ctypes and decided to use it to manipulate Python in a way I should not. The purpose was to see some of Python's internals at work directly from the interpreter. The article studies the CPython implementation of Python 2.7.
Reference counting and the Garbage Collector
One thing to know about Python is that its Garbage Collector uses reference counting. It means that every Python object has a ref counter that knows how many references are pointing to the object.
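(For reference, the standard library can already show this counter: sys.getrefcount(obj) returns it, off by one because the call itself holds a temporary reference. In this article we rebuild the same information by hand with ctypes.)

>>> import sys
>>> l = [1, 2, 3, 4]
>>> sys.getrefcount(l)  # real refcount + 1 (the argument passed to the function)
2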
Our first goal is to visualise that ref counter for any object from the Python interpreter.
The principal tool we are going to use is ctypes._CData.from_address, which allows us to create a ctypes mapping on any address we want:
Example (a bad one)
import ctypes
ctypes.c_size_t.from_address(0)
# segfault! (to dereference 0 is not a good idea)
So we now have the ability to read any value in our address space. The next step is to get an object's address. For that, we will use a builtin function:
id(): returns the "identity" of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime.
CPython implementation detail: this is the address of the object in memory.
All we need to do now is to get the offset of the ref counter in the Python object:
/** My own comments begin by '**' **/
/** From: Includes/object.h **/
typedef struct _object {
    PyObject_HEAD
} PyObject;

/* PyObject_HEAD defines the initial segment of every PyObject. */
#define PyObject_HEAD                   \
    _PyObject_HEAD_EXTRA  /** empty in standard build **/   \
    Py_ssize_t ob_refcnt;  /** ref counter **/               \
    struct _typeobject *ob_type;  /** pointer to the type **/
We can see that ob_refcnt is the first field of the struct, and therefore addr(ob_refcnt) == addr(object) == id(object).
So our function will simply be:
def get_ref(obj):
    """ returns a c_size_t, which is the refcount of obj """
    return ctypes.c_size_t.from_address(id(obj))
Let's try it!
>>> import ctypes
>>> def get_ref(obj):
...     """ returns a c_size_t, which is the refcount of obj """
...     return ctypes.c_size_t.from_address(id(obj))
...
# the c_ulong is not a copy of the address
# so any modification of the ob_refcnt are directly visible
>>> l = [1,2,3,4]
>>> l_ref = get_ref(l)
>>> l_ref
c_ulong(1L)  # there is just one reference on the list (l)
>>> l2 = l
>>> l_ref
c_ulong(2L)  # two references on the list (l and l2)
>>> del l
>>> l_ref
c_ulong(1L)
>>> del l2
>>> l_ref
c_ulong(0L)  # no more reference!
>>> another_list = [0, 0, 7]
>>> l_ref
c_ulong(1L)  # woot : old list's ob_refcnt have changed
This example is pretty straightforward, but the last two lines need an explanation. Why would creating a new list change the ref count of the old one?
This is the work of the Garbage Collector! When the old list's ref count goes to 0, the GC "cleans" the list and puts it in a pool of unused lists that will be reused for the next list to be created!
Proof:
>>> l1 = [1,2,3]
>>> l2 = [1,2,3]
>>> hex(id(l1))
'0xa367e8'
>>> hex(id(l2))
'0xa36d40'  # not the same address as l1 (hopefully)
>>> del l1  # l1 is garbage collected
>>> l3 = [8,0,0,8,5]
>>> hex(id(l3))
'0xa367e8'  # l1 address is reused for the new list l3
Special case: int and str
In the CPython implementation of int, the references to [-5 ; 256] are shared. And we have the tools to verify that!
# with "non-shared" int >>> x1 = 257 >>> x2 = 257 >>> get_ref(x1) c_ulong(1L) >>> get_ref(x2) c_ulong(1L) >>> id(x1) == id(x2) False # logic : differents objects >>> x3 = 0 >>> get_ref(x3) c_ulong(409L) # all ref to "0" point the the same "int(0)" object
Same thing happens with strings!
>>> p = "python"
>>> get_ref(p)
c_ulong(8L)
>>> p2 = "python"
>>> get_ref(p2)
c_ulong(9L)
>>> id(p) == id(p2)
True  # the two variables are references to the same string object!
Breaking reference counter
For now, we have just read the value of ob_refcnt. What if we change it?
We can rewrite ob_refcnt to force a premature garbage collection!
>>> x = [1,2,3,4]
>>> x_ref = get_ref(x)
>>> xx = x
>>> x_ref
c_ulong(2L)
>>> x_ref.value = 1  # GC now thinks that we have just one reference to the list
>>> del xx  # ob_refcnt == 0 -> garbage collection!
>>> x
[]  # garbage collection of a list sets its size to 0
>>> another_list = [0, 4, 8, 15 ,16, 23, 42]
>>> x
[0, 4, 8, 15 ,16, 23, 42]  # always the same "reuse list" tricks!
Of course, this kind of code puts the interpreter into an unstable state that will likely cause crashes!
Messing with tuple
Another fun and simple thing to mess with is tuples. The documentation says "tuple is an immutable sequence type": let's try to change this!
If you look at the CPython implementation, you can find:
/** From: Includes/tupleobject.h **/
typedef struct {
    PyObject_VAR_HEAD
    PyObject *ob_item[1];  /** An array of pointer to PyObject **/
} PyTupleObject;

/** From: Includes/object.h **/
#define PyObject_VAR_HEAD               \
    PyObject_HEAD  /** See: part 1 **/  \
    Py_ssize_t ob_size;  /* Number of items in variable part */
So the two important things about tuples are:
- a tuple is basically just an array of pointers to PyObject and,
- a tuple object is composed of three size_t (ref_count, ob_type, ob_size) and the array of pointers.
Based on that and the memmove implementation in ctypes, we can build a tuplecpy function:
def tuplecpy(dst, src, begin_offset):
    """ Of course this function should NEVER be used in real code
        It will probably result in segfaults/crashes
        - copy tuple(src) to dst[begin_offset:] tuple
        - remember id(x) -> addressof(x)
    """
    OFFSET = ctypes.sizeof(ctypes.c_size_t) * 3
    PTR_SIZE = ctypes.sizeof(ctypes.c_size_t)
    dst_addr = id(dst) + OFFSET + PTR_SIZE * begin_offset
    src_addr = id(src) + OFFSET
    ctypes.memmove(dst_addr, src_addr, len(src) * PTR_SIZE)
Let's try this new toy!
>>> x1 = tuple("A" * 4)
>>> x2 = tuple("B" * 2)
>>> print ("BEFORE -> x1 = {0} | x2 = {1}".format(x1, x2))
>>> tuplecpy(x1, x2, 2)
>>> print ("AFTER  -> x1 = {0} | x2 = {1}".format(x1, x2))

# Result:
# BEFORE -> x1 = ('A', 'A', 'A', 'A') | x2 = ('B', 'B')
# AFTER  -> x1 = ('A', 'A', 'B', 'B') | x2 = ('B', 'B')
It works! But why is this fundamentally bad (besides the fact of modifying tuples)?
The answer is in the first section: we have created new references to multiple objects (the ones in the src tuple) without any update of their ref count.
>>> x1 = tuple("B" * 4) >>> x2 = ([1,2,3,4],) # problem is : >>> t = get_ref(x2[0]) >>> t c_ulong(1L) >>> del x2 # no more references >>> x1 ('B', 'B', 'B', <refcnt 0 at 0xacac68>) # nice printing of an object with ob_refcnt == 0 :) >>> l = [0, 42, 69] >>> x1 # GC IN ACTION \o/ ('B', 'B', 'B', [0, 42, 69]) # Same principle as before
We can just fix our function so that it updates the reference counter! (but it doesn't mean that we should now use this function in real code..)
def tuplecpy(dst, src, begin_offset):
    """ Of course this function should NEVER be used in real code
        It will probably result in segfaults/crashes
        - copy tuple(src) to dst[begin_offset:] tuple
        - remember id(x) -> addressof(x)
    """
    OFFSET = ctypes.sizeof(ctypes.c_size_t) * 3
    PTR_SIZE = ctypes.sizeof(ctypes.c_size_t)
    for obj in src:
        x = get_ref(obj)
        x.value = x.value + 1  # manually update ob_refcnt
    dst_addr = id(dst) + OFFSET + PTR_SIZE * begin_offset
    src_addr = id(src) + OFFSET
    ctypes.memmove(dst_addr, src_addr, len(src) * PTR_SIZE)
Diving into function and code objects.
Now that we can modify tuples, we are going to see the impact of modifying some important tuples in function objects and code objects.
Code object
As explained in the documentation : "Code objects represent byte-compiled executable Python code"
In Python 2, code objects are in the func_code variable of function objects.
>>> import dis  # bytecode disassembler module
>>> def time_2(x):
...     return 2 * x
...
>>> time_2.func_code
<code object time_2 at 0x9ee230, file "<stdin>", line 1>
>>> time_2(x=4)
8
>>> dis.dis(time_2)
  0 LOAD_CONST               1 (2)
  3 LOAD_FAST                0 (x)
  6 BINARY_MULTIPLY
  7 RETURN_VALUE
If we look at the disassembly of the function (which is pretty easy to read), we can see that the function:
- LOADs the constant (2),
- LOADs the variable (x),
- MULTIPLYs the two values,
- RETURNs the result.
If we focus on the first instruction (LOAD_CONST), we can see the following things:
- LOAD_CONST is called with the argument 1
- that argument points to the const 2
The fact is that 1 is just an offset into the code object's constants tuple.
>>> time_2.func_code.co_consts  # tuple of constants of our code object
(None, 2)
>>> time_2.func_code.co_consts[1]
2  # yep we were right!

# So what if we change this value ?
>>> tuplecpy(time_2.func_code.co_consts, (10,), 1)
>>> time_2.func_code.co_consts  # tuple of constants of our code object
(None, 10)
>>> time_2(4)
40  # woot! It works!
So we are able to modify the constant used in the function.
Can we do the same for the LOAD_FAST instruction?
>>> time_2(4)  # works on the modified function!
40
>>> time_2(x=4)  # call by dict
40
>>> time_2.func_code.co_varnames  # tuple of local variables and argnames
('x',)
>>> tuplecpy(time_2.func_code.co_varnames, ('new_arg_name',), 0)
>>> time_2(x=4)  # x is not the argument name anymore!
TypeError: time_2() got an unexpected keyword argument 'x'
>>> time_2(new_arg_name=4)
40
>>> dis.dis(time_2)  # see the function changes:
  0 LOAD_CONST               1 (10)            # const changed
  3 LOAD_FAST                0 (new_arg_name)  # arg name changed
  6 BINARY_MULTIPLY
  7 RETURN_VALUE
So, yeah, we have modified the behaviour of the function pretty well!
And here is a last example that I find very fun:
>>> def nop(): pass  # the function that does nothing
...
>>> dis.dis(nop)
  0 LOAD_CONST               0 (None)
  3 RETURN_VALUE
>>> nop.func_code.func_consts
(None,)
>>> l = []
>>> tuplecpy(nop.func_code.func_consts, (l,), 0)
# the function will always return the same list!
>>> nop()
[]
>>> l.append(2)
>>> nop()
[2]
>>> dis.dis(nop)
  0 LOAD_CONST               0 ([2])
  3 RETURN_VALUE
Function closure!
Finally, we are going to play with closures! Closures appear in function-generating functions. Wikipedia page
>>> def gen_return_function(x):
...     def f():
...         return x  # in f(): x is going to be a closure
...     return f
...
>>> ret_42 = gen_return_function(42)
>>> ret_42()
42
>>> dis.dis(ret_42)
  0 LOAD_DEREF               0 (x)
  3 RETURN_VALUE
This new instruction, LOAD_DEREF, is the one that handles closures. The question is: where is the closure stored?
The answer is simple: closures are in ret_42.func_closure. Why not in the code object? Because it allows all generated functions to share the same code object but with different closures!
>>> ret_23 = gen_return_function(23)
>>> ret_42 = gen_return_function(42)
>>> ret_42.func_code is ret_23.func_code
True
>>> ret_42.func_closure
(<cell at 0xa54398: int object at 0x95d748>,)
# closures are always inside a cell object
>>> ret_42.func_closure[0].cell_contents
42

# We are not going to rewrite the tuple but directly the contents of the cell instead.
# We will still use tuplecpy but with an offset of (-1) because cells have no ob_size.
>>> x = 1337
>>> tuplecpy(ret_42.func_closure[0], (x,), -1)
>>> ret_42()
1337
# closure modified :)
That's all!
Conclusion
I really enjoyed messing with Python (one more time) and I hope you enjoyed it too. I think that this method is a good way to quickly and easily have a look at some Python's internals and how they work.
Lastly, ctypes is also a very powerful module for doing legitimate work; it is capable of loading shared libraries and calling C functions. If you have not already used ctypes, I strongly recommend you read the ctypes Python documentation and give it a try!