Keystone JIT Experiment

I've been experimenting with building a JIT in Python, and came across Keystone Engine.

You can install the entire Keystone module using pip if you only plan on using it from within Python:

pip install keystone-engine

To write our experimental JIT, we'll need to use a few functions from the Windows API. The basic idea is that we need to allocate some memory, then copy over some assembled code into that memory and be able to make it executable.

VirtualProtect, VirtualAlloc, and VirtualFree

We can use the ctypes module to interface with these Windows API functions. It's just a matter of translating each function's arguments and return type into ctypes types.

import ctypes

We'll first write an error check function, which ctypes calls implicitly after each API call to verify its result. If there is an error, we know that these API functions all call SetLastError, and ctypes.WinError will call GetLastError to retrieve the proper error code and build the exception for us.

def win32_bool_ptr_errcheck(result, func, args):
    if not result:
        raise ctypes.WinError()
    return result
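
One check can serve both return styles because ctypes hands back None for a NULL c_void_p result, and None is just as falsy as a zero BOOL. Here's a minimal sketch of that behaviour, calling the check by hand with made-up results. The OSError fallback is an assumption added only so the sketch also runs on systems where ctypes.WinError does not exist (it is Windows-only).

```python
import ctypes

def win32_bool_ptr_errcheck(result, func, args):
    if not result:
        # ctypes.WinError is Windows-only; falling back to a bare OSError
        # is an assumption made purely so this sketch runs elsewhere.
        make_error = getattr(ctypes, "WinError", OSError)
        raise make_error()
    return result

# ctypes normally supplies these arguments; here they are hand-fed stand-ins.
for fake_result in (1, 0x7FF00000):        # TRUE, or a non-NULL address
    assert win32_bool_ptr_errcheck(fake_result, None, ()) == fake_result

for fake_result in (0, None):              # FALSE, or NULL from a c_void_p restype
    try:
        win32_bool_ptr_errcheck(fake_result, None, ())
    except OSError as error:
        print("failed as expected:", type(error).__name__)
```

Whatever the check returns is what the caller of the wrapped function sees, which is why it passes result through on success.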

For VirtualProtect the signature is,

BOOL VirtualProtect(
  LPVOID lpAddress,
  SIZE_T dwSize,
  DWORD  flNewProtect,
  PDWORD lpflOldProtect
);

And we can write that in Python as,

VirtualProtect = ctypes.windll.kernel32.VirtualProtect
VirtualProtect.restype = bool
VirtualProtect.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                           ctypes.c_int, ctypes.POINTER(ctypes.c_int)]
VirtualProtect.errcheck = win32_bool_ptr_errcheck

For VirtualAlloc,

LPVOID VirtualAlloc(
  LPVOID lpAddress,
  SIZE_T dwSize,
  DWORD  flAllocationType,
  DWORD  flProtect
);

the respective Python,

VirtualAlloc = ctypes.windll.kernel32.VirtualAlloc
VirtualAlloc.restype = ctypes.c_void_p
VirtualAlloc.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                         ctypes.c_int, ctypes.c_int]
VirtualAlloc.errcheck = win32_bool_ptr_errcheck

and finally VirtualFree,

BOOL VirtualFree(
  LPVOID lpAddress,
  SIZE_T dwSize,
  DWORD  dwFreeType
);

the respective Python,

VirtualFree = ctypes.windll.kernel32.VirtualFree
VirtualFree.restype = bool
VirtualFree.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
VirtualFree.errcheck = win32_bool_ptr_errcheck

VirtualAlloc and VirtualProtect both take a memory protection flag (flProtect and flNewProtect respectively), and VirtualAlloc takes an additional allocation type flag, flAllocationType. We can represent these in Python using the enum module,

import enum

class Page(enum.IntEnum):
    EXECUTE           = 0x10
    EXECUTE_READ      = 0x20
    EXECUTE_READWRITE = 0x40
    EXECUTE_WRITECOPY = 0x80
    NOACCESS          = 0x01
    READONLY          = 0x02
    READWRITE         = 0x04
    WRITECOPY         = 0x08

class Memory(enum.IntFlag):
    COMMIT     = 0x00001000
    RESERVE    = 0x00002000
    RESET      = 0x00080000
    RESET_UNDO = 0x01000000
    DECOMMIT   = 0x00004000
    RELEASE    = 0x00008000

We'll be implementing a simple square function,

int square(int a) {
    return a * a;
}

Under the Windows x64 calling convention the first integer argument arrives in rcx and the result is returned in rax, so a function like this can be assembled as,

square:
    mov rax, rcx
    imul rax, rax
    ret 0

We'll define the signature in Python for our square function like so, taking an integer and returning an integer,

SQUARE_PROC = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)
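
One consequence of declaring the return type as c_int is that the result is read back as a 32-bit signed value, so a square larger than 2**31 - 1 should come back wrapped rather than as a big Python integer. Here's a small sketch of that wrapping using ctypes alone. The as_c_int helper is a hypothetical name, not part of the experiment.

```python
import ctypes

def as_c_int(value):
    """Reinterpret a Python int the way a 32-bit C int would hold it."""
    # ctypes integer types do no overflow checking; they simply wrap.
    return ctypes.c_int(value).value

print(as_c_int(46340 * 46340))   # largest square that still fits
print(as_c_int(46341 * 46341))   # one step further wraps negative
```

If larger results matter, declaring the signature with ctypes.c_int64 would be the natural change, since the assembly already works on the full 64-bit registers.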

Using Keystone is quite simple in Python. We just create an instance of Keystone with the hardware architecture and the mode, and then encode our assembly.

import keystone

CODE = b"""
mov rax, rcx
imul rax, rax
ret 0
"""

ks = keystone.Ks(keystone.KS_ARCH_X86, keystone.KS_MODE_64)
# asm() returns the encoded bytes as a list of ints plus the statement count.
encoding, _ = ks.asm(CODE)
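
Before executing anything it can be handy to look at the bytes that came back. Since the encoding is a list of integers, a tiny helper is enough. The hexdump name and the sample list are assumptions for illustration: the sample is a hand-written encoding of mov rax, rcx, not captured Keystone output, and the bytes Keystone emits may differ.

```python
def hexdump(encoding):
    """Format a sequence of byte values (0-255) as space-separated hex."""
    return " ".join(f"{byte:02x}" for byte in encoding)

# Hand-written stand-in for Keystone output: one valid encoding of
# `mov rax, rcx` (REX.W prefix, opcode 0x89, ModRM 0xC8).
sample = [0x48, 0x89, 0xC8]
print(hexdump(sample))  # 48 89 c8
```

With the real result you would call hexdump(encoding) right after ks.asm(CODE).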

Now we just need to allocate some memory for the encoding, copy over the bytes from the Keystone encoding, and then change the permissions to allow execution. After that we can cast the memory to a function pointer and call it.

# Allocate space for the encoding.
memory = VirtualAlloc(None, len(encoding),
                      Memory.COMMIT | Memory.RESERVE, Page.READWRITE)

# Copy over the bytes from the Keystone encoding.
ctypes.memmove(memory, bytes(encoding), len(encoding))

# Modify the permissions to allow it to be executed.
old_protect = ctypes.c_int(0)
VirtualProtect(memory, len(encoding), Page.EXECUTE, ctypes.byref(old_protect))

# Create a callable function from our signature.
square = SQUARE_PROC(memory)

# And finally, call our function.
print(square(2))

# Release the allocation; dwSize must be 0 when using MEM_RELEASE.
VirtualFree(memory, 0, Memory.RELEASE)
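
One thing the straight-line version doesn't handle is an exception between the allocation and the free, which would leak the pages. A context manager is one way to tidy that up. This is only a sketch: the allocated name is hypothetical, and the alloc and free functions are passed in as parameters (an assumption made so the sketch can be tried without Windows), with the flag values repeated as plain constants.

```python
from contextlib import contextmanager

MEM_COMMIT, MEM_RESERVE, MEM_RELEASE = 0x1000, 0x2000, 0x8000
PAGE_READWRITE = 0x04

@contextmanager
def allocated(size, alloc, free):
    """Yield an address obtained from alloc and always hand it back to free."""
    address = alloc(None, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE)
    try:
        yield address
    finally:
        # Runs on normal exit and on exceptions alike.
        # dwSize must be 0 when releasing with MEM_RELEASE.
        free(address, 0, MEM_RELEASE)

# Intended use with the wrappers from this post (Windows only):
#
#     with allocated(len(encoding), VirtualAlloc, VirtualFree) as memory:
#         ctypes.memmove(memory, bytes(encoding), len(encoding))
#         ...
```

The VirtualProtect call and the function-pointer cast would sit inside the with block, so the memory stays valid for as long as square is in use.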

And we've created our square function at run time and can execute it.

You can view the source for this experiment here.