A few days ago, I came across a fantastic paper shared on HN. One section of the paper showed a function that apparently ran machine code placed somewhere in data memory.

static void walk_instrs(unsigned char *pos, unsigned char *end,
                        void (*cb)(unsigned char *, unsigned, void *), void *arg)
{
    unsigned char *cur = pos;
    while (cur < end) {
        unsigned len = instr_len(cur, end);
        cb(cur, len, arg);
        cur += (len ? len : 1);
    }
}

It invokes a callback function for each machine instruction in a buffer, kind of like a mini-VM (not that I know how VMs work, but you get the point). Alternatively, just for fun, you can put native machine code into an array and execute it as if those instructions were part of your program's code.
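To make the callback pattern concrete, here is a minimal sketch that counts instructions in a buffer. The paper's instr_len is not shown here, so the version below is a hypothetical stand-in using a made-up toy encoding (the first byte of each instruction holds its total length); count_cb and count_instrs are likewise names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy encoding (an assumption, not x86): the first byte of
 * each instruction stores the instruction's total length in bytes.
 * Returns 0 when the length would run past the end of the buffer. */
static unsigned instr_len(unsigned char *cur, unsigned char *end)
{
    unsigned len = cur[0];
    return (len <= (unsigned)(end - cur)) ? len : 0;
}

static void walk_instrs(unsigned char *pos, unsigned char *end,
                        void (*cb)(unsigned char *, unsigned, void *), void *arg)
{
    unsigned char *cur = pos;
    while (cur < end) {
        unsigned len = instr_len(cur, end);
        cb(cur, len, arg);
        cur += (len ? len : 1);  /* a zero length still advances by one byte */
    }
}

/* Callback: ignores the instruction itself and bumps a counter. */
static void count_cb(unsigned char *instr, unsigned len, void *arg)
{
    (void)instr;
    (void)len;
    ++*(unsigned *)arg;
}

/* Counts how many times the walker invokes the callback over buf. */
static unsigned count_instrs(unsigned char *buf, size_t n)
{
    unsigned count = 0;
    assert(buf != NULL);
    walk_instrs(buf, buf + n, count_cb, &count);
    return count;
}
```

With the toy buffer {2, 0xAA, 3, 0xBB, 0xCC, 1}, the walker visits three instructions of lengths 2, 3 and 1. A real x86 length decoder is far more involved than this stand-in.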

Let's build an array which, when executed, prints "HELLO!" to the screen.

Write C, Generate ASM

To generate the binary data (code?) for the array, write a C program containing the function that you want to execute. The machine code of the following program won't work once copied into another program, for the reasons explained below.

#include <stdio.h>

void run_this(void)
{
    puts("HELLO!");
}

int main(void)
{
    run_this();
    return 0;
}

The bytes of run_this won't work inside another program for two reasons:

  1. The function puts is a libc function, so it will be linked at a different address in your target application.
  2. The string HELLO! is placed in the string table of this program. It will be absent from your target application.

We need our code to be self-contained. In other words, it shouldn't call any library function, and it shouldn't reference any data in the .rodata section (or any other section, for that matter) of the target application.

puts without stdlib

We need to get rid of puts and write our own function which prints to the screen. For this we can use the Linux write syscall. It takes three arguments: fd, the file descriptor; buf, a pointer to the string; and count, the number of bytes to print. The Linux x86_64 ABI defines which registers to use for passing these parameters to syscalls. In our case:

  • rax for syscall number (1 for write)
  • rdi for fd (1 for stdout)
  • rsi for buf (some way of pointing to "HELLO!" string)
  • rdx for count (6 characters)
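As a sanity check on the argument list, the same call can be made through libc's syscall() wrapper, which loads the registers for us. This is only a sketch for experimenting: write_via_syscall is a name made up here, and because it still depends on libc it cannot be used in the self-contained array.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE       /* exposes syscall() in <unistd.h> */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>  /* SYS_write (1 on x86_64 Linux) */
#include <unistd.h>       /* syscall() */

/* Issues the write syscall: the number ends up in rax, and fd, buf and
 * count in rdi, rsi and rdx on x86_64 Linux. Returns the number of bytes
 * written, or -1 on error. */
static long write_via_syscall(int fd, const void *buf, size_t count)
{
    assert(buf != NULL);
    return syscall(SYS_write, fd, buf, count);
}
```

Calling write_via_syscall(1, "HELLO!", 6) from main should print HELLO! and return 6.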

Sadly, as you may have noticed, we are stuck with assembly if we want to fill these registers ourselves. Here is the program with inline assembly implementing the syscall (no #include <stdio.h>):

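void run_this(void)
{
	asm (
		"mov $1, %rax\n\t"              /* syscall number: write */
		"mov $1, %rdi\n\t"              /* fd: stdout */
		"mov $0x214f4c4c4548, %rdx\n\t" /* "HELLO!" as little-endian bytes */
		"push %rdx\n\t"                 /* place the string on the stack */
		"mov %rsp, %rsi\n\t"            /* buf: address of the string */
		"mov $6, %rdx\n\t"              /* count: 6 bytes */
		"syscall\n\t"
		"pop %rdx\n\t"                  /* restore the stack pointer */
	);
}

int main(void)
{
	run_this();
	return 0;
}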

Notice that instead of putting a .rodata address in rsi (we can't reference .rodata in the target application), we push the string onto the stack and use the stack pointer as its address.
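The constant 0x214f4c4c4548 is just the six characters of "HELLO!" packed into one 64-bit value, least significant byte first. A small sketch of that packing, with pack_le as a helper name invented for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packs up to 8 characters into a 64-bit value so that the first
 * character ends up in the lowest byte (little-endian order). */
static uint64_t pack_le(const char *s, size_t len)
{
    uint64_t value = 0;
    assert(len <= 8);
    for (size_t i = 0; i < len; i++)
        value |= (uint64_t)(unsigned char)s[i] << (8 * i);
    return value;
}
```

pack_le("HELLO!", 6) yields 0x214f4c4c4548. Since x86_64 is little-endian, pushing that value stores the bytes 48 45 4c 4c 4f 21 00 00 starting at rsp, so rsi ends up pointing at the 'H'.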

After recompiling and disassembling the binary with objdump -d, we see the assembly alongside its machine-code bytes:

00000000004004d6 <run_this>:
  4004d6:	55                   	push   %rbp
  4004d7:	48 89 e5             	mov    %rsp,%rbp
  4004da:	48 c7 c0 01 00 00 00 	mov    $0x1,%rax
  4004e1:	48 c7 c7 01 00 00 00 	mov    $0x1,%rdi
  4004e8:	48 ba 48 45 4c 4c 4f 	movabs $0x214f4c4c4548,%rdx
  4004ef:	21 00 00
  4004f2:	52                   	push   %rdx
  4004f3:	48 89 e6             	mov    %rsp,%rsi
  4004f6:	48 c7 c2 06 00 00 00 	mov    $0x6,%rdx
  4004fd:	0f 05                	syscall
  4004ff:	5a                   	pop    %rdx
  400500:	90                   	nop
  400501:	5d                   	pop    %rbp
  400502:	c3                   	retq

Our array will consist of the hex bytes in the middle column.
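Copying 45 bytes by hand is error-prone, so a small formatter can help. The sketch below turns a byte buffer into the text of a C array initializer; format_bytes is a hypothetical helper, not something from the paper or from objdump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Writes bytes as "0x55, 0x48, ..." into out.
 * Returns the number of characters written, or -1 if out is too small. */
static int format_bytes(const unsigned char *bytes, size_t n,
                        char *out, size_t outsz)
{
    size_t used = 0;
    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(out + used, outsz - used, "%s0x%02x",
                         i ? ", " : "", (unsigned)bytes[i]);
        if (w < 0 || (size_t)w >= outsz - used)
            return -1;  /* output was truncated */
        used += (size_t)w;
    }
    return (int)used;
}
```

Feeding it the first four bytes of run_this (55 48 89 e5) produces the text 0x55, 0x48, 0x89, 0xe5, ready to paste between the braces of an array.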

Target application

#include <stdint.h>

uint8_t run_array[] = {0x55, 0x48, 0x89, 0xe5, 0x48, 0xc7, 0xc0,
                       0x01, 0x00, 0x00, 0x00, 0x48, 0xc7, 0xc7,
                       0x01, 0x00, 0x00, 0x00, 0x48, 0xba, 0x48,
                       0x45, 0x4c, 0x4c, 0x4f, 0x21, 0x00, 0x00,
                       0x52, 0x48, 0x89, 0xe6, 0x48, 0xc7, 0xc2,
                       0x06, 0x00, 0x00, 0x00, 0x0f, 0x05, 0x5a,
                       0x90, 0x5d, 0xc3};

int main(void)
{
    /* Treat the bytes in run_array as a function and call it. */
    void (*fptr)(void) = (void (*)(void))run_array;
    fptr();
    return 0;
}

Compile it with -fno-stack-protector -zexecstack, then run it to see HELLO! printed.
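On some newer kernels the execstack flag may no longer make a global array executable, in which case the call would crash. One alternative, sketched here under that assumption, is to copy the bytes into memory that is explicitly mapped as executable; make_executable_copy is a name invented for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* exposes MAP_ANONYMOUS in <sys/mman.h> */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Copies len bytes of machine code into a fresh anonymous mapping and
 * switches that mapping to read+execute. Returns NULL on failure.
 * The caller releases the memory with munmap(ptr, len). */
static void *make_executable_copy(const unsigned char *code, size_t len)
{
    assert(code != NULL && len > 0);
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;
    memcpy(mem, code, len);
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return NULL;
    }
    return mem;
}
```

In the target application, void *p = make_executable_copy(run_array, sizeof run_array); followed by ((void (*)(void))p)(); should then run the same bytes on x86_64 Linux without needing the execstack flag.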