Category Archives: codePlay

Project Detour

 

Hello, Everyone I have something kind of unrelated to my project but is still interesting. Actually it may be related to the project but is currently more of an interesting discovery (for me.) After discussing with my professor about eliminating potential overhead in my program, they brought up the idea of potentially using goto statements to select which piece of code to run. Where in this conversation the idea of getting the address of a label to use as a function call. This was something I couldn’t stop thinking about.

So this article, although somewhat unrelated will dig into some C based ways of playing with execution at a lower level. As of starting this writing I have no idea what will work and what will not but I intend to piece together different ideas’ purely to see what is possible. My first stop of course on this journey was google, where I found this article about how to extract the address from a label:

https://stackoverflow.com/questions/1777990/is-it-possible-to-store-the-address-of-a-label-in-a-variable-and-use-goto-to-jum

and

https://stackoverflow.com/questions/15702297/dynamic-jump-to-label-in-c

The second link above, actually has some interesting ideas, specifically https://stackoverflow.com/a/15736890 that may be very useful in my project; you will absolutely hear about that later; but for now let’s see what we can do with goto and function operators.

Here is the example code that I will be starting from:

   #include <stdio.h>

   int main ( void ) {
     void *ptr = NULL;
     label: ptr = &&label;
     printf(“label address %p\n”, ptr);
     return 0;
   }

 

Which itself compiles and outputs the address of the label. Neat.

Next up, what can we do with these things, well first let’s try and call that address with the function operator:

       ptr();

Error: called object ‘ptr’ is not a function or function pointer.

Dissapointing, but maybe we can try again with an array of pointers to functions:

 

               Void (*ptr_array[ ] ) = {ptr};

               Ptr_array[0]();

 

This on the other hand does run and provides us with between 30,000  and over 100,000 lines of output, just before segfaulting. Let’s dig into that a little bit with gdb. Start it up with:

               Gdb –q <name_of_program>

Note that the ‘q’ flag is used to suppress that large, unsolicited welcome message at EVERY start.

 

Continuing on and running the program within gdb provides us the opportunity to examine to call stack at the time of the segfault, you can do this with “bt” and it seems to be breaking within printf. It is difficult to make assumptions at this time as the address within ptr from label should not have changed; basically we aren’t reading any data difference, we need to investigate further.

Let’s start with break points to see what exactly is happening at each call:

1 – Pointer seems to be getting assigned to the address of the label.

2 – The printf function is executed and shows the address of the label.

3 – Ptr_array[0] is called as a function.

  • Requesting the address contained at ptr_array[0] from gdb still shows null interestingly enough.
  • This however is resolved by using ptr_array[0] = &&label;

4 – After the call to ptr_array[0]() we are somehow back at the beginning at a break point before the label.

  • Due to the fact that this was a function call and not a go to, we have presumably pushed a return address to the stack. Let’s examine that after though.

5 – And looping continues like this until we reach a segfault.

My next idea is to create a watch point on our pointers and see if the values are ever changed. We can do this in gdb with:

Watch <entity_name>

Aside from a slow run, this provides us with nothing of value, let’s remove the print statement that is slowing it down and see if we can run it again to catch the watch point on fail while the earth is still turning. I will note that this requires that the program is recompiled.

After considerable time, we finally have some output. It seems that our label pointer that is saved in ptr_array has not been modified at the time of the segfault. However if we consider our thoughts that this segfault may be related to a stack size issue (which thanks to this software portability and optimization course) we know is of a fixed size, let’s check the stack pointer:

P $esp

               -8392704

This output is very close the the size of one MebiByte, or the realistic size of an 8 MB stack. To examine further let’s see what bytecode is in that section, in gdb we can use:

               x/32x $esp

0x5555518c        0x00005555        0x5555518c        0x00005555

0x5555518c        0x00005555        0x5555518c        0x00005555

0x5555518c        0x00005555        0x5555518c        0x00005555

0x5555518c        0x00005555        0x5555518c        0x00005555

[…snip…]

This large piece of repetitive data on the stack seems to show our suspicions that we are calling a function that will never return. This means that when we call the address of the label the compiler never thinks to put in an instruction to read the address back into $EIP and pop that address off the stack. Instead each call is made over and over again which itself pushes an address to the stack and never cleaning it up. Dumping 1024, 2048 and 4096 bytes from $sp still shows the exact same data; which relates to my suspicions exactly. Let’s see if we can find a correlation in data next.

So what exactly do these hex numbers mean, lets put it together. Running our gbd examine command again however requesting ‘g’ instead of ‘x’ we will get it all put together:

x/32gx $esp

0x000055555555518c 0x000055555555518c

0x000055555555518c 0x000055555555518c

0x000055555555518c 0x000055555555518c

0x000055555555518c 0x000055555555518c

[…snip…]

Now think about what we are saving if we want to return, lets compare this against what is in our instruction pointer register:

I r

Meaning inspect registers, and from the output we can see the line:

$rip            0x55555555518a      0x55555555518a <main+81>

Which compared against

disass main

 

[…snip…]

   _[34m0x0000555555555167_[m <+46>:              lea    rax,[rip+0xfffffffffffffff9]        # 0x555555555167 <main+46>

_[34m0x000055555555516e_[m <+53>:              mov    QWORD PTR [rbp-0x18],rax

   _[34m0x0000555555555172_[m <+57>:              lea    rax,[rip+0xffffffffffffffee]        # 0x555555555167 <main+46>

   _[34m0x0000555555555179_[m <+64>:              mov    QWORD PTR [rbp-0x10],rax

_[34m0x000055555555517d_[m <+68>:              add    DWORD PTR [rbp-0x1c],0x1

   _[34m0x0000555555555181_[m <+72>:              mov    rdx,QWORD PTR [rbp-0x10]

_[34m0x0000555555555185_[m <+76>:              mov    eax,0x0

=> _[34m0x000055555555518a_[m <+81>:           call   rdx

_[34m0x000055555555518c_[m <+83>:               mov    eax,0x0

_[34m0x0000555555555191_[m <+88>:              mov    rcx,QWORD PTR [rbp-0x8]

_[34m0x0000555555555195_[m <+92>:              xor    rcx,QWORD PTR fs:0x28

_[34m0x000055555555519e_[m <+101>:            je     0x5555555551a5 <main+108>

_[34m0x00005555555551a0_[m <+103>:            call   0x555555555030 <__stack_chk_fail@plt>

_[34m0x00005555555551a5_[m <+108>:            leave

_[34m0x00005555555551a6_[m <+109>:            ret

[…snip…]

 

From what I can see, we are using lea or “load effective address” to bring in the address of an offset from $rip, which I assume to be the address of our label and storing it in rax. Note the second bolded lea call is because we are loading that same address again however into our array. That value is being stored at an address indexed from rbp meaning it is on the stack, where it is after loaded again from that offset into rdx. Finally this is where we make the call to the address loaded into rdx. Every time that call is run it pushes the address of the next (in red above) instruction to the stack; which without a return instruction explains the mess on our stack.

This creates some kind of weird failed recursion like “hyper drive”. However if there was a means in C to get that address off the stack (or never put one in the first place) this would be a fantastic and fast way to accomplish a low overhead means of calling my optimization functions. It would be especially useful if it were possible to do a non-local goto and have all the local variables still available.

 

Hope you enjoyed my quick write up, maybe you can find a practical use for it in whatever you’re doing.

 

Cheers,

ElliePenguins