IDA Tricks - Dealing with inlined data

published 2018-06-04

Intro

When analyzing position-independent code (e.g. shellcode or malicious code snippets), you'll frequently see something like the following:

Inlined string leading to a broken disassembly

The second call is actually not a subroutine call but a disguised push for the following data.

We can fix this manually: undefine the sub (otherwise IDA's auto-analysis will override our judgement), turn it back into code, jump to the location after the call, turn the bytes there into a string, and fix up the code that follows:

Manual intervention

There you can see it's just a call-pop pair, effectively loading the string's address into [ebp-20h].
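To make the layout concrete outside of IDA, here is a minimal sketch in plain Python that takes a raw byte blob and splits a `call rel32` from the data it jumps over. The function name and the sample bytes are made up for illustration (the sample uses a plain `pop eax` rather than the store to [ebp-20h] seen above), and it assumes 32-bit x86 with the 5-byte E8 encoding of call.

```python
import struct

def split_call_over_data(code, call_off):
    """Return (inlined_data, target_off) for a `call rel32` at call_off.

    Hypothetical helper: assumes 32-bit x86 and the E8 rel32 encoding.
    """
    if code[call_off:call_off + 1] != b"\xe8":
        raise ValueError("not a call rel32")
    # rel32 is signed and relative to the end of the 5-byte instruction
    (rel32,) = struct.unpack_from("<i", code, call_off + 1)
    data_off = call_off + 5
    target_off = data_off + rel32
    if target_off < data_off:
        raise ValueError("call goes backwards, no inlined data")
    # everything between the call and its target is the inlined data
    return bytes(code[data_off:target_off]), target_off

# made-up sample: call over "hello\0", then pop eax (0x58)
blob = b"\xe8\x06\x00\x00\x00" + b"hello\x00" + b"\x58"
data, target = split_call_over_data(blob, 0)
print(repr(data), target)
```

The return address pushed by the call is exactly the address of the first data byte, which is why the pop at the target ends up with a pointer to the string.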

While this is better (although tedious to fix), switching to graph view will still fail. More precisely, turning the code back into a function fails because of the undefined instructions, and without a function we don't get our graph view back.

So not exactly a solution.

Modify the code

We know this is a push in disguise. We can also freely modify the binary in IDA. So why not just make it a proper push?

To do that, we will perform the following steps:

  1. Add a segment with the same size as the current one
  2. Copy the inlined data to the same offset in the new segment
  3. Turn the call into a push followed by a jump to the call's original target
  4. Some cosmetic stuff

All these can be done in a small script.

The reason for creating a same-sized segment is that we do not have to do any bookkeeping: we simply copy the inlined data to the same offset in the new storage segment. This is not a nice solution for big targets, but it works just fine for anything else, and not keeping state makes the script easy to use.
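The "same offset" idea boils down to one line of address arithmetic. A small sketch in plain Python, with a hypothetical function name and made-up example addresses:

```python
def storage_address(data_start, code_seg_start, storage_seg_start):
    """Map an address in the code segment to the same offset in storage.

    Hypothetical helper; all three arguments are linear addresses.
    """
    offset = data_start - code_seg_start
    if offset < 0:
        raise ValueError("data_start lies before the code segment")
    return storage_seg_start + offset

# made-up layout: code segment at 0x1000, storage segment at 0x10000
dest = storage_address(0x1234, 0x1000, 0x10000)
print(hex(dest))  # 0x10234
```

Because two inlined blobs never overlap in the original segment, their copies cannot overlap in the storage segment either, so no allocation state is needed.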

The script

The script assumes two things:

  1. Your cursor is on the call instruction, so you can quickly run the script via a hotkey
  2. The storage segment has been created manually. This could of course be done in the script as well.

The proof-of-concept script then is:

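import idaapi
import ida_idp
from idc import *

def get_storage_segment():
    # Walk all segments and return the start address of the one named
    # "storage", or None if there is no such segment.
    seg = get_first_seg()

    while seg != BADADDR and get_segm_name(seg) != "storage":
        seg = get_next_seg(seg)

    if seg != BADADDR:
        return seg
    else:
        return None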

def fix_call():
    ea = get_screen_ea()

    if print_insn_mnem(ea) != 'call':
        print("Not a call instruction!")
        return

    # address of the trailing 'pop' instruction
    call_target = get_operand_value(ea,0)

    data_start = next_head(ea)
    data_len = call_target-data_start

    storage_addr = get_storage_segment()
    # compare against None explicitly: a segment starting at 0 is valid
    if storage_addr is None:
        print("Error: Segment 'storage' not found")
        return

    # get offset in this segment
    offset = data_start - get_segm_attr(data_start,SEGATTR_START)
    copy_dest = storage_addr + offset

    # copy the data before patching, since the jmp will overwrite part of it
    for i in range(data_len):
        patch_byte(copy_dest+i,get_wide_byte(data_start+i))

    ida_idp.assemble(ea,0,ea,True,"push 0%08xh" % copy_dest)
    ea += get_item_size(ea)
    ida_idp.assemble(ea,0,ea,True,"jmp 0%08xh" % call_target)
    ea += get_item_size(ea)

    # Undefine the inlined data to clean up the disassembly
    del_items(ea,DELIT_SIMPLE,call_target-ea)
    # Add a name to the copied data
    set_name(copy_dest,"inlined_%08x" % data_start)

idaapi.add_hotkey("2",fix_call)

It simply does the above: it calculates the length and offset of the inlined data and the addresses for the push and jmp instructions we are going to patch in, copies the data, patches the instructions, and performs a bit of cleanup.
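The patching step can also be checked on raw bytes without IDA. The following is a rough sketch under a few assumptions: 32-bit x86, a `call rel32` (E8), and hand-picked encodings for the replacement (`push imm32` as 68, `jmp rel8` as EB when the target is in range, otherwise `jmp rel32` as E9). IDA's assembler may well choose a different jmp encoding, and the function name and addresses are made up.

```python
import struct

CALL_LEN = 5  # call rel32: E8 + 4-byte displacement

def patch_call_to_push_jmp(image, base, call_ea, copy_dest):
    """Rewrite `call over_data` into `push copy_dest; jmp call_target`.

    image is a bytearray loaded at address base (hypothetical helper).
    """
    off = call_ea - base
    if image[off] != 0xE8:
        raise ValueError("not a call rel32")
    (rel32,) = struct.unpack_from("<i", image, off + 1)
    jmp_ea = call_ea + CALL_LEN            # the jmp lands on the old data
    call_target = jmp_ea + rel32
    data_len = call_target - jmp_ea
    if data_len < 2:
        raise ValueError("inlined data too short to hold a jmp")
    # push imm32 has the same length as the call it replaces
    push = b"\x68" + struct.pack("<I", copy_dest)
    disp8 = call_target - (jmp_ea + 2)     # relative to end of a short jmp
    if disp8 <= 127:
        jmp = b"\xeb" + struct.pack("<b", disp8)
    else:
        if data_len < 5:
            raise ValueError("inlined data too short to hold a near jmp")
        jmp = b"\xe9" + struct.pack("<i", call_target - (jmp_ea + 5))
    image[off:off + CALL_LEN] = push
    image[off + CALL_LEN:off + CALL_LEN + len(jmp)] = jmp
    return call_target

# made-up sample: call over "hello\0", then pop eax, loaded at 0x1000
image = bytearray(b"\xe8\x06\x00\x00\x00" + b"hello\x00" + b"\x58")
target = patch_call_to_push_jmp(image, 0x1000, 0x1000, 0x20005)
print(image.hex(), hex(target))
```

Note that the jmp is written over the first bytes of the inlined data, so the data has to be copied to the storage segment before patching, and a very short blob leaves no room for the jmp at all.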

Note that this may have issues with segmentation. I think I had some odd configuration where some API calls returned a full address (with respect to the segment address) and some did not, but I couldn't work out which combination it was when writing this article.

Always, always back up your .idb before using scripts that modify the database like this one. Even if the code performs as intended, you will find edge cases where it fails and ruins your database.

If we run the above script on the example and intervene manually only minimally (telling IDA that the push uses an offset and that the bytes following the push are also code), we get this:

Properly now

And eventually, we can turn this into a subroutine and switch to graph view:

Fixed code in graph view

Much better!