
Decompiling Tips IDA

8street edited this page Jul 16, 2021 · 10 revisions

Here are some tips that might come in handy while decompiling the game.

First, get a copy of IDA. You can use the free version, though it will prompt you to upgrade and you can't save the binary. That's OK, because we don't need to change the binary, only look at it.

When you open IDA, load the openrct2.exe file from this repository. You will see a large number of instructions with no annotations, so you will probably want the debugging information that people have added so far. Email IntelOrca for the latest copy of the IDC file.

Once you have the IDC file, load it with "File -> Load Script".

Debugging in general

RCT2 is written in x86 assembly, which is about as close to the actual CPU instructions as you are going to get. Each of the 8 general-purpose registers can store 32 bits (in this game), and you can operate directly on the individual bits contained in those registers.

Hex/Decimal/Binary

Lots of numbers and addresses in the Tycoon Technical Depot are written in hex. The prefix "0x" generally denotes a hex value. Hex uses the same digits as decimal plus the letters A-F, so a number that looks like a decimal ("12") can refer to a different value than you think (in this case, 0x12 = 1*16 + 2 = 18). IDA writes hex numbers with the letter "h" as a suffix, like "1Ah".

To convert between these, I find it very convenient to keep a Python REPL open and convert values as necessary. It's also easy to view the binary representation of a number. Alternatively, the Windows Calculator in Programmer mode can be used.

# Print the decimal representation of a hex value:
>>> 0x12
18
# Convert from decimal to hex:
>>> hex(18)
'0x12'
# View binary representation of a number (works for hex or decimal)
>>> bin(0x12)
'0b10010'
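If you would rather stay in C, the standard library can do the same conversions. The sketch below is a hypothetical helper (`parse_ida_hex` is not part of OpenRCT2) that accepts both IDA's "1Ah" suffix notation and the "0x12" prefix notation. It relies on the fact that `strtoul` with base 16 already tolerates an optional "0x".

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse a hex literal written either IDA-style
   ("1Ah", "260h", "0FFh") or C-style ("0x12"). */
uint32_t parse_ida_hex(const char *text)
{
    char buf[32];
    size_t len = strlen(text);

    /* Work on a bounded copy so the suffix can be stripped. */
    if (len >= sizeof(buf))
        len = sizeof(buf) - 1;
    memcpy(buf, text, len);
    buf[len] = '\0';

    /* Drop IDA's trailing 'h' / 'H' hex suffix. */
    if (len > 0 && (buf[len - 1] == 'h' || buf[len - 1] == 'H'))
        buf[len - 1] = '\0';

    /* Base 16 also accepts an optional "0x" / "0X" prefix. */
    return (uint32_t)strtoul(buf, NULL, 16);
}
```

To go the other way, `printf("%u = 0x%X\n", (unsigned)v, (unsigned)v);` prints a value in both decimal and hex. Note that this helper reads a bare "12" as hex (18), which is exactly the trap described above.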

Note that RCT2 uses little-endian encoding for integers that span multiple bytes, meaning the least significant byte is stored first. For a 16-bit integer, the most significant byte (and therefore the most significant bit) is the 2nd byte.
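Here is a small C sketch of what this means in practice: the bytes `34 12` in memory hold the 16-bit value 0x1234. The helper names are illustrative only. Because the code decodes the value byte by byte, it gives the same answer regardless of the byte order of the machine it runs on.

```c
#include <stdint.h>

/* Decode a 16-bit little-endian integer: byte 0 is the least significant,
   byte 1 the most significant. */
uint16_t read_le16(const uint8_t *bytes)
{
    return (uint16_t)(bytes[0] | (bytes[1] << 8));
}

/* Decode a 32-bit little-endian integer from four consecutive bytes. */
uint32_t read_le32(const uint8_t *bytes)
{
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}
```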

Converting from x86 to C

Registers

The registers beginning with e (eax, ebx, ecx, edx, esi, edi, ebp, esp) are 32 bits each (4 bytes). The two-letter general purpose registers ending in x (ax, bx, cx, dx) are 16 bits each, and each is the lower half of the matching e register. The registers ending in "h" or "l" stand for "high" and "low" and are each 8 bits: for example, ah is the high byte of ax and al is the low byte.

When converting to C, use an int type to represent a 32-bit register, a short to represent a 16-bit register, and uint8 to represent an 8-bit register.
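The sub-registers overlap, which matters when you translate code that writes al and later reads eax. The functions below are a hypothetical model of that overlap, not OpenRCT2 code. Writing an 8-bit register replaces only its own byte and leaves the rest of eax untouched.

```c
#include <stdint.h>

/* ax is bits 0-15 of eax, al is bits 0-7, ah is bits 8-15. */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFF); }
uint8_t  reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFF); }
uint8_t  reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFF); }

/* "mov al, x" replaces only the low byte; the upper 24 bits survive. */
uint32_t set_al(uint32_t eax, uint8_t al)
{
    return (eax & 0xFFFFFF00u) | al;
}

/* "mov ah, x" replaces only bits 8-15. */
uint32_t set_ah(uint32_t eax, uint8_t ah)
{
    return (eax & 0xFFFF00FFu) | ((uint32_t)ah << 8);
}
```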

Subroutines

The general unit of work in the x86 codebase is the subroutine. Subroutines are called like this:

call sub_6CFFFF

This will cause execution to jump to the subroutine, which runs until it reaches a retn instruction. Execution then continues at the instruction after the call.

There are three functions in C you can use to replace a subroutine call.

  • RCT2_CALLPROC_EBPSAFE: Use this function if the subroutine does not use any registers from the calling program. For example, a subroutine that starts with a pusha instruction saves all of the caller's registers. The call looks like this:

      call sub_6CAB00
    

    And then in the subroutine:

      sub_6CAB00 proc near
                 pusha
    

    This means that all of the registers are saved. You may also see:

      sub_6CAB00 proc near
                 push edx          ; Save this register before overwriting
                 push eax
    
                 ... Code in the function..
    
                 pop eax
                 pop edx           ; Restore the registers to their values
                 retn
    
  • RCT2_CALLPROC_X: Use this function if the subroutine starts using registers without loading any data first. For example:

      sub_6CAB00 proc near
                 add ebx, 7
    

    This subroutine begins operating on whatever value is stored in ebx, so it's safe to assume the caller has deliberately put a value there to be manipulated.

  • RCT2_CALLFUNC_X: Use this function if the subroutine stores values in registers to be used by the caller. For example:

      sub_6CAB00 proc near
                 add ebx, ecx
                 retn
    

    It's safe to assume that the caller uses the result left in ebx.
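This hedged C sketch of the `add ebx, ecx / retn` example shows why the last case needs the registers handed back. The `x86_registers` struct and both function names are invented for illustration and are not the real signatures of RCT2_CALLPROC_X or RCT2_CALLFUNC_X. The point is only that a routine which leaves a result in a register needs some way to return it to the caller.

```c
#include <stdint.h>

/* Hypothetical snapshot of the general-purpose registers. */
typedef struct {
    uint32_t eax, ebx, ecx, edx, esi, edi, ebp;
} x86_registers;

/* Literal translation of:
       add ebx, ecx
       retn
   The registers are passed by pointer so the caller can read ebx afterwards. */
void sub_6CAB00_literal(x86_registers *regs)
{
    regs->ebx += regs->ecx;
}

/* Once the inputs (ebx, ecx) and the output (ebx) are known, the same
   routine can become an ordinary C function with a return value. */
uint32_t sub_6CAB00(uint32_t ebx, uint32_t ecx)
{
    return ebx + ecx;
}
```

The first form mirrors the assembly one-to-one. The second is what you would aim for once you are sure which registers are inputs and which are outputs.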

Pointers

If you are unfamiliar with pointers, or are coming from a higher-level language like Python, you should brush up on how they work. I haven't found a great tutorial for this yet, but here's something on pointers that might help.

In IDA you can spot a pointer by the square brackets around an expression. This generally means the value stored in the register is an address, and the instruction operates on the memory at that address.

add [ebp+2], 7

This means take the value stored in EBP (which should be an address like 0x579992), add two to it (0x579994), and then add 7 to the value stored at address 0x579994. Sometimes a register represents an address, and sometimes it may represent an integer like the height of a ride.
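In C, the same operation is ordinary pointer arithmetic. This sketch assumes the operand is a single byte (IDA shows the real size as byte ptr, word ptr or dword ptr). The function name is hypothetical.

```c
#include <stdint.h>

/* C equivalent of:  add byte ptr [ebp+2], 7
   ebp holds an address; the byte two past it is changed, ebp itself is not.
   Like the CPU, a uint8_t wraps around modulo 256. */
void add_7_at_ebp_plus_2(uint8_t *ebp)
{
    ebp[2] += 7;   /* same as *(ebp + 2) += 7 */
}
```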

Always remember that pointers are unsigned. Do not treat them as signed integers, or you may end up at the wrong address.
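This often causes problems when a 16-bit value is widened before being used as an offset. x86 has movzx (zero-extend, unsigned) and movsx (sign-extend, signed), and the same bit pattern gives very different results under each. Here is a minimal sketch with hypothetical helper names:

```c
#include <stdint.h>

/* movzx: the upper bits are filled with zeros, so 0xFFF0 stays 65520. */
int32_t offset_zero_extended(uint16_t raw)
{
    return (int32_t)raw;
}

/* movsx: the top bit is copied upwards, so 0xFFF0 becomes -16.
   (Converting 0xFFF0 to int16_t is implementation-defined before C23,
   but wraps to -16 on two's-complement compilers.) */
int32_t offset_sign_extended(uint16_t raw)
{
    return (int32_t)(int16_t)raw;
}
```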

What this means for OpenRCT2

If you see a value like this in the code:

mov bh, byte ptr word_F440AE

This roughly says: read the value at 0x00F440AE as a byte (8 bits) rather than a word (16 bits), and copy it into the register bh.

In C this becomes:

uint8 bh = RCT2_GLOBAL(0x00F440AE, uint8);
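Assuming RCT2_GLOBAL simply reinterprets the address as a pointer to the given type and reads from it, you can sketch the byte-from-a-word read above without the game running by faking a few bytes of its memory. `game_memory`, `GAME_MEMORY_BASE`, `read_global_u8` and `read_global_u16` are hypothetical names used only for this illustration. Because the word is stored little-endian, reading it as a byte yields its low byte.

```c
#include <stdint.h>

/* Hypothetical stand-in for game memory starting at 0x00F440AE,
   holding the word 0x1234 in little-endian order. No bounds checking:
   illustration only. */
#define GAME_MEMORY_BASE 0x00F440AEu
static uint8_t game_memory[4] = { 0x34, 0x12, 0x00, 0x00 };

/* "byte ptr": read one byte at a game address. */
uint8_t read_global_u8(uint32_t address)
{
    return game_memory[address - GAME_MEMORY_BASE];
}

/* "word ptr": read a 16-bit little-endian word at a game address. */
uint16_t read_global_u16(uint32_t address)
{
    const uint8_t *p = &game_memory[address - GAME_MEMORY_BASE];
    return (uint16_t)(p[0] | (p[1] << 8));
}
```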

Nullsub

If you see an instruction like this:

jz nullsub_65

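A nullsub is a subroutine that contains nothing but a retn. It represents a call that existed in an earlier version of the program but was emptied out in the final version. Because jumping to a bare retn returns straight to the caller, this instruction means: if the zero flag is set (i.e. the result of the previous comparison or test was zero), return from the current subroutine (a return in C); otherwise, proceed to the next instruction in the code.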
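The instruction that sets the flag is not shown above, so the sketch below invents one (`test al, al`) to show the shape of the translation. The routine and its name are hypothetical. The general pattern is that the jz becomes an early return.

```c
#include <stdint.h>

/* Hypothetical routine:
       test al, al
       jz   nullsub_65     ; nullsub_65 contains only "retn"
       inc  ebx
       retn
   ebx is passed by pointer because the caller reads it afterwards. */
void example_with_nullsub(uint8_t al, uint32_t *ebx)
{
    if (al == 0)        /* test al, al sets the zero flag when al is 0 */
        return;         /* jz nullsub_65: jumping to a bare retn returns */
    (*ebx)++;           /* inc ebx */
}
```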

imul exx, 260h

If you see a set of instructions in the code that looks like this:

movzx edx, current_ride_index
imul edx, 260h
movzx edx, rides[edx]

In C code this would be:

edx = RCT2_ADDRESS(RCT2_ADDRESS_RIDE_LIST, rct_ride)[current_ride_index];

This stores in edx the beginning of data from a ride instance. The ride instance data follows the layout described here.

In general, if you see something multiplied by 0x260, it is quite likely a ride being referenced, since rides are 0x260 bytes. Sprites are 0x100 bytes, and instead of being multiplied by that size, the index is normally left-shifted by 8 (<<8). This makes it very easy to work out what a loop is iterating over.
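Here is a short sketch of the two address calculations, using the sizes given above. The helper names are hypothetical; the arithmetic is what the imul and shift instructions compute.

```c
#include <stdint.h>

/* Record sizes quoted on this page. */
#define RIDE_SIZE   0x260u
#define SPRITE_SIZE 0x100u

/* Byte offset of ride `index` from the start of the ride list,
   i.e. what "imul edx, 260h" computes. */
uint32_t ride_offset(uint32_t index)
{
    return index * RIDE_SIZE;
}

/* Byte offset of sprite `index`. 0x100 is 2^8, so a left shift by 8
   gives the same result as multiplying by SPRITE_SIZE. */
uint32_t sprite_offset(uint32_t index)
{
    return index << 8;
}
```

For example, ride 3 starts 3 * 0x260 = 0x720 bytes into the ride list, and sprite 3 starts 0x300 bytes into the sprite list.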

Offset

If you see an instruction that looks like this:

add ebx, offset sprites

(where sprites is a named address in IDA, like 0x123456). This means, roughly: add the value on the right (the address itself, not the data stored there) to the register on the left, and store the result in the register on the left. In this case, this would mean

ebx = ebx + RCT2_ADDRESS_SPRITE_LIST;

where RCT2_ADDRESS_SPRITE_LIST is a value like 0x123456. In the binary, ebx could be any register, and offset can refer to any address in the code.

This will eventually end up like the following once we have the offset properly mapped to a C array (note the &: ebx holds the address of the element, not the value stored there):

ebx = &RCT2_ADDRESS_SPRITE_LIST[ebx];
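The sketch below checks that "base address plus byte offset" and "pointer to array element" give the same result, using a fake sprite list. The 0x100-byte `sprite_record` type and the `sprites` array are hypothetical stand-ins (the real sprite layout is different). The `shl ebx, 8` that produces the byte offset is an assumption about the surrounding code.

```c
#include <stdint.h>

/* Hypothetical 0x100-byte sprite record; the real layout differs. */
typedef struct {
    uint8_t bytes[0x100];
} sprite_record;

/* Hypothetical stand-in for the list that "offset sprites" names. */
static sprite_record sprites[16];

/* Assembly style:  shl ebx, 8
                    add ebx, offset sprites */
sprite_record *sprite_at_asm_style(uint32_t index)
{
    uint8_t *ebx = (uint8_t *)sprites;  /* offset sprites */
    ebx += index << 8;                  /* add the byte offset index * 0x100 */
    return (sprite_record *)ebx;
}

/* C style: the same address is a pointer to the array element. */
sprite_record *sprite_at(uint32_t index)
{
    return &sprites[index];
}
```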

Print debugging

Use printf, or, to print messages to the Visual Studio output window after RCT2 has started, include the following header:

#include <windows.h>

Then call the function OutputDebugString:

OutputDebugString("Hello World!\n");
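OutputDebugString takes a plain string, so a small printf-style wrapper is handy. `debug_log` is a hypothetical helper, not something OpenRCT2 provides. It calls OutputDebugStringA on Windows and falls back to stderr elsewhere, so the sketch still compiles on other platforms.

```c
#include <stdarg.h>
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical printf-style wrapper around OutputDebugStringA.
   Messages longer than 511 characters are truncated. Returns the length
   vsnprintf reports (negative on a formatting error). */
int debug_log(const char *format, ...)
{
    char message[512];
    va_list args;

    va_start(args, format);
    int length = vsnprintf(message, sizeof(message), format, args);
    va_end(args);

    if (length < 0)
        return length;

#ifdef _WIN32
    OutputDebugStringA(message);   /* shows up in the debugger's output */
#else
    fputs(message, stderr);        /* fallback outside Windows */
#endif
    return length;
}
```

Usage: `debug_log("ride %d\n", ride_index);`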

IDA Tips

  • Use the spacebar to toggle between the graphical layout and the line-by-line instructions.
  • Press semicolon to add a comment at the end of a line.
  • Press x to show all read / write / offset / jump references to an address.
  • Press n to rename an address.
  • If you are trying to read from the binary file directly, note that every address in the code is 0x400000 (the image base) higher than its physical offset in the binary. So if an address is 0x900123 in the code and you want to read it from an external program, start reading at offset 0x500123 instead.
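Here is the rule of thumb from the last tip, written as a hedged C sketch. It only holds as long as each section of the executable sits at the same relative position in the file as in memory, which the tip assumes is true for this executable. `address_to_file_offset` and `read_at_address` are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>

/* Image base the tip refers to. */
#define IMAGE_BASE 0x400000u

/* Convert an address shown in IDA to an offset in the executable file. */
uint32_t address_to_file_offset(uint32_t address)
{
    return address - IMAGE_BASE;
}

/* Read `count` bytes that IDA shows at `address` from an open executable.
   Returns the number of bytes actually read (0 if the seek fails). */
size_t read_at_address(FILE *exe, uint32_t address, void *buffer, size_t count)
{
    if (fseek(exe, (long)address_to_file_offset(address), SEEK_SET) != 0)
        return 0;
    return fread(buffer, 1, count, exe);
}
```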

Finding that section of code you want to edit

This is tricky. Note that the addresses in the Tycoon Technical Depot are only valid for RCT1. I would start with the work that has already been done and branch out from there to find the sections of code that are useful to you.

You can also read through the code in the OpenRCT2 project, especially src/addresses.h, which contains a very useful list of important addresses in the game. Most of the functions in the OpenRCT2 C code list the address of the corresponding subroutine in their doc comment.

Another approach is to work backwards from the strings or windows that exist in the game to the subroutines that you want to change. That is, find a string like "Too high for supports!" and try to figure out where it is used in the game, by searching for the hex representation of its ID.
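IDA's own binary search can do this. If you want to scan the executable from an external tool instead, a sketch like the one below works, assuming the ID appears as a 16-bit little-endian immediate. `find_string_id` is a hypothetical helper. Any ID you test it with is an arbitrary example value, not the real ID of a game string.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: find a 16-bit string ID in a block of code or data.
   x86 stores immediates little-endian, so ID 0x0ABC appears as BC 0A.
   Returns the offset of the first match at or after `start`, or -1. */
long find_string_id(const uint8_t *data, size_t size, size_t start, uint16_t id)
{
    const uint8_t low = (uint8_t)(id & 0xFF);
    const uint8_t high = (uint8_t)(id >> 8);

    for (size_t i = start; i + 1 < size; i++) {
        if (data[i] == low && data[i + 1] == high)
            return (long)i;
    }
    return -1;
}
```

Call it repeatedly with `start` set to one past the previous match to find every occurrence.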
