Intro to x86-64
Look at x86-64 assembly using radare2
Radare2 is a framework for reverse engineering and analyzing binaries. It can be used to disassemble binaries (translate machine code into assembly, which is human-readable) and to debug them (by letting a user step through the execution and view the state of the program).
We first execute the program intro by running ./intro.
From the execution we can see that the program creates two variables and switches their values.
Examining the program with radare2.
To open the program in radare2, we type r2 -d intro.
This opens the binary in debugging mode. Once the binary is open, one of the first things to do is ask r2 to analyze the program with the command aa, which is the most common analysis command. It analyzes all symbols and entry points in the executable. In this case, the analysis involves extracting function names, flow control information and much more.
Then run e asm.syntax=intel to set the disassembly syntax to Intel.
r2 commands are usually built from single characters, so it is easy to get more information about them. For general help, run ?. For more specific information, for example about analysis, run a?.
Once the analysis is complete, we want to know where to start analyzing from. Most programs have an entry point defined as main. To list the functions, run afl.
In the output we can see that there is a main function. To examine the assembly code at main, we run the command pdf @main, where pdf means print disassembly function. Doing so gives us the following.
In the figure above, the leftmost column contains the memory addresses of the instructions (the code itself lives in the program's code section, not on the stack). The middle column contains the instructions encoded as bytes (the machine code), and the last column contains the human-readable instructions.
The core of assembly language involves using registers to do the following:
Transfer data between memory and register, and vice versa
Perform arithmetic operations on registers and data
Transfer control to other parts of the program.
Since the architecture is x86-64, the registers are 64 bits wide, and there are 16 general-purpose registers:
64 bit    32 bit
rax       eax
rbx       ebx
rcx       ecx
rdx       edx
rsi       esi
rdi       edi
rsp       esp
rbp       ebp
r8        r8d
r9        r9d
r10       r10d
r11       r11d
r12       r12d
r13       r13d
r14       r14d
r15       r15d
Even though the registers are 64 bits wide, meaning they can hold up to 64 bits of data, smaller parts of each register can also be referenced. The table shows the 32-bit names, but the lower 16 bits and individual bytes can be referenced as well:
64-bit -> rax; 32-bit -> eax; 16-bit -> ax; 8-bit (high byte of ax, bits 8-15) -> ah; 8-bit (low byte of ax, bits 0-7) -> al
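The aliasing described above can be illustrated with plain bit masks. This is a minimal Python sketch, not something radare2 provides; the function name `sub_registers` and the sample value are made up for illustration.

```python
def sub_registers(rax: int) -> dict:
    """Split a 64-bit value the way rax is aliased by eax, ax, ah and al."""
    rax &= 0xFFFFFFFFFFFFFFFF          # keep only 64 bits
    return {
        "rax": rax,                    # all 64 bits
        "eax": rax & 0xFFFFFFFF,       # low 32 bits
        "ax": rax & 0xFFFF,            # low 16 bits
        "ah": (rax >> 8) & 0xFF,       # bits 8-15 (high byte of ax)
        "al": rax & 0xFF,              # bits 0-7 (low byte of ax)
    }

# Arbitrary sample value, chosen so every byte is easy to recognise.
parts = sub_registers(0x1122334455667788)
for name, value in parts.items():
    print(f"{name:>3} = {value:#x}")
```

Note that ah and al are each a full byte (8 bits); together they make up ax.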
The first 6 registers are general purpose registers. rsp is the stack pointer: it holds the address of the top of the stack, that is, of the most recently pushed item. The stack is a region of memory that programs use for function calls and local data. rbp is the frame pointer and points to the frame of the function currently being executed; every function call gets its own frame. To move data using registers, the following instruction is used:
mov destination, source
This involves:
Transferring constants (mov rax, 3 would move the constant 3 into the rax register)
Transferring values from a register (mov rbx, rax would move the value in rax to rbx)
Transferring values to or from memory, which is shown by putting a register inside brackets (mov [rbx], rax would move the value stored in rax into the memory location whose address is held in rbx)
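The three kinds of mov can be mimicked with a toy model. In this rough sketch, registers and memory are plain Python dictionaries and the address 0x1000 is hypothetical; real memory is byte-addressed and real registers have a fixed width, which this model ignores.

```python
# Toy model (an assumption for illustration): registers and memory as dicts.
regs = {"rax": 0, "rbx": 0}
memory = {}

regs["rax"] = 3                    # mov rax, 3       constant -> register
regs["rbx"] = regs["rax"]          # mov rbx, rax     register -> register

regs["rbx"] = 0x1000               # mov rbx, 0x1000  hypothetical address
memory[regs["rbx"]] = regs["rax"]  # mov [rbx], rax   register -> memory

regs["rax"] = 0                    # mov rax, 0
regs["rax"] = memory[regs["rbx"]]  # mov rax, [rbx]   memory -> register

print(regs, memory)
```

The brackets are what turn rbx from "a value" into "the address where the value lives".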
Some other important instructions are:
lea destination, source: sets destination to the address denoted by the expression in source
add destination, source: destination = destination + source
sub destination, source: destination = destination - source
imul destination, source: destination = destination * source
sal destination, source: shift the bits of destination to the left by source positions
sar destination, source: shift the bits of destination to the right by source positions, keeping the sign bit
xor destination, source: destination = destination XOR source
and destination, source: destination = destination AND source
or destination, source: destination = destination OR source
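To get a feel for the shift and bitwise instructions listed above, here is a small Python sketch that models them on 32-bit values. The helper names are made up for illustration, and details such as how the CPU masks the shift count or updates flags are deliberately left out.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def sal(dest: int, count: int) -> int:
    """Shift left; bits pushed past bit 31 are dropped."""
    return to_signed32(dest << count)

def sar(dest: int, count: int) -> int:
    """Arithmetic shift right; the sign bit fills the vacated positions."""
    return to_signed32(dest) >> count

print(sal(5, 1))            # 10
print(sar(22, 1))           # 11
print(sar(-8, 1))           # -4
print(6 ^ 3, 6 & 3, 6 | 3)  # 5 2 7
```

Shifting left by one doubles a value and shifting right by one halves it (rounding down), which is why compilers often use shifts in place of multiplications and divisions by powers of two.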
The general format of an if statement is:
If statements use 3 important instructions in assembly:
cmp source1, source2: like computing source1 - source2 without storing the result (if both sources are equal, the result is 0 and ZF is set to 1)
test source1, source2: like computing source1 AND source2 without storing the result (if the two operands share no set bits, the result is 0 and ZF is set to 1; test eax, eax is a common way to check whether eax is zero)
Jump instructions are used to transfer control to different instructions, and there are different types of jumps:
Jump Type    Description
jmp          Unconditional
je           Equal/Zero
jne          Not Equal/Not Zero
js           Negative
jns          Nonnegative
jg           Greater
jge          Greater or Equal
jl           Less
jle          Less or Equal
ja           Above (unsigned)
jb           Below (unsigned)
The last two rows of the table refer to unsigned comparisons. Unsigned integers cannot be negative, while signed integers represent both positive and negative values. The same bit pattern can mean different numbers under the two interpretations, so separate jumps exist for each: signed integers use the two's complement representation, and unsigned integers use plain binary.
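How cmp, test and the conditional jumps fit together can be sketched in Python. This is a simplified model of 32-bit operands using the standard flag conditions for each jump; the function names are invented for illustration and only four flags are modelled.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(value: int) -> int:
    """Interpret the low 32 bits as a two's complement (signed) integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def cmp_flags(a: int, b: int) -> dict:
    """Flags after `cmp a, b` on 32-bit operands: a - b, result discarded."""
    a, b = a & MASK32, b & MASK32
    result = (a - b) & MASK32
    signed_diff = to_signed32(a) - to_signed32(b)
    return {
        "ZF": result == 0,                                     # operands equal
        "SF": bool(result & 0x80000000),                       # top bit of result
        "CF": a < b,                                           # unsigned borrow
        "OF": not (-(1 << 31) <= signed_diff < (1 << 31)),     # signed overflow
    }

def zf_after_test(a: int, b: int) -> bool:
    """ZF after `test a, b`: set when a AND b is zero."""
    return ((a & b) & MASK32) == 0

def jge(f): return f["SF"] == f["OF"]            # signed >=
def jl(f):  return f["SF"] != f["OF"]            # signed <
def ja(f):  return not f["CF"] and not f["ZF"]   # unsigned >
def jb(f):  return f["CF"]                       # unsigned <

small = cmp_flags(3, 4)
print(jge(small), jl(small))             # False True

mixed = cmp_flags(0xFFFFFFFF, 1)         # same bits: -1 signed, 4294967295 unsigned
print(to_signed32(0xFFFFFFFF))           # -1
print(jl(mixed), ja(mixed))              # True True

print(zf_after_test(0b1100, 0b0011), zf_after_test(5, 5))  # True False
```

The `mixed` case shows why both jl and ja exist: the bit pattern 0xFFFFFFFF is below 1 when read as signed and above 1 when read as unsigned.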
Let's analyze a program with if statements:
In the figure above we can see the main function. To analyse it, we first set a breakpoint on the jge and the jmp instructions using the commands:
db 0x55ae52836612 (the hex address of the jge instruction)
db 0x55ae52836618 (the hex address of the jmp instruction)
We add breakpoints to stop the execution of the program at those points so that we can see the state of the program.
We now run dc to start the execution of the program and stop at the first breakpoint. Before the first breakpoint this is what happens:
The first 2 lines save the caller's base pointer by pushing it onto the stack, then copy the value of the stack pointer into the base pointer to set up the new frame.
The next 3 lines assign the values 3 and 4 to the local variables var_8h and var_4h, and then load the value of var_8h into the eax register.
The cmp instruction compares the value of eax with var_4h.
To view the values of the registers we type dr. Below we have the values of the registers at the beginning of the program and just before hitting the breakpoint.
We can see that the value in rax is 3 when we hit the breakpoint. After the compare, the instruction will jump if eax is greater than or equal to the value in var_4h. Looking at the main function, we can see that var_4h has the value 4 assigned to it.
So eax contains 3, and 3 is not greater than or equal to 4, which means the jump will not occur and execution continues with the next instruction. We can check this by stepping to the next instruction with ds.
To answer the first question, let's first analyze the main function:
There are three variables:
var_ch, which is assigned the value 0
var_8h, which is assigned the value 0x63, the hex value for decimal 99
var_4h, which is assigned the value 0x3e8, the hex value for decimal 1000
var_ch is loaded into eax, which is then compared with var_8h. If eax is greater than or equal to 99 (var_8h), it jumps to some address ahead, but we know that eax is 0, so it won't jump. Then the value of var_8h is loaded into eax and compared with 1000 (var_4h). Once again, if eax is greater than or equal to 1000 it jumps, and it does not jump because eax is now 99. Next comes the and instruction, which performs a bitwise AND between that value (0x63) and 0x64. To follow this and operation we can look at how it works at the binary level.
0x63 = 1100011
0x64 = 1100100
0x63 AND 0x64 = 1100000 = 0x60 = 96
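The same bitwise AND can be double-checked outside the debugger. This short Python sketch reproduces the calculation; the variable names simply mirror the ones radare2 shows.

```python
var_8h = 0x63   # 99, the value assigned in main
mask = 0x64     # operand of the and instruction

result = var_8h & mask   # a bit stays set only where both inputs have it set

print(f"{var_8h:07b}  ({var_8h:#x})")
print(f"{mask:07b}  ({mask:#x})")
print(f"{result:07b}  ({result:#x} = {result})")
```

Only the two highest bits are set in both operands, which leaves 1100000, that is 0x60 or 96.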
Since this is the only operation done to var_8h before the pop and ret instructions...
We just found the answer to the first question.
Continuing with the program's flow examination, after the bitwise and operation, the program jumps to the address 0x563ff0c00630, which subtracts 0x4b0 from var_4h. This is the last instruction related to the variables before the pop and ret instructions, meaning that the value of var_ch remained 0.
As we have already seen, the instruction at 0x563ff0c00630 subtracts 0x4b0 from var_4h.
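var_4h = 0x3e8 = 1000
0x4b0 = 1200, which would give 1000 - 1200 = -200, so the constant quoted above looks mistyped and is worth rechecking in the disassembly; a result of 1 requires subtracting 999 (0x3e7), since 1000 - 999 = 1
So, taking the subtracted value to be 999, the value of var_4h before the pop and ret instructions is 1.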
The symbol that represents the and instruction is &.
Usually two types of loops are used: for loops and while loops. The general format of while loops is:
The general format of a for loop is:
Let's analyse the following binary.
We start by setting a breakpoint at the jmp instruction.
Doing this allows us to skip the first few instructions, which, as we saw with if statements, just assign values to local variables.
Once execution reaches the breakpoint at the jmp instruction, run ds to move to the next instruction. Since this is an unconditional jump, it will move to the cmp instruction.
Here the cmp instruction compares what's in the local variable var_ch with the value 8. To see what's in var_ch, we check the start of the disassembled function to find where it lives and then inspect that memory. In this case, it is at rbp-0xc, and it contains 4. The next instruction is a jle, which checks whether the value in var_ch is less than or equal to 8. Since 4 is less than 8, it will jump to the add instruction.
The add instruction adds 2 to the value of var_ch, and execution continues to the cmp instruction. Since 2 was added to var_ch, var_ch now contains 6, which is still less than 8, so it jumps back to the add instruction. This can be seen by continuing execution with the ds command. We know this is a loop because the add instruction is executed more than once, in combination with comparing the value of var_ch to 8. So we can infer the structure of the loop to be:
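As a rough reconstruction, the behaviour described above corresponds to a while loop like the one below. The original source of the binary is not shown, so this Python rendering is an inference from the disassembly; the `iterations` counter is added here purely for illustration.

```python
var_ch = 4            # initial value found at rbp-0xc
iterations = 0        # illustrative counter, not part of the binary

while var_ch <= 8:    # cmp var_ch, 8 followed by jle
    var_ch += 2       # add var_ch, 2
    iterations += 1

print(var_ch, iterations)
```

The body runs for var_ch equal to 4, 6 and 8, and the loop exits once var_ch reaches 10.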
These questions are about the binary loop2. Let's get an overview of what the program does by analyzing its main function.
The program has three variables: var_ch, var_8h and var_4h. It starts by assigning the value 0x14 (decimal 20) to var_ch and the value 0x16 (decimal 22) to var_8h; it then zeroes var_4h and assigns it a value of 4. Then the program jumps to a compare, which compares the value of var_4h with 0x63 (decimal 99). If var_4h is less than or equal to 99, it jumps to the address at ...61c, where an and operation is applied to var_ch with the value 2. Next comes a sar instruction that shifts the bits of var_8h to the right. After that, it loads the value of var_4h (4) into the edx register, copies it to the eax register, adds eax to itself and then adds edx, and stores the result (now 12) back into var_4h. It then proceeds to the compare and loops again.
At the start of the first iteration of the loop, the variable var_8h has the value 0x16, or 0 0 0 1 0 1 1 0 in bits. Since the sar instruction shifts the bits to the right, the value of var_8h after the first iteration of the loop is 0 0 0 0 1 0 1 1 (decimal 11). On the second iteration, the bits are shifted again and become 0 0 0 0 0 1 0 1, which is the binary of 5, the answer to the first question.
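The two shifts can be replayed in Python, where `>>` on a non-negative integer behaves like sar. This is only a sanity check of the arithmetic above, assuming a shift of one position per iteration as the bit patterns suggest.

```python
var_8h = 0x16          # 22, the starting value
history = []

for iteration in (1, 2):
    var_8h >>= 1       # sar var_8h, 1 (shift count of 1 assumed)
    history.append(var_8h)
    print(f"after iteration {iteration}: {var_8h:08b} = {var_8h}")
```

Each shift drops the lowest bit, so 22 becomes 11 and then 5.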
The value of var_ch at the beginning of the program is 0x14, the hexadecimal equivalent of 20. It is then ANDed with 2, so let's have a look:
If we do a bitwise AND of 20 and 2, the result will be 0. Once it is 0, in the rest of the iterations, where the value of var_ch is ANDed again with 2, it will always be 0. This is the answer to the second question.
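A quick Python check makes the reasoning concrete: 20 and 2 have no set bits in common, and ANDing 0 with anything stays 0. The number of passes below is chosen arbitrarily for illustration.

```python
var_ch = 0x14          # 20 = 10100 in binary
values = []

for _ in range(3):     # three passes, an arbitrary count for illustration
    var_ch &= 2        # and var_ch, 2   (2 = 00010 in binary)
    values.append(var_ch)

print(f"{0x14:05b} & {2:05b} -> {values}")
```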
To answer this question, we can type ds until we get to the end of the loop and check the value of var_8h with the command px @rbp-0x8.
We have already seen that the value of var_ch will always be 0 after the first iteration. We can confirm that by typing px @rbp-0xc.
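As a cross-check of the values read with px, the whole loop can be modelled in a few lines of Python. This is a sketch under stated assumptions rather than a decompilation: the tripling of var_4h is inferred from the 4 to 12 step described earlier, the shift count is assumed to be 1, and the function name `simulate_loop2` is invented.

```python
def simulate_loop2():
    """Replay the loop2 logic and record the variables after each iteration."""
    var_ch, var_8h, var_4h = 0x14, 0x16, 4
    trace = []
    while var_4h <= 0x63:            # cmp var_4h, 0x63 followed by jle
        var_ch &= 2                  # and var_ch, 2
        var_8h >>= 1                 # sar var_8h, 1 (assumed shift count)
        var_4h = var_4h * 3          # assumed: the two adds triple var_4h
        trace.append((var_ch, var_8h, var_4h))
    return trace

trace = simulate_loop2()
for number, (var_ch, var_8h, var_4h) in enumerate(trace, start=1):
    print(f"iteration {number}: var_ch={var_ch} var_8h={var_8h} var_4h={var_4h}")
```

Under these assumptions the loop runs three times, with var_8h going 11, 5, 2 and var_ch staying at 0.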
This crackme's password was the usual IP address of localhost.
This crackme has a lot of code, but most of it we can ignore. What is important is that it opens a secret file in the directory and then reverses the order of the string in the file. If the password is the string from that file in reverse order, it prints the "Correct Password" message.
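The check described above boils down to comparing the input against the reversed file contents. Here is a minimal Python sketch of that idea; the function name and the sample secret are hypothetical, and the real binary's file handling (file name, newline stripping) is not modelled.

```python
def check_password(candidate: str, secret: str) -> bool:
    """Return True when candidate equals the secret string reversed."""
    return candidate == secret[::-1]

secret = "example"     # hypothetical file contents, not the real secret

print(check_password("elpmaxe", secret))   # True
print(check_password("example", secret))   # False
```

So once the contents of the secret file are known, reversing them gives the password.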