The first post of this series previews what to expect from the series. This second post introduces some basic concepts that are needed to understand the upcoming posts.
Introduction
The x86_64 instruction set (also known simply as x64) is a 64-bit extension of the 32-bit x86 instruction set. An x86_64 CPU uses 64-bit memory addresses, which allows a much larger virtual address space than on x86 CPUs. In most current chips only the lower 48 bits of an address are used, which gives a virtual address space of 2^48 bytes (256 TiB). That is still far larger than the 2^32 bytes (4 GiB) addressable on 32-bit x86.
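To make the difference in address space concrete, here is a small Python sketch of the arithmetic. The helper name `address_space_bytes` is our own and is used only for illustration.

```python
# Size of a virtual address space for a given number of address bits.
# address_space_bytes is a hypothetical helper, not part of any library.
def address_space_bytes(address_bits: int) -> int:
    return 2 ** address_bits

GIB = 2 ** 30  # one gibibyte
TIB = 2 ** 40  # one tebibyte

print(address_space_bytes(32) // GIB)  # 4   (GiB with 32-bit addresses)
print(address_space_bytes(48) // TIB)  # 256 (TiB with 48-bit addresses)
```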
The x86_64 processors can operate in the following modes:
- Legacy Mode: A backward-compatible mode with no 64-bit support, in which the processor behaves like a 32-bit x86 CPU. Only 16-bit and 32-bit code can be executed, including code that requires real mode.
- Compatibility Mode: A sub-mode of long mode in which existing 16-bit and 32-bit protected-mode applications run under a 64-bit operating system, side by side with 64-bit applications.
- 64-bit Mode: A sub-mode of long mode in which 64-bit applications run with access to the full 64-bit registers and address space.
In this blog series we only discuss the x86_64 instruction set operating in 64-bit mode.
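The x86_64 processors use the little-endian format, in which the least significant byte of a multi-byte value is stored at the lowest memory address. The following two cases illustrate this: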
Case 1: Assume that a CPU wants to read 4 bytes from the memory starting at address 0x00 and the memory is laid out as follows:
- The byte at address 0x00 is 0xFF
- The byte at address 0x01 is 0xC6
- The byte at address 0x02 is 0x34
- The byte at address 0x03 is 0x00
In a little endian architecture, the CPU interprets the byte at the highest address as the Most Significant Byte (MSB). Therefore, the 4-byte integer will be read as 0x0034C6FF.
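Case 1 can be reproduced with a few lines of Python. This is only a sketch: the `memory` variable stands in for the four bytes at addresses 0x00 to 0x03, and Python's built-in `int.from_bytes` does the interpretation.

```python
# The four bytes from Case 1, in address order 0x00..0x03.
memory = bytes([0xFF, 0xC6, 0x34, 0x00])

# Little endian: the byte at the highest address is the most significant.
little = int.from_bytes(memory, byteorder="little")
# Big endian, shown only for contrast: the byte at the lowest address is the most significant.
big = int.from_bytes(memory, byteorder="big")

print(f"{little:#010x}")  # 0x0034c6ff
print(f"{big:#010x}")     # 0xffc63400
```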
Case 2: A similar procedure is followed when writing the 4-byte integer 0x006718FF to memory. The CPU lays out this integer in memory as follows:
- The MSB 0x00 is written to 0x03
- The second byte 0x67 is written to 0x02
- The third byte 0x18 is written to 0x01
- The fourth byte 0xFF is written to 0x00
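The write in Case 2 can be sketched in Python as well. The names `number` and `layout` are ours; `int.to_bytes` produces the bytes in address order.

```python
# The 4-byte integer from Case 2.
number = 0x006718FF

# Little-endian layout: index 0 is the lowest address (0x00).
layout = number.to_bytes(4, byteorder="little")

for address, byte in enumerate(layout):
    print(f"0x{address:02X}: 0x{byte:02X}")
# 0x00: 0xFF
# 0x01: 0x18
# 0x02: 0x67
# 0x03: 0x00
```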
Register Set
Registers are high-speed storage units located inside the CPU that can be accessed much faster than main memory. The following are the most important registers that you need to know for now:
General Purpose Registers
The general purpose registers (GPRs), shown in Table 1, are mainly used for arithmetic operations and for moving data. The registers RAX, RBX, RCX and RDX are 64-bit registers that can be further divided into 32-bit, 16-bit and 8-bit (high and low) registers. The registers RBP, RSP, RSI and RDI can also be accessed as 32-bit and 16-bit registers, but they have no high 8-bit counterpart.
| 64-bit | 32-bit | 16-bit | 8-bit high (bits 8 to 15) | 8-bit low (bits 0 to 7) |
|---|---|---|---|---|
| RAX | EAX | AX | AH | AL |
| RBX | EBX | BX | BH | BL |
| RCX | ECX | CX | CH | CL |
| RDX | EDX | DX | DH | DL |
| RBP | EBP | BP | – | BPL |
| RSP | ESP | SP | – | SPL |
| RSI | ESI | SI | – | SIL |
| RDI | EDI | DI | – | DIL |
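The way the smaller registers overlap the lower bits of a 64-bit register can be modelled with bit masks. The following Python sketch uses a hypothetical helper, `sub_registers`, and an arbitrary example value for RAX. It models only how the names map onto bits, not how the CPU updates them on writes.

```python
# Model how EAX, AX, AH and AL overlap the low bits of RAX.
# sub_registers is a hypothetical helper used only for illustration.
def sub_registers(rax: int) -> dict:
    return {
        "RAX": rax & 0xFFFFFFFFFFFFFFFF,  # all 64 bits
        "EAX": rax & 0xFFFFFFFF,          # low 32 bits
        "AX": rax & 0xFFFF,               # low 16 bits
        "AH": (rax >> 8) & 0xFF,          # bits 8 to 15
        "AL": rax & 0xFF,                 # bits 0 to 7
    }

regs = sub_registers(0x1122334455667788)
for name, value in regs.items():
    print(f"{name}: {value:#x}")
# RAX: 0x1122334455667788
# EAX: 0x55667788
# AX: 0x7788
# AH: 0x77
# AL: 0x88
```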
Some special use cases of these registers are:
| Register | Special Usage |
|---|---|
| RAX | Default register to store return value of a function and for multiplication/division operations. |
| RCX | Used as a loop counter |
| RSP | Known as the stack pointer, it points to the current top of the stack. |
| RSI | Holds the source address in string/memory operations |
| RDI | Holds the destination address in string/memory operations |
| RBP | Known as the base pointer. It points to the base of the current stack frame, which gives a function a fixed reference for accessing its parameters and local variables |
Instruction Pointer
The register RIP is the instruction pointer in 64-bit mode. It holds the address of the next x86_64 instruction to be executed. Its 32-bit counterpart is EIP.
RFLAGS Register
This 64-bit register stores individual flags as single bits. Most programs only need the Direction control flag and four status flags: Carry, Overflow, Sign and Zero.
| Bit | Label | Purpose |
|---|---|---|
| 0 | CF (Carry Flag) | Set when the result of an unsigned arithmetic operation is too big for the destination |
| 6 | ZF (Zero Flag) | Set when the result of an arithmetic or logical operation is zero |
| 7 | SF (Sign Flag) | Set when the result of an arithmetic or logical operation is negative |
| 10 | DF (Direction Flag) | Controls the direction of string processing: when clear, RSI and RDI are incremented after each step; when set, they are decremented |
| 11 | OF (Overflow Flag) | Set when the result of a signed arithmetic operation does not fit in the destination |
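The following Python sketch shows how these flags could be read out of an RFLAGS value and how the four status flags behave for an 8-bit addition. Both `decode_flags` and `add8_flags` are hypothetical helpers written for this post; they model the rules in the table and are not a full emulation of the ADD instruction.

```python
# Bit positions of the flags listed in the table.
FLAG_BITS = {"CF": 0, "ZF": 6, "SF": 7, "DF": 10, "OF": 11}

def decode_flags(rflags: int) -> dict:
    """Extract the individual flag bits from an RFLAGS value."""
    return {name: (rflags >> bit) & 1 for name, bit in FLAG_BITS.items()}

def add8_flags(a: int, b: int) -> dict:
    """Add two 8-bit values and derive CF, ZF, SF and OF for the result."""
    a &= 0xFF
    b &= 0xFF
    full = a + b            # result without truncation
    result = full & 0xFF    # what fits in an 8-bit destination
    sign_a, sign_b, sign_r = a >> 7, b >> 7, result >> 7
    return {
        "result": result,
        "CF": int(full > 0xFF),      # unsigned result did not fit
        "ZF": int(result == 0),      # result is zero
        "SF": sign_r,                # top bit of the result
        # Signed overflow: both inputs share a sign that the result does not.
        "OF": int(sign_a == sign_b and sign_r != sign_a),
    }

print(decode_flags(0x246))     # of these five flags, only ZF is set
print(add8_flags(0xFF, 0x01))  # unsigned wrap-around: CF=1, ZF=1
print(add8_flags(0x7F, 0x01))  # signed overflow: OF=1, SF=1
```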
Data Types
The common data types used in x86_64 instruction set are as follows:
- Byte – 8 bits
- Word – 16 bits
- Double Word – 32 bits
- Quad Word – 64 bits
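As a quick check of these sizes, the sketch below uses Python's `struct` module, whose standard-size format codes `B`, `H`, `I` and `Q` happen to match the four widths. The mapping of names to format codes is our own choice for illustration.

```python
import struct

# struct format codes whose standard sizes match the x86_64 data types.
FORMATS = {"Byte": "B", "Word": "H", "Double Word": "I", "Quad Word": "Q"}

# "<" selects little-endian byte order with standard (fixed) sizes.
bits = {name: struct.calcsize("<" + code) * 8 for name, code in FORMATS.items()}

for name, width in bits.items():
    largest = 2 ** width - 1  # largest unsigned value that fits
    print(f"{name}: {width} bits, max unsigned value {largest:#x}")
```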