In the first post of this series, we previewed what to expect from the series. In this second post, we introduce some basic concepts that are needed to understand the upcoming posts.

Introduction

The x86_64 instruction set (also known simply as x64) is a 64-bit extension of the 32-bit x86 instruction set. An x86_64 CPU uses 64-bit memory addresses, which allows a far larger virtual address space than on x86 CPUs. Current chips typically use only the lower 48 bits of an address, which gives a virtual address space of 2⁴⁸ bytes (256 TiB). This is still far greater than the 2³² bytes (4 GiB) available on 32-bit x86.
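
To put these numbers in perspective, here is a small Python sketch that compares a 32-bit address space with a 48-bit one. The helper name `address_space_bytes` is made up for this illustration.

```python
def address_space_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given address width."""
    return 2 ** address_bits

GIB = 2 ** 30  # bytes in one gibibyte
TIB = 2 ** 40  # bytes in one tebibyte

x86_space = address_space_bytes(32)  # 32-bit x86
x64_space = address_space_bytes(48)  # x86_64 with 48 address bits in use

print(x86_space // GIB, "GiB")   # 4 GiB
print(x64_space // TIB, "TiB")   # 256 TiB
print(x64_space // x86_space)    # 65536 times larger
```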

The x86_64 processors can operate in the following modes:

Legacy Mode: This is a backward compatible mode in which there is no 64-bit support. The processor behaves like a 32-bit x86 CPU and runs 16-bit or 32-bit operating systems and applications in real mode, protected mode or virtual 8086 mode.
Compatibility Mode: A sub-mode of long mode in which existing 16-bit and 32-bit protected-mode applications run unmodified under a 64-bit operating system, alongside 64-bit applications.
64-bit Mode: A sub-mode of long mode in which 64-bit applications run under a 64-bit operating system, with access to 64-bit addresses and registers.

In this blog series we only discuss the x86_64 instruction set operating in 64-bit mode.

The x86_64 processors use the little endian format: the least significant byte of a multi-byte value is stored at the lowest address. The following two cases illustrate this:

Case 1: Assume that a CPU wants to read 4 bytes from the memory starting at address 0x00 and the memory is laid out as follows:

The byte at address 0x00 is 0xFF
The byte at address 0x01 is 0xC6
The byte at address 0x02 is 0x34
The byte at address 0x03 is 0x00

In a little endian architecture, the CPU interprets the byte at the highest address as the Most Significant Byte (MSB). Therefore, the 4-byte integer will be read as 0x0034C6FF.

Case 2: A similar procedure is followed when writing the 4-byte integer 0x006718FF to memory. The CPU lays out this integer in memory as follows:

The MSB 0x00 is written to 0x03
The second byte 0x67 is written to 0x02
The third byte 0x18 is written to 0x01
The fourth byte 0xFF is written to 0x00
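
Both cases can be reproduced with Python's built-in integer conversions. This is only a sketch on a byte string standing in for memory, not real memory access; the variable names are chosen for illustration.

```python
# Case 1: four bytes as they sit in memory at addresses 0x00..0x03
memory = bytes([0xFF, 0xC6, 0x34, 0x00])
value_read = int.from_bytes(memory, byteorder="little")
print(f"{value_read:#010x}")  # 0x0034c6ff

# Case 2: writing 0x006718FF yields the bytes for addresses 0x00..0x03
written = (0x006718FF).to_bytes(4, byteorder="little")
for address, byte in enumerate(written):
    print(f"0x{address:02X}: 0x{byte:02X}")
# 0x00: 0xFF
# 0x01: 0x18
# 0x02: 0x67
# 0x03: 0x00
```

Passing `byteorder="big"` instead would reverse the byte order in both directions, which is how a big endian machine would interpret the same memory.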

Register Set

Registers are high-speed storage units located inside the CPU that can be accessed much faster than main memory. The following are the most important registers that you need to know for now:

General Purpose Registers

The general purpose registers (GPRs), shown in Table 1, are mainly used for arithmetic operations or for the movement of data. The registers RAX, RBX, RCX and RDX are 64-bit registers whose lower portions can also be accessed as 32-bit, 16-bit and 8-bit (high and low) registers. The registers RBP, RSP, RSI and RDI can be accessed as 32-bit, 16-bit and, in 64-bit mode, low 8-bit registers, but they have no high 8-bit counterpart.

64-bit	32-bit	16-bit	8-bit High (bits 8-15)	8-bit Low (bits 0-7)
RAX	EAX	AX	AH	AL
RBX	EBX	BX	BH	BL
RCX	ECX	CX	CH	CL
RDX	EDX	DX	DH	DL
RBP	EBP	BP	–	BPL
RSP	ESP	SP	–	SPL
RSI	ESI	SI	–	SIL
RDI	EDI	DI	–	DIL

TABLE 1
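
The relationship between a 64-bit register and its smaller views can be sketched with bit masks. The snippet below only models how the bits overlap, using an arbitrary example value for RAX; it does not touch real CPU registers.

```python
rax = 0x1122334455667788  # arbitrary example value

eax = rax & 0xFFFFFFFF    # low 32 bits
ax = rax & 0xFFFF         # low 16 bits
ah = (rax >> 8) & 0xFF    # bits 8-15 (high byte of AX)
al = rax & 0xFF           # bits 0-7  (low byte of AX)

print(f"EAX = {eax:#x}")  # EAX = 0x55667788
print(f"AX  = {ax:#x}")   # AX  = 0x7788
print(f"AH  = {ah:#x}")   # AH  = 0x77
print(f"AL  = {al:#x}")   # AL  = 0x88
```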

Some special use cases of these registers are:

Register	Special Usage
RAX	Default register for the return value of a function; also used implicitly by multiplication/division operations.
RCX	Used as a loop counter
RSP	Known as the stack pointer, it points to the current top of the stack.
RSI	Holds the source address in string/memory operations
RDI	Holds the destination address in string/memory operations
RBP	Known as the base pointer. It provides a fixed reference into the stack frame of a function, through which code compiled from high-level languages accesses parameters and local variables

Instruction Pointer

The RIP register is the instruction pointer in 64-bit mode. It holds the address of the next x86_64 instruction to be executed. In 32-bit mode, its counterpart is EIP.

RFLAGS Register

This 64-bit register represents certain flags as individual bits. Most programs only need the Direction control flag and four status flags: Carry, Overflow, Sign and Zero.

Bit	Label	Purpose
0	CF (Carry Flag)	Set when the result of an unsigned arithmetic operation is too big for the destination
6	ZF (Zero Flag)	Set when the result of an arithmetic or logical operation is zero
7	SF (Sign Flag)	Set when the result of an arithmetic or logical operation is negative
10	DF (Direction Flag)	Controls the direction of string processing: when clear, string instructions increment RSI/RDI; when set, they decrement them
11	OF (Overflow Flag)	Set when the result of a signed arithmetic operation is too big for the destination
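
As a rough illustration of how the status flags come about, the following Python sketch models an 8-bit addition and places the resulting flags at the bit positions listed in the table. The function names `add8_flags` and `pack_flags` are hypothetical, the inputs are assumed to be in the range 0 to 255, and every other bit of RFLAGS is ignored.

```python
CF_BIT, ZF_BIT, SF_BIT, OF_BIT = 0, 6, 7, 11  # bit positions from the table

def add8_flags(a: int, b: int) -> dict:
    """Add two 8-bit values and derive the four status flags."""
    full = a + b
    result = full & 0xFF                      # keep only the low 8 bits
    sign_a, sign_b, sign_r = a >> 7, b >> 7, result >> 7
    return {
        "result": result,
        "CF": int(full > 0xFF),               # unsigned result did not fit
        "ZF": int(result == 0),               # result is zero
        "SF": sign_r,                         # top bit of the result
        # signed overflow: operands share a sign, result has the other one
        "OF": int(sign_a == sign_b and sign_r != sign_a),
    }

def pack_flags(flags: dict) -> int:
    """Place each flag at its RFLAGS bit position (all other bits left 0)."""
    return (flags["CF"] << CF_BIT) | (flags["ZF"] << ZF_BIT) \
        | (flags["SF"] << SF_BIT) | (flags["OF"] << OF_BIT)

wrap = add8_flags(0xFF, 0x01)      # 255 + 1 wraps around to 0
overflow = add8_flags(0x7F, 0x01)  # 127 + 1 overflows the signed range

print(wrap, hex(pack_flags(wrap)))          # CF and ZF set -> 0x41
print(overflow, hex(pack_flags(overflow)))  # SF and OF set -> 0x880
```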

Data Types

The common data types used in the x86_64 instruction set are as follows:

Byte – 8 bits
Word – 16 bits
Double Word – 32 bits
Quad Word – 64 bits
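
These sizes can be cross-checked against the standard-size format codes of Python's `struct` module. The mapping of names to format codes below is an assumption made for this sketch; the `<` prefix selects little endian with fixed standard sizes.

```python
import struct

# Unsigned standard-size format codes, one per data type
FORMATS = {"Byte": "<B", "Word": "<H", "Double Word": "<I", "Quad Word": "<Q"}

bit_widths = {name: struct.calcsize(fmt) * 8 for name, fmt in FORMATS.items()}

for name, bits in bit_widths.items():
    max_unsigned = 2 ** bits - 1  # largest unsigned value that fits
    print(f"{name}: {bits} bits, max unsigned value {max_unsigned:#x}")
```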

Practically Understanding x86_64: Basic Concepts