x86 is one of the most popular architectures for desktop microprocessors. In this article, I'll give a brief overview of the x86 architecture, the registers in x86, the Intel and AT&T assembly formats, and the x86 addressing modes.
Overview on x86 Architecture
x86 is a microprocessor architecture created by Intel. It began with the 16-bit Intel 8086 CPU in 1978, followed by the 80286, 80386 and 80486. The 80386 extended the registers and addresses to 32 bits; the 80486 was the first to integrate the floating-point unit (FPU) on the chip, and its later DX2 models introduced the clock multiplier. The term "x86" came into being because the names of several successors to Intel's 8086 processor end in "86". In the early 1990s the Pentium series replaced the 80486. With 32-bit addresses it could reach 4 GB of memory, and the Pentium Pro raised that to 64 GB with PAE; 64-bit support arrived in later Pentium 4 models. Pipelining and superscalar execution made much better use of CPU resources. In the mid-2000s, Core arrived with two cores inside one CPU, which meant more work done in parallel. Now the Core i series has the largest share of the desktop CPU market, with features such as hyper-threading and QPI. The 64-bit era has arrived, and x86 has become x64, or x86_64. One thing to point out is that x64 is an extension of IA-32, not a new architecture. IA-64 (Itanium) is a different Intel architecture that was once aimed at the high end but has since been abandoned by Intel.
Registers in x86
Registers are like variables in hardware: small, fast storage inside the CPU used for purposes such as calculation, status recording and checking. The x86 architecture has over 100 registers, but only a small fraction is visible to the programmer. The bulk of the rest serve special purposes: control registers, registers reserved for future instruction set extensions, and the physical registers used for register renaming during out-of-order execution in deeply pipelined, superscalar CPUs.
Programmer-visible registers in x86 can be roughly divided into the following types:
General registers (16 bit) : AX BX CX DX
Segment registers (16 bit) : CS DS ES FS GS SS
Index registers (16 bit)   : SI DI
Pointer registers (16 bit) : IP BP SP
Indicator (16 bit)         : FLAGS
As the name says, general registers are the ones we use most of the time. Most instructions operate on these registers. The wider registers can all be broken down into 16-bit and 8-bit registers:
64 bits : RAX RBX RCX RDX
32 bits : EAX EBX ECX EDX
16 bits : AX BX CX DX
8 bits  : AH AL BH BL CH CL DH DL
The "H" and "L" suffixes on the 8-bit registers stand for the high byte and low byte of the 16-bit register. With this out of the way, let's see their main individual uses:
RAX,EAX,AX,AH,AL : Called the Accumulator register.
    It is used for I/O port access, arithmetic, interrupt calls, etc.
RBX,EBX,BX,BH,BL : Called the Base register.
    It is used as a base pointer for memory access.
    Gets some interrupt return values.
RCX,ECX,CX,CH,CL : Called the Counter register.
    It is used as a loop counter and for shifts.
    Gets some interrupt values.
RDX,EDX,DX,DH,DL : Called the Data register.
    It is used for I/O port access, arithmetic and some interrupt calls.
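The way the narrower names alias parts of the same register can be sketched in a few lines of Python. This is only an illustration of the bit layout, not a model of the hardware; the helper names `sub_registers` and `write_al` are made up for this example.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def sub_registers(rax):
    """Return the narrower views of a 64-bit accumulator value."""
    rax &= MASK64
    return {
        "RAX": rax,
        "EAX": rax & 0xFFFFFFFF,   # low 32 bits
        "AX": rax & 0xFFFF,        # low 16 bits
        "AH": (rax >> 8) & 0xFF,   # high byte of AX
        "AL": rax & 0xFF,          # low byte of AX
    }

def write_al(rax, value):
    """Model a write to AL: only the lowest byte of RAX changes."""
    return (rax & ~0xFF & MASK64) | (value & 0xFF)

# Show every view of one sample value (the value itself is arbitrary).
for name, value in sub_registers(0x1122334455667788).items():
    print(f"{name:>3} = {value:X}")
```

The same masks and shifts apply to RBX, RCX and RDX with their own 8-bit halves.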
Segment registers hold the segment addresses of various items. They are only available as 16-bit values. They cannot be loaded with an immediate value directly; they are set from a general register or by special instructions such as POP. Some of them are critical for the correct execution of the program, and you might want to consider playing with them once you are ready for multi-segment programming.
CS : Holds the Code segment in which your program runs.
    Changing its value might make the computer hang.
DS : Holds the Data segment that your program accesses.
    Changing its value might give erroneous data.
ES,FS,GS : These are extra segment registers available for far pointer addressing, such as video memory.
SS : Holds the Stack segment your program uses.
    Sometimes has the same value as DS.
    Changing its value can give unpredictable results, mostly data related.
Index and Pointer Registers
Index and pointer registers hold the offset part of an address. They have various uses, but each register has a specific function. They are sometimes used with a segment register to point to a far address (in a 1 MB range). The registers with an "E" prefix are the 32-bit forms, available only on the 80386 and later, where they are mostly used in protected mode.
ES:EDI EDI DI : Destination index register
    Used for string and memory array copying and setting, and for far pointer addressing with ES
DS:ESI ESI SI : Source index register
    Used for string and memory array copying
SS:EBP EBP BP : Stack Base pointer register
    Holds the base address of the current stack frame
SS:ESP ESP SP : Stack pointer register
    Holds the top address of the stack
CS:EIP EIP IP : Instruction pointer
    Holds the offset of the next instruction
    It cannot be modified directly with MOV; it changes only through control-transfer instructions such as JMP, CALL and RET.
The EFLAGS register
The EFLAGS register holds the state of the processor. It is modified by many instructions and is used for comparisons, conditional loops and conditional jumps. Each status bit reflects a specific property of the result of the last instruction that affected it. Here is a listing:
Bit     Label   Description
----------------------------------------------
0       CF      Carry flag
2       PF      Parity flag
4       AF      Auxiliary carry flag
6       ZF      Zero flag
7       SF      Sign flag
8       TF      Trap flag
9       IF      Interrupt enable flag
10      DF      Direction flag
11      OF      Overflow flag
12-13   IOPL    I/O privilege level
14      NT      Nested task flag
16      RF      Resume flag
17      VM      Virtual 8086 mode flag
18      AC      Alignment check flag (486+)
19      VIF     Virtual interrupt flag
20      VIP     Virtual interrupt pending flag
21      ID      ID flag
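To make the status bits concrete, here is a minimal Python sketch that decodes a flags value using the bit positions from the listing, and computes the arithmetic flags for an 8-bit addition. The function names `decode_flags` and `add8_flags` are invented for this illustration, and only the 8-bit ADD case is modelled.

```python
# Bit positions taken from the listing above (status and control flags only).
FLAG_BITS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,
             "TF": 8, "IF": 9, "DF": 10, "OF": 11}

def decode_flags(eflags):
    """Return the names of the flags that are set in an EFLAGS value."""
    return [name for name, bit in FLAG_BITS.items() if (eflags >> bit) & 1]

def add8_flags(a, b):
    """Add two 8-bit values and return (result, status flags)."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": int(total > 0xFF),                     # unsigned carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),  # even number of 1 bits
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),      # carry out of bit 3
        "ZF": int(result == 0),                      # result is zero
        "SF": result >> 7,                           # copy of the sign bit
        # Signed overflow: both inputs differ in sign from the result.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags

result, flags = add8_flags(0x7F, 0x01)
print(hex(result), flags)
print(decode_flags(0x246))
```

Adding 7FH and 01H is a handy test case because the signed result overflows while the unsigned one does not, so OF and CF disagree.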
Intel and AT&T assembly format
Under DOS and Windows, x86 assembly is usually written in Intel format, while on UNIX-based operating systems it is usually written in AT&T format. There are some significant differences between the two formats:
In Intel format:
The source operand is on the right, the destination operand is on the left;
There is no symbol before a register;
A number in the position of an operand is an immediate number;
A number in brackets in the position of an operand is a memory address;
A register in brackets means indirect addressing.
    MOV EAX, 5                  ;immediate number
    MOV EAX, [1234H]            ;direct memory addressing
    MOV EAX, [EBP-8]            ;register indirect addressing
    MOV EAX, [EBX*4 + 1234H]    ;register indirect addressing
    MOV EAX, [EDX + EBX*4 + 8]  ;register indirect addressing
In AT&T format:
The source operand is on the left, the destination operand is on the right;
A register starts with '%';
A number starting with '$' in the position of an operand is an immediate number;
A bare number in the position of an operand is a memory address;
A register in parentheses means indirect addressing.
    movl $0x05, %eax              #immediate number
    movl 0x1234, %eax             #direct memory addressing
    movl -8(%ebp), %eax           #register indirect addressing
    movl 0x1234(,%ebx,4), %eax    #register indirect addressing
    movl 0x8(%edx,%ebx,4), %eax   #register indirect addressing
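The mapping between the two memory-operand notations is mechanical, which a small Python sketch can show. The functions `intel_operand` and `att_operand` are hypothetical helpers written for this article, not part of any assembler; they only cover the base, index, scale and displacement pieces seen in the examples above, and the exact spacing of the Intel form is my own choice.

```python
def _hex_h(value):
    """Format a non-negative number in the 1234H style."""
    text = f"{value:X}"
    if text[0].isalpha():
        text = "0" + text  # assemblers expect a leading digit
    return text + "H"

def intel_operand(base=None, index=None, scale=1, disp=0):
    """Build an Intel-style memory operand such as [EDX + EBX*4 + 8H]."""
    terms = []
    if base:
        terms.append(base.upper())
    if index:
        terms.append(f"{index.upper()}*{scale}")
    if not terms:
        return f"[{_hex_h(disp)}]"  # absolute address, assumed non-negative
    text = " + ".join(terms)
    if disp > 0:
        text += f" + {_hex_h(disp)}"
    elif disp < 0:
        text += f" - {_hex_h(-disp)}"
    return f"[{text}]"

def att_operand(base=None, index=None, scale=1, disp=0):
    """Build an AT&T-style memory operand such as 0x8(%edx,%ebx,4)."""
    if disp < 0:
        disp_text = f"-0x{-disp:x}"
    elif disp > 0 or not (base or index):
        disp_text = f"0x{disp:x}"
    else:
        disp_text = ""
    if not (base or index):
        return disp_text
    inner = f"%{base.lower()}" if base else ""
    if index:
        inner += f",%{index.lower()},{scale}"
    return f"{disp_text}({inner})"

print(intel_operand(base="edx", index="ebx", scale=4, disp=8))
print(att_operand(base="edx", index="ebx", scale=4, disp=8))
```

In both notations the operand carries the same four pieces of information; only the order and punctuation differ.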
There's an interesting history behind the emergence of the AT&T standard. In my opinion, Intel format is better because it is clearer and more readable than AT&T format. However, on UNIX-based operating systems we have to know the AT&T format, because when debugging a program with GDB the disassembly you see is in GAS (AT&T) syntax by default. Thus, in most of my articles, x86 assembly will be in AT&T format. But in the following sections of this article, I'm going to use Intel format because it is clearer.
x86 Addressing Mode
Immediate Addressing
In immediate addressing, an immediate number is used as an operand in an instruction.
    ADD EAX, 3      ;EAX = EAX + 3
    MOV AH, 00      ;AH = 00
    PUSH 5000H      ;push the immediate number 5000H onto the stack
Register Addressing
In register addressing, the operand is a register, which stores the value that the instruction needs.
    INC BX          ;BX = BX + 1
    ADD EAX, EDX    ;EAX = EAX + EDX
    MOV AH, BH      ;AH = BH
Direct Memory Addressing
In x86, memory addressing uses a mechanism called segmentation. A logical memory address has the form Segment : Offset, where the segment value comes from a segment register. In real mode it is easy to convert a logical memory address to a linear memory address: Segment*16 + Offset. The processor uses DS as the default segment for most data accesses. To use a different segment, we put a "segment override prefix" before the memory address. There are more interesting details of x86 segmentation, which I cover in another article.
In direct memory addressing, the operand is a number in brackets: the number is a memory address, and the value stored at that address is what the instruction needs. Remember that a single instruction cannot have both of its operands in memory. Consider the following case, assuming DS = 5000H, ES = 1000H:
    MOV ES:[1234H], BX  ;move the value in BX to memory address 1000H*16 + 1234H = 11234H
    ADD AX, [1234H]     ;add the value at memory address 5000H*16 + 1234H = 51234H to AX
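The Segment*16 + Offset rule is easy to check with a short Python sketch. The function name `linear_address` is my own, and the wrap to 20 bits is an assumption that matches the original 8086 with its 20 address lines; later processors can behave differently.

```python
def linear_address(segment, offset):
    """Real-mode translation: shift the segment left 4 bits and add the offset.

    Assumes 16-bit segment and offset values and an 8086-style
    20-bit address bus, so the result wraps at 1 MB.
    """
    segment &= 0xFFFF
    offset &= 0xFFFF
    return ((segment << 4) + offset) & 0xFFFFF

# The two accesses from the example above.
print(f"{linear_address(0x1000, 0x1234):X}H")  # ES:[1234H]
print(f"{linear_address(0x5000, 0x1234):X}H")  # DS:[1234H]
```

One consequence of this scheme is that many different Segment : Offset pairs name the same linear address, for example 1000H:1234H and 1123H:0004H.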
Base (Register Indirect) Memory Addressing
In base (register indirect) addressing, the operand is a register in brackets: the value in the register is a memory address, and the value stored at that address is what the instruction needs. Consider the following case, assuming SI = 1234H, BP = 5678H, DS = 5000H, SS = 2000H:
    MOV BX, [SI]     ;move the value at memory address 5000H*16 + 1234H = 51234H to BX
    ADD AX, SS:[BP]  ;add the value at memory address 2000H*16 + 5678H = 25678H to AX
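A rough Python model of a register-indirect load, using the register values assumed above. The names `default_segment` and `load_word` and the byte values placed in `memory` are invented for the illustration. The sketch also shows that BP-based addresses use SS by default, so the SS: override in the second instruction is there only for clarity.

```python
def default_segment(base_register):
    """BP-based (and EBP/ESP-based) addresses default to SS, the rest to DS."""
    return "SS" if base_register.upper() in ("BP", "EBP", "ESP") else "DS"

def load_word(memory, regs, base_register, override=None):
    """Read a 16-bit little-endian word at segment*16 + [base_register]."""
    segment = override or default_segment(base_register)
    address = (regs[segment] << 4) + regs[base_register]
    low = memory.get(address, 0)        # unmapped bytes read as 0 in this toy model
    high = memory.get(address + 1, 0)
    return low | (high << 8)

regs = {"SI": 0x1234, "BP": 0x5678, "DS": 0x5000, "SS": 0x2000}
# Sparse byte-addressed memory with made-up contents.
memory = {0x51234: 0x34, 0x51235: 0x12, 0x25678: 0xCD, 0x25679: 0xAB}

print(hex(load_word(memory, regs, "SI")))  # word at 51234H
print(hex(load_word(memory, regs, "BP")))  # word at 25678H
```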
Base or Index Plus Displacement Addressing
In base or index plus displacement addressing, the operand is a register plus an offset, in brackets: the sum is a memory address, and the value stored at that address is what the instruction needs.
Consider the following case, assuming SI = 1234H, DS = 5000H:
    MOV BX, [SI+4]  ;move the value at memory address 5000H*16 + 1234H + 4 = 51238H to BX
Base and Index Plus Displacement Addressing
In base and index plus displacement addressing, the operand is a base register plus an index register (which may be multiplied by a scale factor) plus an offset, in brackets: the sum is a memory address, and the value stored at that address is what the instruction needs. The scale factor (1, 2, 4 or 8) is only available with the 32-bit registers introduced on the 80386.
Consider the following case, assuming EBX = 1234H, ESI = 54H, DS = 5000H:
    MOV AX, [EBX+ESI*2+4]  ;move the value at memory address 5000H*16 + 1234H + 54H*2 + 4 = 512E0H to AX
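The general form base + index*scale + displacement can be written as one small Python function. The name `effective_address` is hypothetical, and the sketch assumes the 32-bit addressing form, where the scale must be 1, 2, 4 or 8 and the sum wraps at 32 bits.

```python
VALID_SCALES = (1, 2, 4, 8)

def effective_address(base=0, index=0, scale=1, disp=0):
    """Compute the offset part of an address: base + index*scale + disp."""
    if scale not in VALID_SCALES:
        raise ValueError(f"scale must be one of {VALID_SCALES}, got {scale}")
    return (base + index * scale + disp) & 0xFFFFFFFF

# Base 1234H, index 54H, scale 2, displacement 4.
offset = effective_address(base=0x1234, index=0x54, scale=2, disp=4)
print(f"{offset:X}H")                   # offset within the segment
print(f"{(0x5000 << 4) + offset:X}H")   # real-mode address with DS = 5000H
```

All the simpler modes fall out of the same formula: leave out the index for base plus displacement, or leave out both registers for direct addressing. Note that 54H*2 is A8H, so the offset works out to 12E0H.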
The Instruction Set Architecture (ISA) is quite important for a processor. It hides the low-level, circuit-level details and provides the interface on which operating systems are developed. x86 is an excellent architecture, and for programmers it is quite important to know its addressing modes, which is what this article has stressed. In the following articles, I'm going to discuss more about the x86 architecture.