3.4 accessing information

A programmer can alter only two kinds of state, registers and memory, by means of particular instructions and their operands.

3.4.1 register


  1. 16 general registers
  2. 8 traditional, 8 new
  3. different naming convention

x86-64 contains 16 general-purpose registers of 64 bits each:
(1): the eight legacy registers (%rax, %rbx, %rcx, %rdx, %rsi, %rdi, %rbp, %rsp), which originate from the eight 16-bit registers of the 8086;
(2): %r8 through %r15: eight new registers.

The 8086 provided eight 16-bit general registers (%ax, %bx, %cx, %dx, %si, %di, %bp, %sp). IA32 extended them to 32 bits with names starting with %e, like %eax, and x86-64 extended them again to 64 bits with names starting with %r, like %rax. The eight new registers can be accessed at the same four sizes, but their naming convention is different:
new registers (%r8 to %r15): %r8 - %r8d - %r8w - %r8b
legacy registers: %rax - %eax - %ax - %al

3.4.1 register characteristics

  1. each register serves one particular role in a typical program.
  2. for historical reasons, the low-order 1, 2, and 4 bytes of each 64-bit register can be accessed as separate 8-, 16-, and 32-bit registers.

Two important points apply to instructions that write fewer than 8 bytes to a register:

After an operation writes \(w\) bytes to a register,
a): \(w\) = 1 or 2: the remaining high-order \(8-w\) bytes are left unchanged.
b): \(w\) = 4: the high-order 4 bytes are set to zero.
-> In conclusion, a mov instruction updates only the register bytes or memory locations named by its destination operand, except in case b).
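
The two cases can be modelled in a few lines of C. This is only a sketch: `write_low_bytes` is a hypothetical helper invented for illustration, and it treats a register as a plain 64-bit integer.

```c
#include <stdint.h>

/* Hypothetical model of writing the low w bytes of a 64-bit register.
   Returns the new register contents. */
uint64_t write_low_bytes(uint64_t reg, uint64_t value, int w) {
    switch (w) {
    case 1:  /* like movb: only the low byte changes */
        return (reg & ~0xFFULL) | (value & 0xFFULL);
    case 2:  /* like movw: only the low 2 bytes change */
        return (reg & ~0xFFFFULL) | (value & 0xFFFFULL);
    case 4:  /* like movl: low 4 bytes written, high 4 bytes zeroed */
        return value & 0xFFFFFFFFULL;
    default: /* 8 bytes: the whole register is replaced */
        return value;
    }
}
```

Starting from 0x1122334455667788, a 1-byte write of 0xAA keeps the upper seven bytes, while a 4-byte write of 0xDDDDEEEE clears the upper four.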

%rsp -> the stack pointer, which marks the end position (the top) of the run-time stack
(review: the program counter, pc -> %rip)

3.4.2 operand specifier

An operand specifier tells an instruction where to find its source values and where to store its result. x86-64 supports the operand forms shown below, which fall into three types: (1) immediate, (2) register, (3) memory.

form: code representation => actual mathematical meaning

1. immediate number

$Imm => Imm (the number itself)

2. register

\(r_a\) => R[\(r_a\)] (the value in the particular register)

3. memory

| operand | description |
|---|---|
| Imm | \(M[Imm]\) |
| (\(r_a\)) | \(M[R[r_a]]\) |
| Imm(\(r_a\)) | \(M[R[r_a] + Imm]\) |
| (\(r_a\),\(r_b\)) | \(M[R[r_a] + R[r_b]]\) |
| Imm(\(r_a\),\(r_b\)) | \(M[R[r_a] + R[r_b] + Imm]\) |
| (\(r_a\),\(r_b\),s)[^1] | \(M[R[r_a] + R[r_b]\times s]\) |
| (,\(r_a\),s) | \(M[R[r_a]\times s]\) |
| Imm(,\(r_a\),s) | \(M[R[r_a]\times s + Imm]\) |
| Imm(\(r_a\),\(r_b\),s)[^2] | \(M[R[r_a] + R[r_b]\times s + Imm]\) |

s must be 1, 2, 4, or 8, and \(r_a\), \(r_b\) must be 64-bit registers.

[^1]: scale factor -> the value of the index register is multiplied by s.
[^2]: \(r_a\): base register; \(r_b\): index register
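
All the memory forms are special cases of the most general one, Imm(\(r_a\),\(r_b\),s). As a minimal sketch, the address computation can be written as one C function; `effective_address` and its parameter names are made up for this illustration, and a missing component is passed as 0 (or 1 for the scale).

```c
#include <stdint.h>

/* Hypothetical helper: address named by Imm(base, index, scale).
   base and index hold the register values R[r_a] and R[r_b]. */
uint64_t effective_address(int64_t imm, uint64_t base,
                           uint64_t index, unsigned scale) {
    /* Unsigned arithmetic wraps modulo 2^64, like address arithmetic. */
    return base + index * scale + (uint64_t)imm;
}
```

For example, with 0x100 in the base register and 0x3 in the index register, the operands 9(base,index) and (base,index,4) both name address 0x10C.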

3.4.3 data movement instructions (focus on the mov class and all of its variants)

The mov class, the most frequently used class of instructions, copies data from a source to a destination. Its general format is:

mov s,d : copy from source s to destination d [both operands have the same size]

| instruction | description |
|---|---|
| movb s,d | move byte (1 byte) from s to d |
| movw s,d | move word (2 bytes) from s to d |
| movl s,d | move double word (4 bytes) from s to d |
| movq s,d | move quad word (8 bytes) from s to d; an immediate source must fit in a 32-bit two's-complement number, which is then sign-extended to 64 bits |
| movabsq s,d | move absolute quad word from s to d; the source must be an immediate (any 64-bit value) and the destination must be a register |
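
The difference between movq and movabsq with an immediate source comes down to a range check. The sketch below uses a hypothetical helper name, `fits_movq_immediate`, to show which constants movq can encode directly.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if imm can be a movq immediate,
   i.e. it is representable as a 32-bit two's-complement number.
   Values outside this range need movabsq with a register destination. */
int fits_movq_immediate(int64_t imm) {
    return imm >= INT32_MIN && imm <= INT32_MAX;
}
```

So -1 and 0x7FFFFFFF fit, while 0x80000000 and 0x0011223344556677 do not.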

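*key point: x86-64 imposes the restriction that a move instruction cannot have both operands refer to memory locations. Copying a value from one memory location to another therefore takes two instructions: one to load the value into a register and one to store it.
[personal assumption: there is no circuit for direct communication between two locations in main memory.]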

The mov instructions above copy between operands of the same size. But what if the destination is larger than the source and we need zero extension or sign extension, for example when casting a char variable to int? [The source can be a register or memory, whereas the destination must be a register.]
case 1: zero extension
case 2: sign extension

1. mov + z + suffix instruction : zero extension

A movz instruction copies the source into the low-order bytes of the destination register and fills the remaining high-order bits with zeros.

movz[1][2] : \(R[2]=\) zero_extension( \(R[1]\) or \(M[1]\) ), where [1] refers to the source and [2] to the destination

| instruction | description |
|---|---|
| movzbw s,d | move zero-extended byte to word |
| movzbl s,d | move zero-extended byte to double word |
| movzbq s,d | move zero-extended byte to quad word |
| movzwl s,d | move zero-extended word to double word |
| movzwq s,d | move zero-extended word to quad word |


  1. Why is there no instruction for zero-extending 4 bytes of data into an 8-byte register (no movzlq)?
    As noted above, any instruction that writes a 4-byte value to a register sets the upper 4 bytes to zero, so movl already does the job and a separate instruction would be redundant. Operations on 1 or 2 bytes leave the remaining bytes unchanged, which is why they need explicit movz instructions.

2. mov + s + suffix instruction : sign extension

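movs is similar to movz, except that the remaining high-order bits are filled with copies of the most significant bit (the sign bit) of the source.

movs[1][2] : \(R[2]=\) sign_extension( \(R[1]\) or \(M[1]\) ), where [1] refers to the source and [2] to the destination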

| instruction | description |
|---|---|
| movsbw s,d | move sign-extended byte to word |
| movsbl s,d | move sign-extended byte to double word |
| movsbq s,d | move sign-extended byte to quad word |
| movswl s,d | move sign-extended word to double word |
| movswq s,d | move sign-extended word to quad word |
| movslq s,d | move sign-extended double word to quad word |
| cltq | sign-extend %eax to %rax (NO OPERAND!); same effect as movslq %eax,%rax |
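
In C, these instructions correspond to widening casts. The function names below are hypothetical, and the instructions in the comments are what a typical x86-64 compiler might emit, not a guarantee.

```c
#include <stdint.h>

/* unsigned char -> unsigned long: zero extension (movzbq-style). */
uint64_t zero_extend_byte(uint8_t b) {
    return (uint64_t)b;
}

/* signed char -> long: sign extension (movsbq-style). */
int64_t sign_extend_byte(int8_t b) {
    return (int64_t)b;
}

/* int -> long: sign extension (movslq, or cltq if the value is in %eax). */
int64_t sign_extend_int(int32_t x) {
    return (int64_t)x;
}
```

The same bit pattern 0x80 widens to 0x0000000000000080 when the source type is unsigned and to 0xFFFFFFFFFFFFFF80 when it is signed.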

3.4.4 one example of data movements

keypoint: reference and dereference

reference : &
dereference : $Imme , R[(Memory)] , %rax
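
A small C function makes the link between dereferencing and mov instructions concrete. The function name `exchange` is chosen for this sketch. The assembly in the comments assumes the usual Linux x86-64 calling convention (first argument in %rdi, second in %rsi, return value in %rax) and shows what a compiler might produce.

```c
/* Store y at the location xp points to and return the old value. */
long exchange(long *xp, long y) {
    long x = *xp;   /* read through the pointer:  movq (%rdi), %rax */
    *xp = y;        /* write through the pointer: movq %rsi, (%rdi) */
    return x;       /* the old value is already in %rax */
}
```

A call such as `exchange(&a, 3)` passes the address of `a` (reference). Inside the function, each `*xp` (dereference) becomes a memory operand of the form (%rdi).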

3.4.5 the meaning of the left-hand and right-hand sides of one assignment

3.4.6 pushing and popping stack data

A stack is a data type where values can be added or deleted, but only according to a "last-in, first-out" discipline.

The stack grows downward in the memory such that the top element of the stack has the lowest address of all stack elements.

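1. top, pushq, popq

A stack can be implemented with an array in which elements are always inserted and removed at the same end, called the top of the stack. In a plain array the top usually moves toward higher indices, but the x86-64 run-time stack grows toward lower addresses: pushq first decrements %rsp by 8 and then stores the value at the new top, and popq reads the value at the top and then increments %rsp by 8.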

2. A particular way to access the stack

Although the stack is a distinct data type, it lives in the same main memory as everything else, so its contents can be accessed with the standard memory addressing methods.

In other words, we can treat the program stack the way we usually access an array.
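
The following sketch simulates a downward-growing stack on top of a byte array. It shows both the push/pop discipline and array-style access relative to the stack pointer. The `struct machine` type and all function names are invented for illustration, and bounds checking is omitted for brevity.

```c
#include <stdint.h>
#include <string.h>

enum { STACK_BYTES = 64 };

/* Hypothetical toy machine: a block of memory plus a stack pointer
   stored as an offset into that memory. */
struct machine {
    uint8_t  mem[STACK_BYTES];
    uint64_t rsp;
};

/* An empty stack starts at the highest address. */
void machine_init(struct machine *m) {
    m->rsp = STACK_BYTES;
}

/* Like pushq: decrement the stack pointer by 8, then store. */
void pushq(struct machine *m, uint64_t value) {
    m->rsp -= 8;
    memcpy(&m->mem[m->rsp], &value, 8);
}

/* Like popq: load from the top, then increment the stack pointer by 8. */
uint64_t popq(struct machine *m) {
    uint64_t value;
    memcpy(&value, &m->mem[m->rsp], 8);
    m->rsp += 8;
    return value;
}

/* Read the k-th quad word below the top without popping,
   like movq 8*k(%rsp), %rax. */
uint64_t peek(const struct machine *m, unsigned k) {
    uint64_t value;
    memcpy(&value, &m->mem[m->rsp + 8 * k], 8);
    return value;
}
```

After pushing 1 and then 2, the stack pointer has dropped from 64 to 48, `peek(&m, 0)` returns 2, and `peek(&m, 1)` returns 1.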

3.5 arithmetic and logic operation

The operations are divided into four categories:

1) load effective address 2) unary 3) binary 4) shift

where a unary instruction has only one operand and a binary instruction has two.

3.5.1 load effective address

instruction : leaq

Why is the suffix q? => An effective address in x86-64 is 64 bits long, so the destination is a quad-word register.

In addition to loading an effective address, leaq can perform simple arithmetic: any combination of additions and a multiplication by 1, 2, 4, or 8 that fits the standard addressing form. It only computes the address and never reads memory. For example:

leaq 3(%rax,%rax,4),%rax
%rax = 4*%rax+%rax + 3 = 5*%rax+3

It plays a role similar to the & (address-of) operator in C.
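
Both uses of leaq show up when compiling ordinary C. The sketch below uses hypothetical function names, and the instructions in the comments are what a compiler may choose, assuming the first argument is in %rdi and the second in %rsi.

```c
/* Pure arithmetic: 5*x + 3 fits the addressing form Imm(r_a, r_b, s). */
long scale5_plus3(long x) {
    return x + 4 * x + 3;   /* leaq 3(%rdi,%rdi,4), %rax */
}

/* Address computation: &a[i] is a + 8*i for 8-byte elements. */
long *element_address(long *a, long i) {
    return &a[i];           /* leaq (%rdi,%rsi,8), %rax */
}
```

In neither case is memory read: the first function only does arithmetic on x, and the second returns an address without loading `a[i]`.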

3.5.2 unary and binary operations

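The main difference between them is the number of operands: a unary operation has a single operand that serves as both source and destination, whereas a binary operation has two, and the second one is both a source and the destination.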

3.5.3 shift operation

Shifts implement multiplication and division by powers of two, and they come in logical and arithmetic forms, which differ in how a right shift fills the vacated high-order bits.

(unsigned: a logical right shift fills with zeros -> zero extension)
(signed: for a non-negative number the arithmetic right shift behaves the same as the logical one, but for a negative number only the arithmetic shift gives the right result, because it fills with copies of the sign bit -> sign extension)

One thing needs to be mentioned here: the shift amount is usually given directly as an immediate, but it can also come from a register, and only from the single-byte register %cl.

When the shift amount comes from %cl, only its low-order bits are used, and how many depends on the suffix of the shift instruction: if the suffix is w, the operand is 16 bits wide, so the amount can be represented by 4 bits. This prevents shifting past the width of the operand and keeps the shift amount in a valid range.
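
The two kinds of right shift can be compared in C. This is a sketch with hypothetical function names. The arithmetic version is built from unsigned operations so that it does not depend on how a compiler treats `>>` on negative values, and it assumes the usual two's-complement conversion back to `int64_t`. Masking the amount with 63 mirrors a quad-word shift using only the low 6 bits of its shift amount.

```c
#include <stdint.h>

/* Logical right shift (like shrq): vacated high bits become 0. */
uint64_t shift_right_logical(uint64_t x, unsigned k) {
    return x >> (k & 63);
}

/* Arithmetic right shift (like sarq): vacated high bits copy the sign bit. */
int64_t shift_right_arithmetic(int64_t x, unsigned k) {
    uint64_t ux = (uint64_t)x;
    k &= 63;
    /* For negative x, shift the complement so that ones are shifted in. */
    uint64_t shifted = (x < 0) ? ~(~ux >> k) : (ux >> k);
    return (int64_t)shifted;
}
```

Shifting -8 right by one position arithmetically gives -4, whereas the logical shift of the same bit pattern gives a large positive number.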

3.5.4 special arithmetic operations

Multiplication and division need special treatment when the operands are quad words (64 bits): the product of two 64-bit values may require up to 128 bits (16 bytes) to represent, a size called an oct word.

mul (imulq, mulq) and div (idivq, divq)
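
To see why 128 bits are needed, the full product of two unsigned 64-bit values can be computed in portable C by splitting each operand into 32-bit halves. This is only an illustrative sketch with made-up names (`struct u128`, `mul_64x64`). On x86-64 the single-operand mulq instruction produces the same two halves directly in %rdx (high) and %rax (low).

```c
#include <stdint.h>

/* Hypothetical 128-bit result: high and low 64-bit halves. */
struct u128 {
    uint64_t hi;
    uint64_t lo;
};

/* Full 64 x 64 -> 128-bit unsigned multiplication. */
struct u128 mul_64x64(uint64_t x, uint64_t y) {
    uint64_t x_lo = x & 0xFFFFFFFFu, x_hi = x >> 32;
    uint64_t y_lo = y & 0xFFFFFFFFu, y_hi = y >> 32;

    /* Four partial products, each fits in 64 bits. */
    uint64_t p0 = x_lo * y_lo;
    uint64_t p1 = x_lo * y_hi;
    uint64_t p2 = x_hi * y_lo;
    uint64_t p3 = x_hi * y_hi;

    /* Sum of everything that lands in bits 32..63, including carries. */
    uint64_t mid = (p0 >> 32) + (p1 & 0xFFFFFFFFu) + (p2 & 0xFFFFFFFFu);

    struct u128 r;
    r.lo = (mid << 32) | (p0 & 0xFFFFFFFFu);
    r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return r;
}
```

Multiplying 2^32 by 2^32 gives hi = 1 and lo = 0: the product 2^64 no longer fits in a single 64-bit register.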

