CS:APP--Chapter02 : representing and manipulating information
标签(空格分隔): CS:APP
目录- CS:APP--Chapter02 : representing and manipulating information
Most computers now store and process information with binary/two-valued singals/data.
For the representation of information/numbers,we consider Three most important representations of numbers:
- unsigned
- two's-complement
- float-point
succeeding by the details of arithmetic operation in which we indeed understand why our codes can generate some peculiar results and learn to optimize our codes.
1.information storage
First of all,byte,the smallest addressable unit of memory,consists of 8 bits,and main memory(later virtually described by virtual address space) are treated as an array of bytes (like a linear list).
The softwares running on a computer can be treated as one program,so all the information we need to consider are the program objects among these softwares.
->It simply treats each program object as a block of bytes and program as a sequence of bytes.
Then
->Each byte in main memory is labelled by a unique number,known as its address.
keyword:[1.virtual address space 2.byte:8bits 3.bit->addressed along virtual address space]
1.1 hexadecimal notation
Even if computers treat data as a run of binary numbers, it is still so tedious to understand these 0 and 1 bunches by humankind.
Hexadecimal can be handily converted into binary, vice-versa.
There are some special rules for mapping hexadecimal to decimal and binary respectively.
(For 0-9, hexadecimal is same as decimal does in terms of the representation and mapping to binaray.)
hexadecimal | decimal | binary |
---|---|---|
A | 10 | 1010 |
B | 11 | 1011 |
C | 12 | 1100 |
D | 13 | 1101 |
E | 14 | 1110 |
F | 15 | 1111 |
Conversely, converting a string of binary to hexadecimal is to divide it into groups of 4 bits each. Then if the length of the binary is not a multiple of 4 resulting in the absence of the beginning elements of the leftmost group, just padding the number with leading zeros. (zero extension:perserve itself value.)
1.1.1 hexadecimal representation and addition
0x1111
0x1234
Technique: preceding the hexadecimal by 0x
Addition:
carry it as same as a decimal does! add 1 to the next high-order bit.
keywords:[1.hexadecimal 2.hexadecimal representation 3,conversion between H and B]
1.2 data sizes
1.2.1 word and the range of virtual address memory
As for the most important parameter in a computer,word size defines how far our computer can reach our computer. That mean, if one computer defines a word as w bits, its virtual address automatically ranges from 0 to \(2^w-1\) with the capacity of \(2^w\) bytes.
As a result of different word sizes, programs always are labelled by 32-bits or 64-bits.64 bits computer can execute both programs but 32-bits computer is compatible with only 32-bit program.
1.2.2 multiple data sizes of number
computer and compilers supports various data size to effectively represent integer and float.
often data size depends on how the program is compiled.So let's shed light on the data size defined in C languagae.
(C only define the minimum size of each data type.)
Different size on different computer may result in bug because of compatibility of data type.
In oorder to standardize data size,C also provides fix-sized data type denoted as int32_t,int64_t and so on.
1.2.3 addressing and byte ordering(寻址和字节排序)
Like the same two questions when we is confronted to how to restore data we fetch and is written into memory:
- where is these data? ----addressing
- how we will order the bytes ?(my way:how to find them correctly?) -- byte ordering
a.addressing
Generally speaking,a multi-bytes object occupies a block of continuous units/bytes in virtual address space,with the address of the objects given by the smallest address of the unit/bytes used.
b.byte ordering
There are two conventions named as big endian && little endian.
Given a bit-vector of length \(w,[x_{w-1},x_{w-2},...,x_0]\)
1.little endian:restore this number in memory ordered from the lest significant byte to the least.
2.big endian:restore this number in memory ordered from the most significant bytes to the least.
One problem:
byte ordering becomes an issue:
- data communicated over the network between the little-endian and big-endian devices.
- ?
- ?
1.3 logic algebra
It's hard to imagine how the binary can perform the function such addition,division,multiplication,subtraction and so on.
->A rich body of mathematical knowledge has evolved around the study of the value 1 and 0.
There are four basic operations and truth tables in logic algebra:
1.\(\neg\)
\(\neg\) | 0 | 1 |
---|---|---|
- | 1 | 0 |
2.\(\&\)
\(\&\) | 0 | 1 |
---|---|---|
0 | 0 | 0 |
1 | 0 | 1 |
3.\(|\)
|\(|\)|0|1|
|--|--|--|
|0|0|1|
|1|1|1|
4.^
^ | 0 | 1 |
---|---|---|
0 | 0 | 1 |
1 | 1 | 0 |
1.4 bit-level operations in C
| & ~
Apart from logic operation bit by bit,mask operation is always used in these two operation:
1.preserve some pattern
perserve the three high-order bits
&:1110000...0000
|:0001111...1111
2.extract some pattern
1.5 logic operation in C
&& || !
1.6 shift operation in C
>>
<<
2. integer representation
Now we move on to the representation of integers which is divided into two types:
- unsigned integer
- sign(two's-complement) integer
2.1 unsigned integer
assume \(\vec{x}\) is a vector of bits beginning from right with the index of 0 to the left with the index w-1.,shown as \([x_{w-1},x_{w-2},...,x_1,x_0]\).
\(B2U_w(\vec{x})=\sum^{i}_{i=0}x_i2^i\)
\(U_{max}=\frac{1(1-2^n)}{1-2}=2^n-1\)
We will see later what I have recorded here can be closely related to the its mathematical properties and its real-lift implementation in assembly language.
2.2 signed integer
For non-negative number encoded by unsigned way,then Two's-complement way can lead to the encoding of zero,negative and positive number within a more narrow range in comparsion with unsigned encoding with vecttor x of the same length w.
\(B2T_w(\vec{x})=(-1)\cdot x_{w-1} +\sum^{w-2}_{i=0}x_i2^i\)
\(TMax_w=2^{w-1}-1\)
\(TMin_w=-2^{w-1}\)
The range of positve number extends one more number than the range of negative number in one two's complement.
\(TMax_w+TMin_w=1\)
2.3 the conversion between signed and unsigned number
The problem can be separated into three parts:
1): both are positive number
2): two's complement is for a negative number
3): positve or negative zero
case 1: there is no any difference between each bit representation so \(B2T_w(\vec{x})=B2U_w(\vec{x})\),case 1 is only valid when number is in the range from 0 to \(TMax_w\)
case 2:
2.1 Unsign->signed
a unsigned number is greater than \(TMax_w\) then conveted into signed number,For a bit vector \(\vec{x}\)
\(U2T_w(\vec{x})=B2T_w(U2B(\vec{x}))=x-2^w\)
2.2 signed->unsigned
\(T2U_w(\vec{x})=x+2^w\)
case 3:
positive and negative zero ~~~~ not understand yet!
All of these three cases can be conclude as one formula:
\[U2T_W(\vec{x})= x - x_{w-1}\cdot 2^w $$\]T2U_w(\vec{x})=x+x_{w-1}\cdot 2^w
\[ ### 2.4 expanding and shrinking the bit-level representation of a integer #### 2.4.1 expanding the bit represenation its mathematical properties tells us that ,whatever the bit representation varies ,the number itself would perserve always. "Expanding" here means making more high-order bits available in the bit represenation.for example,[10]->[0010].So one question comes here soon,what criteria can we depend on to make sure the expanding bits? ##### two genrics of expansion 1. zero expension : stretch to left more with all bit zilch 0. 2. sign expansion : stretch to left more with all sign bit(the significant bit) the later section of assembly directives will uncover the practical implmentation on the hardware. we already mentioned the match between its mathematical properties and its encoding,let make the detail illustate more for us ! #### expansion of an unsigned number by zero extension we suppose here a vector x with the expension by zeo extension: $B2U_{w+i}(\vec{x})=B2U_w(\vec{x})$ #### expansion of a signed number by sign extension $B2T_{w+i}(\vec{x})=(-1)\cdot x_{w+i-1}+\sum^{w+i-2}_{i=0}=B2T_w(\vec{x})$ sign expansion can fulll maintain and perserve the original value itself. ### 2.5 truncate integer Shoterening w bits down to w-i bits raises the question of what we should do when truncating a number? modular operation makes it by dividing by the n-th power to 2. ### 2.6 intuitive behavior - implicitly cast signed number to unsigned number implicitly cast signed number to unsigned number when c handles an expression with either operand is an unsigned number. ## 3. integer arithmetic The nature of finite computer resources may result in some peculiar outcomes like positive addition producing a negative number. Here we mostly get used to starting with the simplest part : unsigned arithmetic then moving on to the more difficult parts of signed parts. **Notes:** Overflow is a significant problem that the capacity of the data type cannot hold the number to be restored here. ### 3.1 unsigned addition principle : $x+^u_w y = \left\{ \begin{aligned} x+y,x+y\leq UMax_w\\ x+y-2^w,x+y>UMax_w\\ \end{aligned} \right.$ Question: no idea on the underlying principle of integer addition on computer. ### 3.2 unsigned negation $$-^u_wx= \left\{ \begin{aligned} x,x=0\\ 2^w-x,x>0 \end{aligned} \right.\]