Number System, Algorithm and Convergence (Numerical Methods)

January 19, 2018

Note the correspondence between decimal, binary and hexadecimal digits:

$$\color{white}{\begin{array}{ccc}\text{Decimal}&\text{Binary}&\text{Hexadecimal}\\ \hline 0&0000&0\\1&0001&1\\ 2&0010&2\\ 3&0011&3\\ 4&0100&4\\ 5&0101&5\\ 6&0110&6\\ 7&0111&7\\ 8&1000&8\\ 9&1001&9\\ 10&1010&A\\ 11&1011&B\\ 12&1100&C\\ 13&1101&D\\ 14&1110&E\\ 15&1111&F \end{array}}$$

IEEE floating point representation [$fl(x)$]

$$fl(x) = (-1)^{s} \cdot 2^{e} \cdot (1 + 0.m)$$

$s$: sign bit ($0$ for a positive number, $1$ for a negative number)

$e$: exponent

$m$: mantissa

An IEEE single-precision floating-point number is stored in 32 bits:

$$s | \text{8-bit } e + 127 | \text{23-bit } m $$

An IEEE double-precision floating-point number is stored in 64 bits:

$$s | \text{11-bit } e + 1023 | \text{52-bit } m $$
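The two layouts above can be inspected directly. The following is a minimal Python sketch, assuming the standard `struct` module is used to reinterpret the stored bytes; the function names `decompose_single` and `decompose_double` are illustrative and not part of the original notes.

```python
import struct

def decompose_single(x):
    """Return (s, e + 127, m) for x stored as a 32-bit IEEE float."""
    # Pack as a big-endian 32-bit float, then reread the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31                   # 1 sign bit
    biased_e = (bits >> 23) & 0xFF   # 8 exponent bits, stored as e + 127
    m = bits & 0x7FFFFF              # 23 mantissa bits
    return s, biased_e, m

def decompose_double(x):
    """Return (s, e + 1023, m) for x stored as a 64-bit IEEE float."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    s = bits >> 63                   # 1 sign bit
    biased_e = (bits >> 52) & 0x7FF  # 11 exponent bits, stored as e + 1023
    m = bits & ((1 << 52) - 1)       # 52 mantissa bits
    return s, biased_e, m

# 1.5 = 1.1 (binary) x 2^0, so s = 0, e = 0 and m starts with a single 1.
s, biased_e, m = decompose_single(1.5)
print(s, biased_e - 127, format(m, "023b"))  # 0 0 10000000000000000000000
```

Subtracting the bias ($127$ or $1023$) from the middle field recovers $e$.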

Question:

1) Convert the decimal number $-\frac{2}{13}$ to IEEE 754 single precision floating-point format and write your answer in hexadecimal form.

Solution:

$\color{red}{1}$ : Determine the sign of the number

s = 1

$\color{red}{2}$ : Find the binary of the number

Since the number is a fraction that cannot be expressed as a terminating decimal, multiply it by 2 repeatedly and record the integer part of each result as the next binary digit:

$\frac{2}{13} \times 2 = \color{fuchsia}{0} + \frac{4}{13}$

$\frac{4}{13} \times 2 = \color{fuchsia}{0} + \frac{8}{13}$

$\frac{8}{13} \times 2 = \color{fuchsia}{1} + \frac{3}{13}$

$\frac{3}{13} \times 2 = \color{fuchsia}{0} + \frac{6}{13}$

$\frac{6}{13} \times 2 = \color{fuchsia}{0} + \frac{12}{13}$

$\frac{12}{13} \times 2 = \color{fuchsia}{1} + \frac{11}{13}$

$\frac{11}{13} \times 2 = \color{fuchsia}{1} + \frac{9}{13}$

$\frac{9}{13} \times 2 = \color{fuchsia}{1} + \frac{5}{13}$

$\frac{5}{13} \times 2 = \color{fuchsia}{0} + \frac{10}{13}$

$\frac{10}{13} \times 2 = \color{fuchsia}{1} + \frac{7}{13}$

$\frac{7}{13} \times 2 = \color{fuchsia}{1} + \frac{1}{13}$

$\frac{1}{13} \times 2 = \color{fuchsia}{0} + \frac{2}{13}$

$\vdots$

The remainder $\frac{2}{13}$ has reappeared, so the same 12 digits repeat forever.

$\Rightarrow \frac{2}{13} = 0.001001110110…_{2} = 0.\overline{001001110110}_{2}$
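The repeated doubling above can be sketched in Python. This is only an illustration: it assumes exact rational arithmetic via `fractions.Fraction`, and the helper name `fraction_bits` is made up for this example.

```python
from fractions import Fraction

def fraction_bits(x, n):
    """Return the first n binary digits after the point of x, where 0 <= x < 1."""
    digits = []
    for _ in range(n):
        x *= 2
        bit = int(x)          # the integer part (0 or 1) is the next binary digit
        digits.append(str(bit))
        x -= bit              # carry only the fractional part forward
    return "".join(digits)

# Two full periods of the expansion of 2/13.
print(fraction_bits(Fraction(2, 13), 24))  # 001001110110001001110110
```

Using `Fraction` instead of a float keeps every remainder exact, so the repetition after 12 digits is visible without rounding noise.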

$\color{red}{3}$ : Normalise the binary number into the form $(1 + 0.m)$ and find $e$

$\begin{align}0.001001110110\overline{001001110110}_{2}&=0.001001110110\overline{001001110110}_{2}\times 2^{0}\\ &=1.001110110\overline{001001110110}_{2}\times 2^{-3} \end{align}\\$

$e = -3$

$\Rightarrow m = 001110110\overline{001001110110}$

$\color{red}{4}$ : Find the 8-bit biased exponent $e + 127$

Since $e=-3$,

$-3 + 127 = 124$

$124 = 1111100_{2}$

$\color{red}{5}$ : Combine all the $s$, $e+127$ and $m$

$1|\text{ }\color{yellow}{0}1111100|\text{ }00111011000100111011000$

The $\color{yellow}{\text{yellow}}$ $0$ above is a leading zero that pads the exponent field to 8 bits.

$\color{red}{6}$ : Find the hexadecimal form

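$\begin{array}{c}\text{1011}&\text{1110}&\text{0001}&\text{1101}&\text{1000}&\text{1001}&\text{1101}&\text{1000}\\ \hline B&E&1&D&8&9&D&8\end{array}\\$

$\therefore -\frac{2}{13} \approx 0xBE1D89D8$

Note that this answer truncates $m$ to its first 23 bits. The next bit of $m$ is $1$, so under the IEEE default round-to-nearest mode the last bit rounds up and the stored value is $0xBE1D89D9$.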
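As a cross-check of steps 1 to 6, the sketch below rebuilds the 32-bit word from $s$, $e + 127$ and the first 23 bits of $m$ (truncation), and compares it with what Python's `struct` module stores, which rounds to nearest. The helper `single_hex_truncated` is a hypothetical name, and the sketch assumes a nonzero input in the normal range (no subnormals or overflow).

```python
import struct
from fractions import Fraction

def single_hex_truncated(x):
    """Encode a nonzero Fraction as single precision, chopping m to 23 bits."""
    s = 1 if x < 0 else 0
    x = abs(x)
    e = 0
    while x >= 2:              # normalise so that 1 <= x < 2
        x /= 2
        e += 1
    while x < 1:
        x *= 2
        e -= 1
    m = int((x - 1) * 2**23)   # keep the first 23 bits of 0.m, drop the rest
    bits = (s << 31) | ((e + 127) << 23) | m
    return format(bits, "08X")

print(single_hex_truncated(Fraction(-2, 13)))    # BE1D89D8 (truncated)
print(struct.pack(">f", -2 / 13).hex().upper())  # BE1D89D9 (rounded to nearest)
```

The two results differ only in the last mantissa bit, which is the effect of chopping versus rounding.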

2) Convert the decimal number $73.171875$ to IEEE 754 double precision floating-point format.

Solution:

$\color{red}{1}$ : Determine the sign of the number
s = 0

$\color{red}{2}$ : Find the binary of the number

Integral part:

$73 = 1001001_{2}$

Fractional part:

$0.171875 \times 2 = \color{fuchsia}{0} + 0.34375$

$0.34375 \times 2 = \color{fuchsia}{0} + 0.6875$

$0.6875 \times 2 = \color{fuchsia}{1} +0.375$

$0.375 \times 2 = \color{fuchsia}{0} + 0.75$

$0.75 \times 2 = \color{fuchsia}{1} + 0.5$

$0.5 \times 2 = \color{fuchsia}{1} + 0$

$0 \times 2 = \color{fuchsia}{0} + 0$

$\vdots$

The remainder is now $0$, so every further digit is $0$.

$\Rightarrow 73.171875 = 1001001.0010110..._{2}$

$\color{red}{3}$ : Normalise the binary number into the form $(1 + 0.m)$ and find $e$
$\begin{align}1001001.0010110..._{2} &= 1001001.0010110..._{2} \times 2^{0}\\ &=1.0010010010110..._{2} \times 2^{6} \end{align}\\$
$e=6$
$\Rightarrow$ $m$ = $0010010010110...$

$\color{red}{4}$ : Find the 11-bit biased exponent $e + 1023$
Since $e=6$,
$e + 1023 = 1029$
$1029 = 10000000101_{2}$

$\color{red}{5}$ : Combine all the $s$, $e+1023$ and $m$
$0|\text{ }10000000101|\text{ }\underbrace{0010010010110...0}_{52\text{-bit number}}$

$\color{red}{6}$ : Find the hexadecimal form
$\begin{array}{c}\text{0100}&\text{0000}&\text{0101}&\text{0010}&\text{0100}&\text{1011}&\text{0000}&\text{0000}\\ \hline 4&0&5&2&4&B&0&0\end{array}\\$

$\begin{array}{c}\text{0000}&\text{0000}&\text{0000}&\text{0000}&\text{0000}&\text{0000}&\text{0000}&\text{0000}\\ \hline 0&0&0&0&0&0&0&0\end{array}\\$

$\therefore 73.171875 = 0x40524B0000000000$
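The result can be checked with a short Python sketch, assuming the standard `struct` module; `double_to_hex` is an illustrative name. Because $73.171875$ has a terminating binary expansion that fits in 52 mantissa bits, no rounding is involved here.

```python
import struct

def double_to_hex(x):
    """Return the 64-bit IEEE double representation of x as 16 hex digits."""
    # ">d" packs x as a big-endian double; hex() shows the 8 stored bytes.
    return struct.pack(">d", x).hex().upper()

print(double_to_hex(73.171875))  # 40524B0000000000
```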

3) Convert the IEEE 32-bit floating-point number $C1280000_{16}$ to decimal

Solution:

$\color{red}{1}$ : Change the hexadecimal form into binary form
$\begin{array}{c}\text{C}&\text{1}&\text{2}&\text{8}&\text{0}&\text{0}&\text{0}&\text{0}\\ \hline 1100&0001&0010&1000&0000&0000&0000&0000\end{array}\\$

$\color{red}{2}$ : Split the bits into the fields $s|\text{ 8-bit } e+127|\text{ 23-bit } m$
$1|\text{ }10000010|\text{ }\underbrace{01010...0}_{23\text{-bit number}}$

$\color{red}{3}$ : Find $s$ and $e$
$s=1$ means the decimal number is negative

$e + 127 = 10000010_{2}$
$e+127 = 130$
$e = 3$

$\color{red}{4}$ : Find the real number
$\begin{align}-(1.m \times 2^{e}) &= -(1.0101_{2} \times 2^{3})\\ &= -(1010.1_{2}\times 2^{0})\\ &= -(1010_{2} + 0.1_{2})\\ &= -(10 + 2^{-1})\\ &= -10.5 \end{align}$

$\therefore C1280000_{16} = -10.5$
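The decoding steps can be sketched in Python as follows. The function name `decode_single` is illustrative, and the sketch assumes a normal number (it ignores zero, subnormals, infinities and NaN). The last line cross-checks against the standard `struct` module.

```python
import struct

def decode_single(hex_word):
    """Decode an 8-digit hex string as a normal IEEE single-precision number."""
    bits = int(hex_word, 16)
    s = bits >> 31                    # sign bit
    e = ((bits >> 23) & 0xFF) - 127   # remove the bias from the 8-bit field
    m = bits & 0x7FFFFF               # 23 mantissa bits
    # fl(x) = (-1)^s * 2^e * (1 + 0.m), with 0.m = m / 2^23
    return (-1) ** s * 2.0 ** e * (1 + m / 2**23)

print(decode_single("C1280000"))                            # -10.5
print(struct.unpack(">f", bytes.fromhex("C1280000"))[0])    # -10.5
```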
