Number System, Algorithm and Convergence (Numerical Methods)
Note that:
$$\begin{array}{c|c|c}\text{Decimal}&\text{Binary}&\text{Hexadecimal}\\ \hline 0&0000&0\\1&0001&1\\ 2&0010&2\\ 3&0011&3\\ 4&0100&4\\ 5&0101&5\\ 6&0110&6\\ 7&0111&7\\ 8&1000&8\\ 9&1001&9\\ 10&1010&A\\ 11&1011&B\\ 12&1100&C\\ 13&1101&D\\ 14&1110&E\\ 15&1111&F \end{array}$$
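As a quick check, the table can be regenerated with Python's built-in `format` specifiers: `"04b"` gives a zero-padded 4-bit binary string and `"X"` gives an uppercase hex digit. A minimal sketch (the helper name `base_table` is just illustrative):

```python
def base_table():
    """Rows of (decimal, 4-bit binary, hex digit) for 0..15."""
    return [(n, format(n, "04b"), format(n, "X")) for n in range(16)]

for dec, binary, hexa in base_table():
    print(f"{dec:>2}  {binary}  {hexa}")
```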
IEEE floating-point representation [$fl(x)$]:
$$fl(x) = (-1)^{s} \cdot 2^{e} \cdot (1 + 0.m)$$
$s$: sign of the number ($0$ for a positive number, $1$ for a negative number)
$e$: exponent
$m$: mantissa (the bits after the binary point, so $0.m$ is a binary fraction)
IEEE single precision floating point uses a 32-bit word:
$$s | \text{8-bit } e + 127 | \text{23-bit } m$$
IEEE double precision floating point uses a 64-bit word:
$$s | \text{11-bit } e + 1023 | \text{52-bit } m$$
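The two layouts above can be checked against what the machine actually stores. The sketch below assumes Python's standard `struct` module (`">f"`/`">d"` pack a big-endian single/double, `">I"`/`">Q"` read the same bytes back as an unsigned integer). The helper `float_fields` and the `LAYOUTS` table are hypothetical names, and only normal numbers are handled (zero, subnormals, infinities and NaN use special exponent patterns not covered here).

```python
import struct

# (float format, int format, exponent bits, mantissa bits, bias)
LAYOUTS = {
    "single": (">f", ">I", 8, 23, 127),
    "double": (">d", ">Q", 11, 52, 1023),
}

def float_fields(x, precision="single"):
    """Split x into (s, e, m): sign bit, unbiased exponent, mantissa bit string.

    Assumes x is a normal number (not zero, subnormal, inf or NaN).
    """
    float_fmt, int_fmt, exp_bits, mant_bits, bias = LAYOUTS[precision]
    # Reinterpret the packed bytes as an unsigned integer of the same width.
    (word,) = struct.unpack(int_fmt, struct.pack(float_fmt, x))
    s = word >> (exp_bits + mant_bits)                   # top bit
    biased = (word >> mant_bits) & ((1 << exp_bits) - 1)  # stored e + bias
    m = word & ((1 << mant_bits) - 1)                     # low mantissa bits
    return s, biased - bias, format(m, f"0{mant_bits}b")

print(float_fields(-10.5))
print(float_fields(73.171875, "double"))
```

Feeding the fields back into $fl(x) = (-1)^{s} \cdot 2^{e} \cdot (1 + 0.m)$ should recover the original value.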
Question:
1) Convert the decimal number $-\frac{2}{13}$ to IEEE 754 single precision floating-point format and write your answer in hexadecimal form.
Solution:
$\color{red}{1}$ : Determine the sign of the number
$s = 1$ (the number is negative)
$\color{red}{2}$ : Find the binary form of the number
Since $\frac{2}{13}$ has no terminating binary expansion, repeatedly multiply the fractional part by 2; the integer part of each product (highlighted) is the next binary digit:
$\frac{2}{13} \times 2 = \color{fuchsia}{0} + \frac{4}{13}$
$\frac{4}{13} \times 2 = \color{fuchsia}{0} + \frac{8}{13}$
$\frac{8}{13} \times 2 = \color{fuchsia}{1} + \frac{3}{13}$
$\frac{3}{13} \times 2 = \color{fuchsia}{0} + \frac{6}{13}$
$\frac{6}{13} \times 2 = \color{fuchsia}{0} + \frac{12}{13}$
$\frac{12}{13} \times 2 = \color{fuchsia}{1} + \frac{11}{13}$
$\frac{11}{13} \times 2 = \color{fuchsia}{1} + \frac{9}{13}$
$\frac{9}{13} \times 2 = \color{fuchsia}{1} + \frac{5}{13}$
$\frac{5}{13} \times 2 = \color{fuchsia}{0} + \frac{10}{13}$
$\frac{10}{13} \times 2 = \color{fuchsia}{1} + \frac{7}{13}$
$\frac{7}{13} \times 2 = \color{fuchsia}{1} + \frac{1}{13}$
$\frac{1}{13} \times 2 = \color{fuchsia}{0} + \frac{2}{13}$
$\vdots$
The remainder $\frac{2}{13}$ has reappeared, so the same 12 digits repeat forever:
$\Rightarrow \frac{2}{13} = 0.001001110110\ldots_{2} = 0.\overline{001001110110}_{2}$
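The repeated-doubling procedure above can be automated with exact rational arithmetic, which also detects the repeating block: the expansion starts to cycle as soon as a remainder is seen for the second time. A minimal sketch, assuming Python's standard `fractions.Fraction`; the helper name `binary_fraction` is hypothetical.

```python
from fractions import Fraction

def binary_fraction(x):
    """Binary digits of a fraction 0 <= x < 1 by repeated doubling.

    Returns (prefix, repeating): the digits before the cycle and the
    repeating block ('' if the expansion terminates).
    """
    digits = []
    seen = {}  # remainder -> index of the digit it is about to produce
    while x != 0 and x not in seen:
        seen[x] = len(digits)
        x *= 2
        bit = 1 if x >= 1 else 0  # integer part of the product
        digits.append(str(bit))
        x -= bit                  # keep only the fractional part
    if x == 0:
        return "".join(digits), ""
    start = seen[x]               # where the cycle begins
    return "".join(digits[:start]), "".join(digits[start:])

print(binary_fraction(Fraction(2, 13)))
```

Because $\frac{2}{13}$ itself reappears, the cycle starts at the very first digit and the prefix is empty.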
$\color{red}{3}$ : Normalise the binary number into the form $(1 + 0.m) \times 2^{e}$ and find $e$
$$\begin{aligned}0.001001110110\overline{001001110110}_{2}&=0.001001110110\overline{001001110110}_{2}\times 2^{0}\\ &=1.001110110\overline{001001110110}_{2}\times 2^{-3}\end{aligned}$$
$e = -3$
$\Rightarrow m = 001110110\overline{001001110110}$
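Finding $e$ amounts to locating the power of two that brings the significand into $[1, 2)$. Python's `math.frexp` returns a fraction in $[0.5, 1)$ instead, so the sketch below shifts its result by one bit; the helper name `normalise` is hypothetical and the input is assumed to be nonzero and finite.

```python
import math

def normalise(x):
    """Return (e, significand) with |x| = significand * 2**e, 1 <= significand < 2.

    Assumes x is nonzero and finite.
    """
    frac, exp = math.frexp(abs(x))  # |x| = frac * 2**exp with 0.5 <= frac < 1
    return exp - 1, frac * 2        # move one factor of 2 into the significand

print(normalise(-2 / 13))
```

For $-\frac{2}{13}$ this gives $e = -3$ and a significand of about $1.2308 = \frac{16}{13}$, matching $1.001110110\ldots_{2}$.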
$\color{red}{4}$ : Find the 8-bit biased exponent $e + 127$
Since $e = -3$,
$-3 + 127 = 124$
$124 = 1111100_{2}$
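The biased exponent and its zero padding can be produced in one step with a fixed-width binary format specifier. A minimal sketch; the helper name `biased_exponent_bits` is hypothetical, and the range check reflects that the all-zeros and all-ones exponent patterns are reserved for zero/subnormals and infinity/NaN.

```python
def biased_exponent_bits(e, bias=127, width=8):
    """Zero-padded binary string of the stored exponent e + bias."""
    stored = e + bias
    # 0 and 2**width - 1 are reserved, so normal numbers lie strictly between.
    if not 0 < stored < (1 << width) - 1:
        raise ValueError("exponent out of range for a normal number")
    return format(stored, f"0{width}b")

print(biased_exponent_bits(-3))                         # 01111100
print(biased_exponent_bits(6, bias=1023, width=11))     # 10000000101
```

The leading `0` in `01111100` is exactly the padding bit needed to fill the 8-bit field.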
$\color{red}{5}$ : Combine $s$, $e+127$ and the first 23 bits of $m$
$1|\text{ }\color{yellow}{0}1111100|\text{ }00111011000100111011000$
The $\color{yellow}{\text{yellow}}$ digit is a leading zero that pads $1111100_{2}$ (7 bits) to the required 8 bits.
$\color{red}{6}$ : Find the hexadecimal form
$$\begin{array}{cccccccc}\text{1011}&\text{1110}&\text{0001}&\text{1101}&\text{1000}&\text{1001}&\text{1101}&\text{1000}\\ \hline B&E&1&D&8&9&D&8\end{array}$$
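$\therefore -\frac{2}{13} = 0xBE1D89D8$ (with the mantissa truncated to 23 bits)
IEEE 754's default rounding mode is round-to-nearest. Here the 24th mantissa bit is $1$ and the bits after it are not all zero, so the last mantissa bit rounds up and the correctly rounded result is $0xBE1D89D9$.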
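The hand computation can be cross-checked by packing the bit fields into a hex string and comparing with Python's `struct` module, which converts a float to single precision using round-to-nearest. The hand-assembled word (mantissa truncated to 23 bits) gives `BE1D89D8`, while `struct` gives `BE1D89D9`, one unit in the last place higher because of rounding. A minimal sketch; the helper name `bits_to_hex` is hypothetical.

```python
import struct

def bits_to_hex(bits):
    """Group a bit string into hex digits (length must be a multiple of 4)."""
    return format(int(bits, 2), f"0{len(bits) // 4}X")

# Fields from the worked example: sign, 8-bit biased exponent,
# and the mantissa truncated (not rounded) to 23 bits.
s, exp8, mant23 = "1", "01111100", "00111011000100111011000"
print(bits_to_hex(s + exp8 + mant23))            # BE1D89D8 (truncated)
print(struct.pack(">f", -2 / 13).hex().upper())  # BE1D89D9 (round to nearest)
```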
2) Convert the decimal number $73.171875$ to IEEE 754 double precision floating-point format.
Solution:
$\color{red}{1}$ : Determine the sign of the number
$s = 0$ (the number is positive)
$\color{red}{2}$ : Find the binary form of the number
Integral part:
$73 = 1001001_{2}$
Fractional part (multiply by 2 repeatedly and record the integer parts):
$0.171875 \times 2 = \color{fuchsia}{0} + 0.34375$
$0.34375 \times 2 = \color{fuchsia}{0} + 0.6875$
$0.6875 \times 2 = \color{fuchsia}{1} + 0.375$
$0.375 \times 2 = \color{fuchsia}{0} + 0.75$
$0.75 \times 2 = \color{fuchsia}{1} + 0.5$
$0.5 \times 2 = \color{fuchsia}{1} + 0$
$0 \times 2 = \color{fuchsia}{0} + 0$
$\vdots$
Only zeros follow from here, so the expansion terminates:
$\Rightarrow 73.171875 = 1001001.001011_{2}$
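Both parts of this step can be sketched together: the built-in `bin()` handles the integer part, and the same doubling loop handles the fractional part, using exact `Fraction` arithmetic so no rounding creeps in. The helper name `to_binary` and the `max_frac_bits` cut-off are hypothetical choices for illustration.

```python
from fractions import Fraction

def to_binary(x, max_frac_bits=64):
    """Binary string of a non-negative rational x.

    Integer part via bin(); fractional part via repeated doubling,
    stopping when the remainder is 0 or after max_frac_bits digits.
    """
    x = Fraction(x)
    whole = x.numerator // x.denominator
    frac = x - whole
    bits = []
    while frac and len(bits) < max_frac_bits:
        frac *= 2
        bit = 1 if frac >= 1 else 0  # integer part of the product
        bits.append(str(bit))
        frac -= bit
    return bin(whole)[2:] + ("." + "".join(bits) if bits else "")

print(to_binary("73.171875"))  # 1001001.001011
```

Passing the value as the string `"73.171875"` lets `Fraction` parse it exactly instead of going through a binary float first.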
$\color{red}{3}$ : Normalise the binary number into the form $(1 + 0.m) \times 2^{e}$ and find $e$
$$\begin{aligned}1001001.001011_{2} &= 1001001.001011_{2} \times 2^{0}\\ &= 1.001001001011_{2} \times 2^{6}\end{aligned}$$
$e=6$
$\Rightarrow m = 001001001011$ followed by 40 zeros (52 bits in total)
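Python's built-in `float.hex()` prints a double already normalised as `0x1.<mantissa in hex>p<exponent>`, so $e$ and $m$ can be read off directly. The slicing below assumes a positive normal double, whose hex form always starts with `0x1.` followed by 13 hex digits (52 bits).

```python
x = 73.171875
h = x.hex()
print(h)  # 0x1.24b0000000000p+6

# For a positive normal double, h looks like "0x1.<13 hex digits>p<exponent>".
digits, exp = h[4:].split("p")
mantissa_bits = format(int(digits, 16), "052b")  # 13 hex digits = 52 bits
print(int(exp), mantissa_bits[:12])  # 6 001001001011
```

The hex digits `24b` are `0010 0100 1011`, the same mantissa bits found by hand.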
$\color{red}{4}$ : Find the 11-bit biased exponent $e + 1023$
Since $e = 6$,
$e + 1023 = 1029$
$1029 = 10000000101_{2}$
$\color{red}{5}$ : Combine $s$, $e+1023$ and $m$
$0|\text{ }10000000101|\text{ }\underbrace{0010010010110...0}_{52\text{-bit number}}$
$\color{red}{6}$ : Find the hexadecimal form
$$\begin{array}{cccccccc}\text{0100}&\text{0000}&\text{0101}&\text{0010}&\text{0100}&\text{1011}&\text{0000}&\text{0000}\\ \hline 4&0&5&2&4&B&0&0\end{array}$$
$$\begin{array}{cccccccc}\text{0000}&\text{0000}&\text{0000}&\text{0000}&\text{0000}&\text{0000}&\text{0000}&\text{0000}\\ \hline 0&0&0&0&0&0&0&0\end{array}$$
$\therefore 73.171875 = 0x40524B0000000000$
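Steps 4 to 6 can be replayed in a few lines: build the 64-bit pattern from the three fields, format it as 16 hex digits, and compare with the bytes Python's standard `struct` module produces for a big-endian double (`">d"`). This is a sketch for checking the arithmetic, not a general converter.

```python
import struct

sign = "0"
exponent = format(6 + 1023, "011b")        # 10000000101
mantissa = "001001001011".ljust(52, "0")   # pad the fraction bits to 52
word = sign + exponent + mantissa          # 64 bits in total

manual = format(int(word, 2), "016X")
library = struct.pack(">d", 73.171875).hex().upper()
print(manual, library)  # 40524B0000000000 40524B0000000000
```

Since $73.171875 = \frac{4683}{64}$ is exactly representable, no rounding is involved and both strings agree.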
3) Convert the IEEE 32-bit floating-point number $C1280000_{16}$ to decimal
Solution:
$\color{red}{1}$ : Change the hexadecimal form into binary form
$$\begin{array}{cccccccc}\text{C}&\text{1}&\text{2}&\text{8}&\text{0}&\text{0}&\text{0}&\text{0}\\ \hline 1100&0001&0010&1000&0000&0000&0000&0000\end{array}$$
$\color{red}{2}$ : Express into $s|\text{ 8-bit } e+127|\text{ 23-bit } m$
$1|\text{ }10000010|\text{ }\underbrace{01010...0}_{23\text{-bit number}}$
$\color{red}{3}$ : Find the $s$ & $e$
$s=1$ means the decimal number is negative
$e + 127 = 10000010_{2}$
$e+127 = 130$
$e = 3$
$\color{red}{4}$ : Find the real number
$-(1.m \times 2^{e})$
$= -(1.0101_{2} \times 2^{3})$
$= -(1010.1_{2}\times 2^{0})$
$= -(1010_{2} + 0.1_{2})$
$= -(10 + 2^{-1})$
$= -10.5$
$\therefore C1280000_{16} = -10.5$
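The decoding steps can be sketched with integer bit operations: shift and mask out the three fields, then evaluate $(-1)^{s} \cdot 2^{e} \cdot (1 + 0.m)$. The helper name `decode_single` is hypothetical and handles normal numbers only; `struct.unpack` from Python's standard library serves as an independent check.

```python
import struct

def decode_single(hex_word):
    """Decode a 32-bit IEEE 754 pattern by hand (normal numbers only)."""
    word = int(hex_word, 16)
    s = word >> 31                      # 1 sign bit
    e = ((word >> 23) & 0xFF) - 127     # 8-bit biased exponent
    m = word & 0x7FFFFF                 # 23 mantissa bits
    # (-1)^s * 2^e * (1 + 0.m), where 0.m = m / 2^23
    return (-1) ** s * 2.0 ** e * (1 + m / 2 ** 23)

print(decode_single("C1280000"))                          # -10.5
print(struct.unpack(">f", bytes.fromhex("C1280000"))[0])  # -10.5
```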