CPSC 2310 - DAY 26
NOVEMBER 14, 2016
================================================================================

Bias is introduced when rounding floating point numbers.

rounding - choose the nearest representable neighbor.

To maintain accuracy in effective subtraction, most floating point hardware
adds three extra bits: guard bit, round bit, and sticky bit. The extra bits
also help to implement round to even.

In summary, to round to nearest even using G, R and S:

  0xx - round down = do nothing (x means any bit value)
  100 - this is a tie, round up if the mantissa's lsb is 1, else round down
  101 - round up
  110 - round up
  111 - round up

IEEE FLOATING POINT FORMAT
--------------------------

The first floating point support was on the IBM 704; each manufacturer had its
own FP format until standardization in the 1980s.

FORMATS:
  .Single Precision   - 32-bit format = 1-bit sign, 8-bit exp, 23-bit frac.
  .Double Precision   - 64-bit format = 1-bit sign, 11-bit exp, 52-bit frac.
  .Extended Precision - 80-bit format.
  .Quad Precision     - 128-bit format.

SPECIAL CODES:
  .NaN - Not a Number (propagates itself through any operation)
  .Infinity (also propagates through in most cases)
  .Denormal numbers

171.375 as a single precision value:

  Sign     Exponent    Fraction
  -----    --------    --------
  1 bit    8 bits      23 bits

  .Represent the whole part as binary.
  .Represent the fraction as a binary fraction.
  .Multiply by 2^0.
  .Normalize (only one non-zero digit to the left of the binary point).
  .Add the single precision bias (127) to the exponent: 7 + 127 = 134 = 1000 0110.

  10101011.011 * 2^0
  1.0101011011 * 2^7

     0 100 0011 0010 1011 0110 0000 0000 0000
  0x     4    3    2    B    6    0    0    0   <- HEX REPRESENTATION IN MEMORY
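
The 171.375 conversion can be checked with a short sketch. This is only an illustration, not part of the lecture: the helper name encode_single is hypothetical, and it assumes a positive value in the normalized range that is exactly representable (so no rounding is needed). Python's struct module is used to compare against the machine's own single precision encoding.

```python
import struct


def encode_single(value):
    """Hand-encode a positive float as IEEE single precision bits.

    Hypothetical helper for illustration. Assumes value > 0, in the
    normalized range, and exactly representable in 23 fraction bits.
    """
    exponent = 0
    significand = value
    # Normalize: exactly one non-zero digit to the left of the binary point.
    while significand >= 2.0:
        significand /= 2.0
        exponent += 1
    while significand < 1.0:
        significand *= 2.0
        exponent -= 1

    sign = 0
    biased = exponent + 127  # the 8-bit exponent field stores exponent + 127
    # Drop the hidden leading 1 and scale the rest to a 23-bit integer.
    fraction = int((significand - 1.0) * (1 << 23))
    return (sign << 31) | (biased << 23) | fraction


bits = encode_single(171.375)
# Cross-check with the encoding produced by struct (viewed big-endian).
machine = struct.unpack(">I", struct.pack(">f", 171.375))[0]
print(hex(bits), hex(machine))
```

Both values should agree with the hex digits worked out by hand above. Note that the byte order in memory depends on the machine; the sketch asks struct for a big-endian view so the digits read left to right.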
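
Returning to the guard/round/sticky summary earlier in these notes, the rounding table can be sketched in a few lines. The function names round_nearest_even and split_grs are hypothetical, the mantissa is modeled as a plain integer, and a carry out of the mantissa (which would require renormalizing) is not handled.

```python
def round_nearest_even(mantissa, grs):
    """Round an integer mantissa using the 3-bit value G R S (0..7).

    Hypothetical helper for illustration; does not renormalize on carry-out.
    """
    if grs < 0b100:
        # 0xx: below the halfway point, round down (do nothing).
        return mantissa
    if grs == 0b100:
        # Exact tie: round up only if the lsb is 1, so the result is even.
        return mantissa + (mantissa & 1)
    # 101, 110, 111: above the halfway point, round up.
    return mantissa + 1


def split_grs(wide, extra):
    """Split a wide integer into (kept mantissa, GRS) when dropping extra >= 2 bits."""
    kept = wide >> extra
    guard = (wide >> (extra - 1)) & 1
    rnd = (wide >> (extra - 2)) & 1
    # Sticky is the OR of every bit below the round bit.
    sticky = 1 if wide & ((1 << (extra - 2)) - 1) else 0
    return kept, (guard << 2) | (rnd << 1) | sticky


# Example: keep the top 4 bits of 1011 10001; the dropped bits give GRS = 101.
kept, grs = split_grs(0b101110001, 5)
print(bin(kept), bin(grs), bin(round_nearest_even(kept, grs)))
```

The sticky bit is what distinguishes a true tie (100) from a value slightly above halfway (101), which is why the tie case alone consults the mantissa's lsb.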