
Convert A Number Into A 16-bit Float (stored As Bytes) And Back?

For (lossy) compression purposes, I'd like to be able to convert JavaScript numbers into a 16-bit float representation to be stored in Uint16Arrays (or Uint8Arrays, whichever's easie

Solution 1:

Encoding

The first step is to extract the exponent and normalized fraction of the number. In C, this is done with the frexp function, which isn't available in JavaScript. Searching for "frexp javascript" turns up a couple of implementations.

They range from a straightforward implementation that extracts the bits directly from the IEEE representation using typed arrays, through a quick-and-dirty, inexact version using only Math functions, to a version that appears to be exact.
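As an illustration of the typed-array approach, here is a minimal sketch of a frexp-like helper. The function name and the return shape (a two-element array) are assumptions made for this example, not part of any standard API. It follows C's convention that the fraction lies in [0.5, 1).

```javascript
// Hypothetical helper: returns [fraction, exponent] such that
// value === fraction * 2^exponent and 0.5 <= |fraction| < 1 (C's frexp convention).
function frexp(value) {
  // Zero, NaN and the infinities have no meaningful exponent; pass them through.
  if (value === 0 || !Number.isFinite(value)) return [value, 0];

  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, value);            // big-endian by default
  let hi = view.getUint32(0);           // sign, 11 exponent bits, top 20 mantissa bits
  let exponent = (hi >>> 20) & 0x7ff;   // biased exponent field

  if (exponent === 0) {
    // Subnormal double: scale by 2^64 to normalize it, then correct the exponent.
    view.setFloat64(0, value * 18446744073709551616);
    hi = view.getUint32(0);
    exponent = ((hi >>> 20) & 0x7ff) - 64;
  }
  exponent -= 1022;

  // Overwrite the exponent field with 1022 so the stored value lies in [0.5, 1).
  view.setUint32(0, ((hi & 0x800fffff) | (1022 << 20)) >>> 0);
  return [view.getFloat64(0), exponent];
}

const [fraction, exponent] = frexp(-3);  // fraction = -0.75, exponent = 2
```

The low 32 bits of the double are left untouched, so the fraction keeps the full precision of the input.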

The second step is to create your 16-bit FP number from the obtained exponent and mantissa using bit operations. Make sure to check the exponent range and to round the lower-precision mantissa for better accuracy.
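The following sketch combines both steps in one function. It assumes the IEEE 754 half-precision layout (1 sign bit, 5 exponent bits with a bias of 15, 10 mantissa bits), which is only one possible choice of 16-bit format. The name encodeFloat16 is hypothetical. Values too small for a normal half-precision number are flushed to zero, values too large become infinity, and the mantissa is rounded half up rather than half to even.

```javascript
// Scratch view for reading the raw bits of a 64-bit double (big-endian by default).
const scratch = new DataView(new ArrayBuffer(8));

// Hypothetical helper: returns the 16-bit pattern as an integer in [0, 0xffff].
function encodeFloat16(value) {
  if (Number.isNaN(value)) return 0x7e00;         // a quiet NaN pattern
  scratch.setFloat64(0, value);
  const hi = scratch.getUint32(0);                // sign, 11-bit exponent, top 20 mantissa bits
  const sign = (hi >>> 16) & 0x8000;              // move the sign to bit 15
  const exponent = ((hi >>> 20) & 0x7ff) - 1023;  // unbiased exponent

  if (exponent > 15) return sign | 0x7c00;        // too large (or Infinity): infinity
  if (exponent < -14) return sign;                // too small (or zero): signed zero

  // Keep the top 10 mantissa bits, rounding half up; the result may be 1024.
  const mantissa = ((hi & 0xfffff) + 0x200) >>> 10;
  // Adding instead of OR-ing lets a mantissa of 1024 carry into the exponent.
  return sign | (((exponent + 15) << 10) + mantissa);
}

// Store a few encoded values in a Uint16Array.
const packed = Uint16Array.from([1, -2, 0.1, 65504].map(encodeFloat16));
```

Because the rounded mantissa is added to the shifted exponent, a carry out of the mantissa bumps the exponent by one, and a carry out of the largest finite value yields the infinity pattern.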

Decoding

Extract exponent and mantissa from your 16-bit FP number. Convert them to a JavaScript number either with

// 'adjust' is the exponent bias plus the number of mantissa bits,
// assuming 'mantissa' is an integer that includes the implicit leading 1.
Math.pow(2, exponent - adjust) * mantissa

or by directly creating an IEEE bit pattern with typed arrays.
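The Math.pow variant can be sketched as follows, again assuming the IEEE 754 half-precision layout (5 exponent bits with a bias of 15, 10 mantissa bits), so the adjustment is 15 + 10 = 25. The name decodeFloat16 is hypothetical.

```javascript
// Hypothetical helper: converts a 16-bit pattern back to a JavaScript number.
function decodeFloat16(bits) {
  const sign = (bits & 0x8000) ? -1 : 1;
  const exponent = (bits >>> 10) & 0x1f;  // 5-bit biased exponent
  const mantissa = bits & 0x3ff;          // 10 stored mantissa bits

  // All exponent bits set: infinity (mantissa 0) or NaN (mantissa non-zero).
  if (exponent === 0x1f) return mantissa ? NaN : sign * Infinity;

  // Exponent field 0: zero or subnormal, with no implicit leading 1.
  if (exponent === 0) return sign * Math.pow(2, -24) * mantissa;

  // Normal number: add the implicit leading 1 (1024) to the integer mantissa.
  return sign * Math.pow(2, exponent - 25) * (mantissa + 1024);
}

const halves = new Uint16Array([0x3c00, 0xc000, 0x7bff]);
const numbers = Array.from(halves, decodeFloat16);  // [1, -2, 65504]
```

Every half-precision value fits exactly in a double, so decoding is lossless; only encoding loses precision.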

Subnormal numbers, infinity, NaN

Subnormal JavaScript numbers can be simply rounded to zero. You'll have to decide whether you want to support subnormal numbers in your 16-bit format. This will complicate the conversion process in exchange for better accuracy of numbers near zero.
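If you do choose to support subnormals in the 16-bit format, the extra work on the encoding side is small. The sketch below assumes the IEEE 754 half-precision layout, where subnormals are multiples of 2^-24 stored with an exponent field of zero. The name encodeSmallFloat16 is hypothetical, and the function is only meant for inputs with an absolute value below 2^-14.

```javascript
// Hypothetical helper: encodes a value with |value| < 2^-14 as a
// half-precision zero or subnormal (exponent field 0).
function encodeSmallFloat16(value) {
  const sign = (value < 0 || Object.is(value, -0)) ? 0x8000 : 0;
  // Subnormals have no implicit leading 1: value = mantissa * 2^-24.
  // Multiplying by 2^24 (16777216) is exact; Math.round rounds half up.
  const mantissa = Math.round(Math.abs(value) * 16777216);
  // A mantissa of 1024 sets the lowest exponent bit, which is the smallest normal number.
  return sign | mantissa;
}
```

An encoder would call this instead of returning zero whenever the exponent falls below the normal range.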

You'll also have to decide whether to support infinity and NaN in your format. These values could be handled similarly to the IEEE format.
