Calculate the valid roundings when quantizing to 16-bit floats
TS/JS's number type is internally an f64, so quantization must occur when
converting to f16 for WGSL. WGSL does not mandate a particular rounding mode,
so if a number is within the f16 range but not exactly representable in 16
bits, there are two valid quantizations: the nearest f16 values below and
above it. If it is exactly representable, there is only one valid
quantization. This function calculates the valid roundings and returns them
in an array.
This function does not consider flushing mode, so subnormals are preserved.
The caller is responsible for flushing before and after as appropriate.
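As a rough illustration of what caller-side flushing could look like, the sketch below maps f16 subnormal magnitudes to zero. The helper name `flushSubnormalF16` is hypothetical and not part of the documented API; it assumes the IEEE 754 binary16 layout, where the smallest positive normal value is 2^-14, and it assumes a flushed value becomes positive zero.

```typescript
// Hypothetical helper, not part of the documented API.
// Assumes IEEE 754 binary16: smallest positive normal value is 2^-14.
const F16_MIN_NORMAL = 2 ** -14;

// Returns 0 for non-zero values whose magnitude is below the smallest
// f16 normal (the f16 subnormal range); returns all other values unchanged.
function flushSubnormalF16(x: number): number {
  return x !== 0 && Math.abs(x) < F16_MIN_NORMAL ? 0 : x;
}
```

A caller that needs to accept both flushed and unflushed results would presumably apply such a helper to the input and to each returned rounding, and take the union of the outcomes.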
Out-of-bounds (OOB) values must be handled according to the overflow
rules.
If a value is OOB but not too far out, an implementation may choose to round
to the nearest finite value or to the correct infinity. The boundaries are at
2^(f16.emax + 1) for positive values and -(2^(f16.emax + 1)) for negative
values.
Values that are at or beyond these limits must be rounded towards the
appropriate infinity.
Parameters
n: number
number to be quantized
Returns readonly number[]
all of the acceptable roundings for quantizing to 16 bits, in
ascending order.
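The behaviour described above can be sketched as follows. This is a minimal illustration rather than the documented function's actual implementation: the name `f16RoundingsSketch` is hypothetical, the constants assume the IEEE 754 binary16 layout (emax = 15, 10 explicit mantissa bits, largest finite value 65504), and passing NaN and infinities through unchanged is an assumption the description does not cover.

```typescript
// Hypothetical sketch, assuming IEEE 754 binary16 parameters.
const F16_EMAX = 15;          // largest unbiased exponent
const F16_EMIN = -14;         // exponent of the smallest normal value
const F16_MANTISSA_BITS = 10; // explicit mantissa bits
const F16_MAX = 65504;        // (2 - 2^-10) * 2^15, largest finite f16

// Unbiased exponent of an f64, read from its bits to avoid Math.log2
// rounding errors just below powers of two.
function f64Exponent(x: number): number {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x); // big-endian by default
  return ((view.getUint32(0) >>> 20) & 0x7ff) - 1023;
}

function f16RoundingsSketch(n: number): readonly number[] {
  // Assumption: NaN, infinities and zeros are returned unchanged.
  if (Number.isNaN(n) || !Number.isFinite(n) || n === 0) return [n];

  // At or beyond +/- 2^(emax + 1): only the matching infinity is valid.
  const limit = 2 ** (F16_EMAX + 1);
  if (n >= limit) return [Infinity];
  if (n <= -limit) return [-Infinity];

  // OOB but not too far out: nearest finite value or the infinity.
  if (n > F16_MAX) return [F16_MAX, Infinity];
  if (n < -F16_MAX) return [-Infinity, -F16_MAX];

  // In range: spacing of f16 values in n's binade. Clamping the exponent
  // to EMIN keeps subnormals (no flushing is applied here).
  const exp = Math.max(f64Exponent(n), F16_EMIN);
  const ulp = 2 ** (exp - F16_MANTISSA_BITS);

  // ulp is a power of two, so these scalings are exact in f64.
  const lower = Math.floor(n / ulp) * ulp;
  const upper = Math.ceil(n / ulp) * ulp;
  return lower === upper ? [lower] : [lower, upper];
}
```

For example, `f16RoundingsSketch(1)` yields a single value because 1 is exactly representable, while `f16RoundingsSketch(0.1)` yields the two neighbouring f16 values on either side of 0.1. Values strictly between 65504 and 65536 yield both 65504 and `Infinity`.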