Calculate the valid roundings when quantizing to 32-bit floats
TS/JS's number type is internally an f64, so quantization needs to occur when
converting to f32 for WGSL. WGSL does not specify a rounding mode, so if a
number is in range but not exactly representable in 32 bits, there are two
valid quantizations: the nearest f32 values below and above it. If it is
exactly representable, there is only one valid quantization. This function
calculates the valid roundings and returns them in an array.
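The in-range case described above can be sketched as follows. This is a minimal illustration, not the documented function itself: the name `f32RoundingsInRange` is hypothetical, and the sketch assumes the input is finite and within the f32 range. It relies on `Math.fround` preserving the sign of its argument, so the second candidate is always one bit pattern away from the nearest f32.

```typescript
// Hypothetical helper for illustration only; not the function documented here.
// Assumes |n| is no larger than the largest finite f32 value.
function f32RoundingsInRange(n: number): readonly number[] {
  const f32Max = (2 - 2 ** -23) * 2 ** 127; // largest finite f32
  if (Number.isNaN(n) || Math.abs(n) > f32Max) {
    throw new RangeError('n must be a number within the f32 range');
  }

  // Math.fround rounds to the nearest f32 (ties to even).
  const nearest = Math.fround(n);
  if (nearest === n) {
    return [n]; // exactly representable: only one valid quantization
  }

  // The other valid rounding is the adjacent f32 on the far side of n.
  // For IEEE 754 floats, adding 1 to the bit pattern moves away from zero
  // and subtracting 1 moves towards zero, regardless of sign.
  const view = new DataView(new ArrayBuffer(4));
  view.setFloat32(0, nearest);
  const bits = view.getUint32(0);
  const awayFromZero = Math.abs(nearest) < Math.abs(n);
  view.setUint32(0, awayFromZero ? bits + 1 : bits - 1);
  const other = view.getFloat32(0);

  // Return the two candidates in ascending order.
  return nearest < other ? [nearest, other] : [other, nearest];
}
```

For example, `f32RoundingsInRange(1)` returns `[1]`, while `f32RoundingsInRange(0.1)` returns the two f32 values that bracket 0.1. Subnormals are left as they are, matching the no-flushing behavior noted in this description.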
This function does not consider flushing mode, so subnormals are maintained.
The caller is responsible for flushing before and after as appropriate.
Out-of-bounds (OOB) values must be handled according to the overflow rules.
If a value is OOB but its magnitude is below 2^(f32.emax + 1), an
implementation may round to either the nearest finite value or the correctly
signed infinity. Values at or above 2^(f32.emax + 1), or at or below
-(2^(f32.emax + 1)), must be rounded towards the appropriate infinity.
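The overflow rules above can be sketched on their own as follows. This is an illustrative sketch under stated assumptions, not the documented implementation: the names `F32_MAX`, `F32_OVERFLOW_LIMIT`, and `f32OverflowRoundings` are hypothetical, and f32.emax is taken to be 127, so the limit is 2^128.

```typescript
// Hypothetical names for illustration only.
const F32_MAX = (2 - 2 ** -23) * 2 ** 127; // largest finite f32
const F32_OVERFLOW_LIMIT = 2 ** 128; // 2^(f32.emax + 1), assuming emax = 127

// Valid roundings, in ascending order, for a value outside the finite f32 range.
function f32OverflowRoundings(n: number): readonly number[] {
  if (!(Math.abs(n) > F32_MAX)) {
    throw new RangeError('n must be outside the finite f32 range');
  }
  if (n > 0) {
    // At or beyond the limit only +Infinity is valid; below it, either
    // the largest finite f32 or +Infinity is acceptable.
    return n >= F32_OVERFLOW_LIMIT ? [Infinity] : [F32_MAX, Infinity];
  }
  // Mirror image for negative values.
  return n <= -F32_OVERFLOW_LIMIT ? [-Infinity] : [-Infinity, -F32_MAX];
}
```

A value such as `F32_MAX + 2 ** 103` lies between the largest finite f32 and the limit, so the sketch returns `[F32_MAX, Infinity]` for it, while `2 ** 128` yields `[Infinity]` only.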
Parameters
n: number
number to be quantized
Returns readonly number[]
all of the acceptable roundings for quantizing to 32 bits, in
ascending order.