SSE instructions remove performance pitfalls related to infinities and NaNs. The only remaining case where slowdowns are to be expected is denormals (which can be set to flush-to-zero if desired).
In other words: it's perfectly fine to work with infinities and NaNs in your code.
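For concreteness, a minimal sketch of what "set to flush-to-zero" looks like with the SSE control-register intrinsics (x86-specific; FTZ handles denormal results, DAZ handles denormal inputs):

    #include <xmmintrin.h>   /* _MM_SET_FLUSH_ZERO_MODE (SSE) */
    #include <pmmintrin.h>   /* _MM_SET_DENORMALS_ZERO_MODE (SSE3) */

    /* Trade strict IEEE denormal handling for predictable speed. */
    void enable_flush_to_zero(void)
    {
        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);          /* denormal results -> 0 */
        _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);  /* denormal inputs -> 0 */
    }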
I'm a fan of these not because of the claims regarding precision, but because they drop all the complexity and baggage of IEEE floating point.
> There’s also only one infinite posit number.
Those two things are big deal-breakers for me. Yes, having positive and negative 0 can be useful--there are times when you want to think of 0 not as "this is exactly 0" but as "this value underflowed our range", and it matters whether you are an underflowing negative number or an underflowing positive number. Of course, using IEEE-754 to check for exactly one of positive and negative 0 is painful.
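(For anyone curious, a small sketch of the painful part: the two zeros compare equal, so you need signbit() or a similar detour to tell them apart.)

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double pz = 0.0, nz = -0.0;
        printf("pz == nz: %d\n", pz == nz);             /* 1: they compare equal */
        printf("signbit(pz): %d\n", signbit(pz) != 0);  /* 0 */
        printf("signbit(nz): %d\n", signbit(nz) != 0);  /* 1: the underflowed-negative case */
        return 0;
    }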
Similarly, having NaN as a distinct type can be useful. You get to distinguish between "this computation shrank too small to be represented", "this computation grew too large to be represented", and "this computation makes no mathematical sense". Posits don't give you that. Furthermore, as many language runtimes have discovered, the sheer number of NaN values means you can represent every pointer and integer as a tagged NaN.
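Roughly what the tagging trick looks like; this is a toy sketch with made-up box_int/unbox_int helpers, not how any particular runtime lays out its tags:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <math.h>

    /* Stash a 32-bit value in the payload bits of a quiet NaN and pull it
       back out.  Real engines do this with pointers and stricter masking. */
    static double box_int(uint32_t v)
    {
        uint64_t bits = 0x7FF8000000000000ULL | v;   /* quiet NaN + payload */
        double d;
        memcpy(&d, &bits, sizeof d);
        return d;
    }

    static uint32_t unbox_int(double d)
    {
        uint64_t bits;
        memcpy(&bits, &d, sizeof bits);
        return (uint32_t)(bits & 0xFFFFFFFFULL);
    }

    int main(void)
    {
        double boxed = box_int(42);
        printf("isnan: %d, payload: %u\n", isnan(boxed), unbox_int(boxed));
        return 0;
    }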
The only thing in IEEE-754 I would truly toss in a heartbeat is that x != x holds true for NaN values.
The guy who came up with posits also has his UNUM concept which I think is interesting but not for me. The idea is to use two numbers (posits) to track an upper and lower bound for a computation. At the end you can then see what confidence you have in the result. To me that's just wasted storage and computation, but if you want to validate some complex code it actually seems better to me than having some NaN come out the end.
I think of UNUMs and posits as two different ideas. Posits work the way I want them to, while UNUMs will provide the features you want (correctness checking) whether they use posits or IEEE floating point as the underlying representation.
If you consider the two goals of high performance and correctness of a computation, I think UNUMs and posits handle it far more elegantly and simply. IEEE doesn't actually give you both at the same time, but it pretends to.
Years ago I put together a math library (https://github.com/KimBurgess/netlinx-common-libraries/blob/...) for a domain specific language that had some "limited" capabilities. All functionality had to be achieved through a combination of some internal serialisation functions and bit twiddling.
It was simultaneously one of the most painful and interesting projects I've done.
value = (-1)^sign * 2^(exponent-127) * 1.fraction
It should be:
value = (-1)^sign * 2^(exponent-127) * (1 + fraction*2^-23)
It sounds trivial, but you can't reason mathematically about the first equation.
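To make the second equation concrete, here's a small sketch that pulls a float apart and rebuilds it with (1 + fraction*2^-23). Normal numbers only; denormals, infinities and NaNs are special-cased in the real format:

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        float f = 13.375f;
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);

        uint32_t sign     = bits >> 31;
        uint32_t exponent = (bits >> 23) & 0xFF;
        uint32_t fraction = bits & 0x7FFFFF;

        /* value = (-1)^sign * 2^(exponent-127) * (1 + fraction*2^-23) */
        double value = (sign ? -1.0 : 1.0)
                     * ldexp(1.0 + fraction * ldexp(1.0, -23), (int)exponent - 127);

        printf("sign=%u exponent=%u fraction=%u -> %g\n", sign, exponent, fraction, value);
        return 0;
    }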
Unfortunately, that means that you have no way to represent numbers in the denormal binade, which leads to severe problems with monotonicity. As you move to binades with smaller exponents, the distance between representable numbers halves in all the normalizable binades. Unless you allow for denormals, you have a GIANT jump from the smallest normalizable number to zero.
This leads to problems in numerical algorithms. Taking differences to find slopes gets unstable as you approach convergence, causing convergence to fail.
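A quick way to see the gap (FLT_TRUE_MIN is C11; values shown are the usual binary32 ones):

    #include <float.h>
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        printf("smallest normal:   %g\n", FLT_MIN);                    /* ~1.17549e-38 */
        printf("largest denormal:  %g\n", nextafterf(FLT_MIN, 0.0f));  /* just below it */
        printf("smallest denormal: %g\n", FLT_TRUE_MIN);               /* ~1.4013e-45 */
        /* Without denormals, everything below FLT_MIN would collapse straight to 0. */
        return 0;
    }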
"1.fraction" is nonsense that doesn't really mean anything, whereas (1 + fraction * 2^-23) does mean something.
value = (-1)^sign * 2^(exponent-127) * 1.fraction
1*10^4 + 2*10^3 + ...
> Floats represent continuous values.
But as you probably know, this isn't possible. The concept of infinite precision is interesting in theory, but disappears when any actual calculation needs to be made, whether on a digital electric computer or not.
I wonder if this is not a flaw in the crude mechanical representation of numbers, but a flaw in the decision to base floating-point computation on the concept of continuous numbers. I believe that a better model for floating-point computational representation and manipulation would be to reflect the rules of scientific measurements - that each number includes an explicit amount of precision that is preserved during mathematical operations.
If you aren't getting what I am saying, let me give an example. Let's say, for some reason, you want to measure the diameter of a ball. You have a measuring tape, so you wrap it around the widest part and record that the circumference is 23.5cm. To calculate the diameter, you divide by π. If you do this in double-precision floating point, you will get 7.480282325319081, but this is nonsense. You can't create a result that is magically more precise than your initial measurement through division or multiplication. The correct answer is 7.48cm. This preserves the amount of precision in the least precise operand, and is arguably the most correct result.
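As a rough sketch of that rule (round_sig is a hypothetical helper, not anything standard):

    #include <math.h>
    #include <stdio.h>

    /* Round x to a given number of significant figures. */
    static double round_sig(double x, int sig_figs)
    {
        if (x == 0.0) return 0.0;
        int digits = (int)floor(log10(fabs(x))) + 1;
        double scale = pow(10.0, sig_figs - digits);
        return round(x * scale) / scale;
    }

    int main(void)
    {
        const double pi = 3.141592653589793;
        double circumference = 23.5;            /* 3 significant figures */
        double diameter = circumference / pi;   /* 7.4802823253190... */
        printf("%.15g -> %g\n", diameter, round_sig(diameter, 3));  /* 7.48 */
        return 0;
    }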
The first reason is that it's much more complex, and it's unclear what the complexity buys us.
The second reason is that it doesn't model how variables co-vary. As a toy example, imagine that I have a number x: 5+-1.
Then I let y = x - 1: 4+-1
Finally I let z = 1 / (x - y).
Now, by construction z will be very close to 1. But a system naively tracking uncertainties will be very concerned about x - y. If it does a worst-case analysis it gets 1+-2. If it does an average-case analysis assuming independent Gaussian errors it gets 1+-sqrt(2). When we perform the division, our uncertainty goes infinite.
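Here's that failure mode as a sketch, with a made-up worst-case Interval type (not UNUMs or any real library):

    #include <math.h>
    #include <stdio.h>

    typedef struct { double lo, hi; } Interval;

    static Interval sub(Interval a, Interval b)
    {
        return (Interval){ a.lo - b.hi, a.hi - b.lo };
    }

    static Interval inv(Interval a)
    {
        if (a.lo <= 0.0 && a.hi >= 0.0)               /* interval straddles zero */
            return (Interval){ -INFINITY, INFINITY };
        return (Interval){ 1.0 / a.hi, 1.0 / a.lo };
    }

    int main(void)
    {
        Interval x = { 4.0, 6.0 };                    /* x = 5+-1 */
        Interval y = sub(x, (Interval){ 1.0, 1.0 });  /* y = x - 1 = 4+-1 */
        Interval d = sub(x, y);                       /* really 1, but tracked as 1+-2 */
        Interval z = inv(d);                          /* straddles 0 -> infinite bounds */
        printf("x - y in [%g, %g], z in [%g, %g]\n", d.lo, d.hi, z.lo, z.hi);
        return 0;
    }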
I am not sure if I fully understand your example, but I don't see any problem with it. Using basic significant figure rules, this is (with an additional step for clarity):

    x: 5e0
    y = x - 1: 4e0
    z1 = x - y: 1e0
    z = 1/1e0
    z2 = 1e0

The answer seems to simply be 1+-1. "Significant Figures" are a simplification of precision where the precision is an integer that represents the total digits of the least precise measurement. A more accurate way is to represent precision as a standard deviation, then calculate the precision of the result with basic statistical techniques.
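A minimal sketch of that last idea, assuming independent Gaussian errors and first-order propagation (Measured is a made-up struct, not a standard type):

    #include <math.h>
    #include <stdio.h>

    typedef struct { double value, sigma; } Measured;

    /* For sums/differences, independent standard deviations add in quadrature. */
    static Measured m_sub(Measured a, Measured b)
    {
        return (Measured){ a.value - b.value, sqrt(a.sigma * a.sigma + b.sigma * b.sigma) };
    }

    int main(void)
    {
        Measured x = { 5.0, 1.0 };                      /* 5+-1 */
        Measured y = m_sub(x, (Measured){ 1.0, 0.0 });  /* x - 1 = 4+-1 */
        Measured d = m_sub(x, y);                       /* treated as independent: 1+-sqrt(2) */
        printf("x - y = %g +- %g\n", d.value, d.sigma);
        return 0;
    }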
It would be nontrivial to implement this efficiently - the simplest implementation would require a second float for each original float that has an uncertainty, doubling the time and memory requirements. A tensorflow-like system which can see the whole flow of numbers might be able to provide precision estimates efficiently only where needed.
To me the simpler explanation seems much more likely: 2^E * 1.M for some number of bits is unnatural to people used to n*10^E with an unlimited number of digits.
To drive this home: I don't think I've ever heard anyone be confused about why an operation on two fixed-point numbers resulted in a certain value, once they were told what a fixed-point number was.
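For comparison, a fixed-point number is just an integer with an implied scale, which is probably why nobody gets confused; a quick 16.16 sketch:

    #include <stdint.h>
    #include <stdio.h>

    typedef int32_t fix16;   /* value = stored integer / 2^16 */

    static fix16  to_fix(double x)          { return (fix16)(x * 65536.0); }
    static double from_fix(fix16 x)         { return x / 65536.0; }
    static fix16  fix_mul(fix16 a, fix16 b) { return (fix16)(((int64_t)a * b) >> 16); }

    int main(void)
    {
        fix16 a = to_fix(1.5), b = to_fix(2.25);
        printf("1.5 * 2.25 = %g\n", from_fix(fix_mul(a, b)));   /* 3.375 */
        return 0;
    }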