This article is a repost of a piece originally written by David on Prelert’s website. The original article is now only accessible on WebArchive.
I found this article while I was searching for the reason why Rust doesn’t have a type corresponding to long double in C/C++, which has caused some interoperability issues (see here and here). By contrast, Zig and the newly born Carbon both support f128 types (Zig also supports f80, and Carbon also supports bfloat16). But that’s not surprising, because both aim to provide maximum interoperability with C/C++. This article might explain some of the reasons why Rust doesn’t support float types with higher precision.
C++ provides three floating point data types: float, double and long double. All the C++11 standard says about these types is:
The type double provides at least as much precision as float, and the type long double provides at least as much precision as double.
The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined.
However, almost all C++ compilers are part of a family that also includes a C compiler, and Annex F of the C99 standard is more prescriptive:
- The float type matches the IEC 60559 single format.
- The double type matches the IEC 60559 double format.
- The long double type matches an IEC 60559 extended format, else a non-IEC 60559 extended format, else the IEC 60559 double format.
Since only a complete masochist would write a C++ compiler that used different types for floating point than their closely related C compiler, in practice C++ adheres to the same rules. Certainly every C++ compiler I’ve ever worked with over the last 20 years has implemented the float and double types using the single and double precision representations defined in IEC 60559 (which is the same as IEEE 754). But there is some variation in implementations of the last of these types, long double, and this can cause problems.
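This variation is easy to observe on any compiler you have to hand. The sketch below assumes a C++11 compiler, and the helper names (`long_double_kind`, `describe`) are purely illustrative. It checks the only guarantee the C++ standard gives and then guesses which representation was chosen for long double from the `LDBL_MANT_DIG` and `DBL_MANT_DIG` macros in `<cfloat>`. The mapping from significand width to format name is my assumption, covering the common cases: 53 bits for a double-sized type, 64 for x87 extended, 106 for IBM double-double and 113 for IEEE quadruple.

```cpp
#include <cassert>
#include <cfloat>
#include <iostream>
#include <limits>
#include <string>

// The C++ standard only promises that each type is at least as precise as
// the one before it, so these checks should pass on any conforming compiler.
static_assert(std::numeric_limits<float>::digits <=
                  std::numeric_limits<double>::digits,
              "double must be at least as precise as float");
static_assert(std::numeric_limits<double>::digits <=
                  std::numeric_limits<long double>::digits,
              "long double must be at least as precise as double");

// Hypothetical helper: guesses which representation this compiler picked for
// long double from its significand width (in bits, including the implicit
// leading bit) as reported by <cfloat>.
std::string long_double_kind() {
    if (LDBL_MANT_DIG == DBL_MANT_DIG) return "same as double";
    if (LDBL_MANT_DIG == 64) return "x87 80-bit extended";
    if (LDBL_MANT_DIG == 106) return "IBM double-double";
    if (LDBL_MANT_DIG == 113) return "IEEE 754 quadruple";
    return "other";
}

// Prints the storage size and significand width that this compiler uses
// for the floating point type T.
template <typename T>
void describe(const char* name) {
    std::cout << name << ": sizeof=" << sizeof(T)
              << " significand bits=" << std::numeric_limits<T>::digits
              << '\n';
}
```

Calling `describe<long double>("long double")` and printing `long_double_kind()` would be expected to report 64 significand bits and the x87 format with GCC on x86-64 Linux, but a double-sized type with Microsoft Visual C++.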
Throughout my career in software development I’ve run into several issues with the long double type, and these fall into two basic categories:
- Lack of testing
- Portability
Lack of testing
At the end of last year I wrote about a problem that would fall into the first category. A bug in the x86_64 implementation of the powl() function in glibc went unfixed for over 5 years. I suspect if the bug had been in the more widely used pow() function then more of a fuss would have been made and somebody would have fixed it sooner. Because the long double version of the function is less widely used, the bug was left to fester.
Another example of the lack of testing long double gets is a problem I ran into with the IBM xlC/C++ compiler on AIX before joining Prelert. The name (hard link) through which the compiler is invoked defines how it will behave, and when invoked using the name xlC128_r it uses a 128 bit representation for long double. At one time, even the most trivial programs compiled like this would core dump. Although the bug report shows an example calling fork(), even a simple “Hello world” program would core dump on exit if compiled with the -brtl flag! Clearly all the testing had been done on the more commonly used invocations of the compiler (where long double is not 128 bits in size).
On the portability side, some gotchas to be aware of are:
- Microsoft Visual C++ represents long double using IEEE 754 double precision – just like double (the third option permitted by C99). Therefore, making a distinction between double and long double in your code is pointless if you only ever compile with Microsoft Visual C++. But if you have to support platforms other than Windows too and use long double then you’ve built in a key difference in behaviour between the platforms that may bite you. Most other x86 compilers treat long double as being the 80 bit extended precision type as used by the x87 floating-point unit.
- On SPARC chips (OK I know they’re dying out) the long double type maps to a 128 bit representation, but, by default, compilers will generate code to do the operations in software rather than in hardware. This dates back to a time when most SPARC chips couldn’t do the operations in hardware and would request it be done in software using interrupts. Doing the 128 bit floating point operations in software unconditionally was faster than reacting to these interrupts. However, doing operations on long doubles in software is orders of magnitude slower than doing the same operations on doubles in hardware – some of our unit tests were 20 times slower when we encountered this problem, and the tests weren’t purely doing long double arithmetic. This is a case where code can be portable in terms of compiling and producing the correct results, but not in terms of having acceptable performance.
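Whether long double has acceptable performance on a particular target is easiest to settle by measuring it there. The following is a rough, hypothetical micro-benchmark rather than a rigorous one (the function names are illustrative): it times a chain of dependent multiply-adds in each type and reports the ratio. On x86 the x87 unit does this work in hardware, so the gap there may be modest; a target that emulates 128 bit long double in software, like the SPARC default described above, would be expected to show a much larger ratio.

```cpp
#include <cassert>
#include <chrono>

// Times n dependent multiply-add steps carried out in type T and returns the
// elapsed wall-clock time in seconds. Each step depends on the previous one,
// and the final value is written to a volatile variable so the compiler
// cannot discard the loop.
template <typename T>
double time_multiply_add(long n) {
    assert(n > 0);
    const T factor = static_cast<T>(1.0000001);
    const T addend = static_cast<T>(0.5);
    T acc = 1;
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < n; ++i) {
        acc = acc * factor + addend;
    }
    volatile T sink = acc;
    (void)sink;
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Ratio of long double time to double time; values above 1 mean long double
// arithmetic was slower on this machine.
double long_double_slowdown(long n) {
    return time_multiply_add<long double>(n) / time_multiply_add<double>(n);
}
```

Printing `long_double_slowdown(10000000)` on the platform in question gives a first-order answer; the exact figure will vary with compiler, optimisation flags and CPU.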
It’s instructive to look at what other portable languages do. Java has float and double types corresponding to IEEE 754’s single and double precision representations (and unlike C++ the Java standard is very explicit about how floating point operations may be implemented). Java doesn’t make a long double type available to the programmer, presumably due to the portability issues I’ve outlined (although the standard allows the x87 extended precision format to be used in intermediate calculations done by the JVM). Python just has a float type, which is “usually implemented as a double type in C”. So, if your overall system contains components written in other languages then you’ll avoid a data interchange problem by avoiding long double. The same goes if you want to store floating point numbers in a database table – for example PostgreSQL offers real and double precision, corresponding to IEEE 754’s single and double precision representations.
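The data interchange problem fits in a few lines. In this minimal sketch, the hypothetical `to_interchange_format` function stands in for any component that only stores IEEE 754 double precision, such as a Java double, a Python float or a PostgreSQL double precision column. A value that needs the extra bits of a wider long double does not survive the trip.

```cpp
#include <cassert>
#include <cfloat>

// Hypothetical stand-in for handing a value to a component that only
// understands IEEE 754 double precision.
double to_interchange_format(long double x) {
    return static_cast<double>(x);
}

// Returns true when a value comes back unchanged after being narrowed to
// double and widened back to long double.
bool survives_round_trip(long double x) {
    return static_cast<long double>(to_interchange_format(x)) == x;
}
```

With the x87 80 bit format, `survives_round_trip(1.0L + LDBL_EPSILON)` would be expected to return `false`, because `LDBL_EPSILON` is far smaller than the gap between adjacent doubles near 1; with Microsoft Visual C++, where the two types are identical, it returns `true`.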
A final advantage of sticking to float and double on x86 CPUs is that the compiler can then choose to do floating point calculations in the CPU’s SSE unit. This means that two double or four float operations can potentially be done in parallel, and that function arguments passed in registers under the 64 bit calling conventions are already in the SSE registers, ready to be used. By contrast, long double variables can only be operated on in the x87 floating-point unit and are not passed in registers, which slows the program down.
Some might say that using long double improves the accuracy of results. This may be true, but regardless of how many digits a fixed precision floating point type has, it will be subject to loss of significance if a poorly chosen algorithm is applied to it. Using extended precision rather than double precision may mask this effect in some cases, but in the long term the only solutions are to use algorithms more appropriate for computer calculations or to somehow detect the loss of significance and replace the answer with an appropriate value.
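A classic illustration of this point (my choice of example, not one from the original article) is evaluating (1 - cos x) / x^2, which tends to 0.5 as x approaches 0. The naive formula subtracts two nearly equal numbers; the rewritten version uses the identity 1 - cos x = 2 sin^2(x/2) and never does. The function names `naive` and `stable` are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Naive evaluation of (1 - cos x) / x^2. For small x, cos(x) rounds to a
// value at or very near 1, so the subtraction cancels almost every
// significant digit.
template <typename T>
T naive(T x) {
    return (T(1) - std::cos(x)) / (x * x);
}

// The same quantity rewritten as 2 * (sin(x/2) / x)^2, using the identity
// 1 - cos x = 2 sin^2(x/2), which avoids subtracting nearly equal numbers.
template <typename T>
T stable(T x) {
    const T s = std::sin(x / T(2)) / x;
    return T(2) * s * s;
}
```

Where long double is the x87 80 bit type, I would expect `naive(1e-8)` (in double) to be badly wrong, `naive(1e-8L)` to be close to 0.5, and `naive(1e-10L)` to be badly wrong again, whereas `stable(1e-8)` stays close to 0.5 in plain double. Extended precision only moves the point at which the problem appears; the better algorithm removes it.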
In my opinion, if you want to write portable C++ code that not only compiles on multiple architectures but also doesn’t have horrendous performance problems on some architectures, long double is best avoided. That’s what we do at Prelert – our C++ code doesn’t use long double and when we use Boost we define the macro BOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS so that Boost.Math doesn’t either.