James Routley

fas is floating point arithmetic for arbitrary mantissa and exponent types in modern header-only C++. It lets you construct various different float types using template parameters for the mantissa, exponent and base.

The constructed float-types look and fell like a native float/double for arithmetic operations. Furthermore all methods are performed on the stack and do not require any heap space.

fas is header-only.

Choose your desired mantissa and exponent

fas::Float<int16_t, int8_t> f;

This will result in a float using a signed 16 bit mantissa and a signed 8 bit exponent.

The arithmetical operations are supported using their corresponding operators + - * / += -= *= /= ++ --

fas::Float<int16_t, int8_t> f1 = 10;
auto f2 = f1;

f2 = -f1;     // => -10

f2 = f1 + 10; // => 20
f2 = f1 - f1; // => 0
f2 = f1 * f1; // => 100
f2 = f1 / 20; // => 0.5

f2 += 1;      // => 1.5
f2 -= 1;      // => 0.5
f2 *= 2;      // => 1
f2 /= 2;      // => 0.5

f1++;         // => 11
++f1;         // => 12
f1--;         // => 11
--f1;         // => 10

Each type knows its boundaries:

MAX() returns the largest value
MIN() returns the smallest positive value
LOWEST() returns the smallest value The naming is the same as with internal float/double.

There are overloadings for std::numeric_limits available: fas::Float<int8_t, int8_t>::MAX() returns the same as std::numeric_limits<fas::Float<std::int16_t, std::int8_t>>::MAX() which is approx 2.16079e+40.

fas supports type traits:

is_fundamental<int16_t, int16_t>::value               // => true
is_floating_point<fas::Float<int16_t, int16_t>::value // => true
is_arithmetic<int16_t, int16_t>::value                // => true
is_scalar<int16_t, int16_t>::value                    // => true
is_object<int16_t, int16_t>::value                    // => true

Everything is a constant expression

constexpr c = fas::Float<std::int16_t, std::int8_t>(1);

Even the operations' results:

constexpr c1 = fas::Float<std::int16_t, std::int8_t>(1);
constexpr c2 = fas::Float<std::int16_t, std::int8_t>(2);

constexpr c3 = c1 + c2; // => 3

fas offers an output stream overload, which can be used for std::cout:

#include "fas/stream.hpp>

...

fas::Float<int8_t, int8_t> fas_float(1);
fas_float /= 3;
std::cout << fas_float << "\n"; // => 70 * 7 ^ fffffffd ≈ 0.32653061224489793313

Typical floats such as float or double are to the base of 2. fas allows to construct floats to any base:

fas::Float<int8_t, int8_t, 7> fas_float(0);
double cpp_float = 0;

for(auto i=0; i<7; ++i) {
	fas_float += fas::Float<int8_t, int8_t, 7>(1)/7;
	cpp_float += 1.0/7;
}

std::cout << std::setprecision(20);
std::cout << fas_float << "\n";    // => exaclty 1
std::cout << cpp_float << "\n";    // => 0.99999999999999977796

The setting of different bases allows to represent specific fractions exaclty. In this case the base is 7, so any fraction by 7 is represented exactly. Compare to the native double which is always to the base of 2, thus can not represent 1/7 exactly.

Download the float.hpp and include it in your source. Don't forget to add it in your include path.

To build and run unit tests type:

mkdir build; cd build; cmake ..; make && ./tests/tests

The Stl's std::numeric_limits is required the limits of the specified types for mantissa and exponent.
Catch2 is required to build the unit tests.

(configurable) rounding support
exp, sqrt, pow and other math.h functions
string constructor
string representation
value constructor for all ints and other types
a version without stl

Show HN: Floating point arithmetic types in C++ for any size and any base

Choose your desired mantissa and exponent

Everything is a constant expression