I was rummaging around in the source of R looking for trouble, as one does, when I came across what I believed to be a less than optimally accurate floating-point algorithm (function `R_pos_di`

in `src/main/arithemtic.c`

). Analyzing the accuracy of floating-point code is notoriously difficult and those having the required skills tend to concentrate their efforts on what are considered to be important questions. I recently discovered RangeLab, a tool that seemed to be offering painless floating-point code accuracy analysis; here was an opportunity to try it out.

Installation went as smoothly as newly released personal tools usually do (i.e., some minor manual editing of Makefiles and a couple of tweaks to the source to comment out function calls causing link errors {`mpfr_random`

and `mpfr_random2`

}).

RangeLab works by analyzing the flow of values through a program to produce the set of output values and the error bounds on those values. Input values can be specified as a range, e.g., `f = [1.0, 10.0]`

says `f`

can contain any value between `1.0`

and `10.0`

.

My first RangeLab program mimicked the behavior of the existing code in `R_pos_di`

:

n=-10;
f=[1.0, 10.0];
res = 1.0;
if n < 0,
n = -n;
f = 1 / f;
end
while n ~= 0,
if (n / 2)*2 ~= n,
res = res * f;
end
n = n / 2;
if n ~= 0,
f = f*f;
end
end |

n=-10;
f=[1.0, 10.0];
res = 1.0;
if n < 0,
n = -n;
f = 1 / f;
end
while n ~= 0,
if (n / 2)*2 ~= n,
res = res * f;
end
n = n / 2;
if n ~= 0,
f = f*f;
end
end

and told me that the possible range of values of `res`

was:

res
ans =
float64: [1.000000000000001E-10,1.000000000000000E0]
error: [-2.109423746787798E-15,2.109423746787799E-15] |

res
ans =
float64: [1.000000000000001E-10,1.000000000000000E0]
error: [-2.109423746787798E-15,2.109423746787799E-15]

Changing the code to perform the divide last, rather than first, when the exponent is negative:

n=-10;
f=[1.0, 10.0];
res = 1.0;
is_neg = 0;
if n < 0,
n = -n;
is_neg = 1
end
while n ~= 0,
if (n / 2)*2 ~= n,
res = res * f
end
n = n / 2;
if n ~= 0,
f = f*f
end
if is_neg == 1, res = 1 / res end
end |

n=-10;
f=[1.0, 10.0];
res = 1.0;
is_neg = 0;
if n < 0,
n = -n;
is_neg = 1
end
while n ~= 0,
if (n / 2)*2 ~= n,
res = res * f
end
n = n / 2;
if n ~= 0,
f = f*f
end
if is_neg == 1, res = 1 / res end
end

and the error in `res`

is now:

res
ans =
float64: [1.000000000000000E-10,1.000000000000000E0]
error: [-1.110223024625156E-16,1.110223024625157E-16] |

res
ans =
float64: [1.000000000000000E-10,1.000000000000000E0]
error: [-1.110223024625156E-16,1.110223024625157E-16]

Yea! My hunch was correct, moving the divide from first to last reduces the error in the result. I have reported this code as a bug in R and wait to see what the R team think.

Was the analysis really that painless? The Rangelab language is somewhat quirky for no obvious reason (e.g., why use `~=`

when everybody uses `!=`

these days and if conditionals must be followed by a character why not use the colon like Python does?) It would be real useful to be able to cut and paste C/C++/etc and only have to make minor changes.

I get the impression that all the effort went into getting the analysis correct and user interface stuff was a very distant second. This is the right approach to take on a research project. For some software to make the leap from interesting research idea to useful tool it is important to pay some attention to the user interface.

The current release does not deserve to be called 1.0 and unless you have an urgent need I would suggest waiting until the usability has been improved (e.g., error messages give some hint about what is wrong and a rough indication of which line the problem occurs on).

RangeLab has shown that there is simpler method of performing useful floating-point error analysis. With some usability improvements RangeLab would be an essential tool for any developer writing code involving floating-point types.

**Update:** The R team, in the form of Martin Maechler, resolved my report in just over 14 hours! The function `R_pos_di`

is not called, the `pow`

function from the C library (which takes two `double`

arguments rather than a `double`

and an `int`

) has been found to be more accurate. Martin says this usage is not less accurate even for `n=3`

, which I find surprising; I agree it should be more accurate for large values of `n`

.

`pow`

is one of the more complicated maths functions, involving finding a log, a multiply and then returning the exponent of this result. There are lots of boundary values that need to be checked and the code goes on for a while. I will wait for the usability of RangeLab to improve before attempting to compare its accuracy against the simpler algorithm for integer powers. Looking at the ~~Sun~~Oracle library sources, if both arguments have integral values the ‘integer power’ algorithm is used (with the divide performed last).

## Recent Comments