Approximating sin(x) to 5 ULP with Chebyshev polynomials

By Colin Wallace
May 11, 2017 (last revised May 13, 2017)
fn sine(x: f32) -> f32 {
    let coeffs = [
        -0.10132118f32,          // x
         0.0066208798f32,        // x^3
        -0.00017350505f32,       // x^5
         0.0000025222919f32,     // x^7
        -0.000000023317787f32,   // x^9
         0.00000000013291342f32, // x^11
    ];
    let pi_major = 3.1415927f32;
    let pi_minor = -0.00000008742278f32;
    let x2 = x*x;
    let p11 = coeffs[5];
    let p9  = p11*x2 + coeffs[4];
    let p7  = p9*x2  + coeffs[3];
    let p5  = p7*x2  + coeffs[2];
    let p3  = p5*x2  + coeffs[1];
    let p1  = p3*x2  + coeffs[0];
    (x - pi_major - pi_minor) *
    (x + pi_major + pi_minor) * p1 * x
}

The code above will compute the sine of any IEEE single-precision float in the range $(-\pi, \pi)$ to within 4.58 Units of Least Precision (ULP). A ULP is a useful way to measure relative error in a floating point computation. Given two floating point numbers, their difference in terms of ULP is essentially the number of floats that lie between them (plus one).
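As a concrete illustration of counting the floats between two values (my own snippet, not part of the derivation): for finite floats of the same sign, the ULP distance can be read straight off the bit patterns.

fn ulp_distance(a: f32, b: f32) -> u32 {
    // For finite f32s of the same sign, the bit patterns order the same way the values do,
    // so their integer difference counts the floats separating the two values.
    (a.to_bits() as i64 - b.to_bits() as i64).unsigned_abs() as u32
}

fn main() {
    let x = 1.0f32;
    let next = f32::from_bits(x.to_bits() + 1); // the next representable float above 1.0
    assert_eq!(ulp_distance(x, next), 1);       // adjacent floats are 1 ULP apart
    println!("{:e}", next - x);                 // ~1.1920929e-7, i.e. 2^-23
}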

Given that IEEE single-precision floats have 24 bits of precision, this sine function is accurate to within 0.000027% (i.e. $4.58 / 2^{24}$). Achieving this precision over a domain this large was a challenge for me. Many polynomial approximations to $\sin(x)$ operate only over $(-\pi/4, \pi/4)$ or $(-\pi/2, \pi/2)$ and use special properties of the sine function to reduce their arguments into this range (e.g. $\sin(x + \pi) = -\sin(x)$ to force arguments into $(-\pi/2, \pi/2)$). To work over larger ranges without reduction (e.g. if the input is known to be bounded by $(-\pi, \pi)$), these methods require modifications that I had not seen published before, so the remainder of this article will discuss my approach to deriving the sine function above.

When I need to distinguish between the approximation of $\sin(x)$ and the true $\sin(x)$, I'll be using $\widehat{\sin}(x)$ to represent the approximation.

Sin(x) as a sum of Chebyshev polynomials

The first step is to re-express $\sin(x)$ over the domain of interest as an infinite polynomial. One could use a Taylor series, but convergence is very slow. Instead, I use Chebyshev polynomials.

For me, it helps to view these things using concepts from linear algebra. Any continuous and smooth function can be expressed as a polynomial: $P(x) = a_0 + a_1 x + a_2 x^2 + \ldots$. This is a weighted sum of powers of $x$, and because $\{1, x, x^2, \ldots\}$ are all linearly independent, we can think of this set as a basis for our function.

Instead of representing our function in terms of the standard polynomial basis ($\{1, x, x^2, \ldots\}$), we can use $\{T_0(x), T_1(x), T_2(x), \ldots\}$, where $T_n(x)$ is the Chebyshev polynomial of the first kind.
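For reference, the $T_n$ here are the standard Chebyshev polynomials of the first kind, generated by the usual recurrence (a textbook definition, not something specific to this article):

$$T_0(x) = 1,\qquad T_1(x) = x,\qquad T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x),$$

so $T_2(x) = 2x^2 - 1$, $T_3(x) = 4x^3 - 3x$, and so on; equivalently, $T_n(\cos\theta) = \cos(n\theta)$.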

The first five Chebyshev polynomials. From Wikipedia (public domain).

Chebyshev polynomials have the property that all their local extrema lie within $-1 < x < 1$, and each such extremum has a value of exactly $\pm 1$. For those familiar with Fourier series, this should look somewhat familiar. We're projecting the sine function onto a basis where each basis function oscillates at an increasing rate. It's intuitive, then, that the smoother a function is, the more rapidly the coefficients of its projection onto $\{T_0(x), T_1(x), \ldots\}$ will decay (e.g. $\sin(x)$ doesn't oscillate rapidly over the interval $(-1, 1)$, so any rapidly-oscillating $T_n(x)$ component would have to be small). Although the basis has infinite cardinality, most of the energy is located in the lower components. We can truncate the basis to $\{T_0(x), \ldots, T_N(x)\}$ for a relatively small value of $N$ and achieve a very good approximation of $\sin(x)$ over the interval $(-1, 1)$.

Note that for a smooth function approximated with a finite set of Chebyshev polynomials, the error is spread in a fairly uniform manner. The approximation isn't intrinsically more accurate near 0 and less accurate at the extremities, for example. We can say that $|E(x)| < B$ for all $-1 < x < 1$, where $E(x)$ is the error of the approximation function ($E(x) = \widehat{\sin}(x) - \sin(x)$), and $B$ is some bound that decreases as we increase the number of terms in the approximation.

This is good. If we approximated $\sin(x)$ with 6 Chebyshev terms, we might well get the error bound, $B$, down to $10^{-9}$. However, optimizing for absolute error is generally not what we want! The nature of floating point numbers is such that precision is as fine as $10^{-45}$ near zero, and as coarse as $10^{-8}$ near one. What we really want to minimize is the relative error. As long as $|E(x)/\sin(x)| < 2^{-24}$, we know that there is no closer float to the true value than what we've approximated*.

* The catch is that achieving $|E(x)/\sin(x)| < 2^{-24}$ only ensures we're off by less than 1 ULP sans rounding. If the true answer falls in between two floats, this isn't enough to guarantee that we round to the correct one. If the answer is nearly the average of the two adjacent floats, being off by just 0.01 ULP in our model could cause incorrect rounding that actually bumps us to a full 0.5 ULP error. I imagine optimizing for 0.5 ULP error requires more novel techniques. I found a paper detailing Intel's formally-verified sine implementation for their IA-64 architecture, and even it only claims accuracy to 0.57 ULP.
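To make the float-spacing claim above concrete, here is a tiny check (my own snippet, not from the article) of the absolute gap between adjacent f32s near zero versus near one:

fn main() {
    let gap_near_zero = f32::from_bits(1);                             // smallest positive f32, ~1.4e-45
    let gap_below_one = 1.0f32 - f32::from_bits(1.0f32.to_bits() - 1); // 2^-24, ~6e-8
    println!("{:e} {:e}", gap_near_zero, gap_below_one);
}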

Optimizing relative error

To optimize for relative error, we first scale $\sin(x)$ by some easily-reversible function in order to make the result have less dynamic range. For example, if we scale $\sin(x)$ by $\left[\frac{8}{3\pi}x - \frac{8}{3\pi^3}x^3\right]^{-1}$, we get something that looks like the plot below.

Scaled sine function with a dynamic range of ~2, plotted on $(-\pi, \pi)$.

The advantage of this is that if we optimize the absolute error of the above function to $2^{-24}/d$, where $d$ is the dynamic range, then we can apply the inverse scaling function and obtain a $\sin(x)$ approximation that's accurate to a relative error of $2^{-24}$ everywhere. I derived the scaling function by solving for the minimum-degree odd polynomial that has zeros at $x = 0$ and $x = \pi$, and a value of one at $x = \pi/2$.
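For concreteness, the small solve behind that scaling polynomial looks like this (my own restatement; the article only states the constraints and the result):

$$s(x) = ax + bx^3,\qquad s(\pi) = a\pi + b\pi^3 = 0,\qquad s(\pi/2) = \frac{a\pi}{2} + \frac{b\pi^3}{8} = 1.$$

The first condition gives $a = -b\pi^2$; substituting into the second gives $-\frac{3}{8}b\pi^3 = 1$, so $b = -\frac{8}{3\pi^3}$ and $a = \frac{8}{3\pi}$, i.e. exactly the $\frac{8}{3\pi}x - \frac{8}{3\pi^3}x^3$ used above.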

Let's now project the scaled sine function onto the Chebyshev basis polynomials. To do this, I followed pages 7-8 of this University of Waterloo pdf. Specifically, it shows the following property for Chebyshev functions, which arises from their orthogonality.

Chebyshev inner-product equality

If the scaled sine function is representable in terms of the Chebyshev basis functions, i.e. $f(x) = \sum_n A_n T_n(x)$, then the integral $\int_{-1}^{1} \frac{T_n(x) f(x)}{\sqrt{1 - x^2}} dx$ is exactly equal to $\int_{-1}^{1} \frac{T_n(x) A_n T_n(x)}{\sqrt{1 - x^2}} dx$. By moving $A_n$ out of the integral, the following relation is obtained: $\langle f(x), T_n(x) \rangle = A_n \langle T_n(x), T_n(x) \rangle$, where $\langle g(x), h(x) \rangle$ represents the inner product, i.e. $\langle g(x), h(x) \rangle = \int_{-1}^{1} \frac{g(x) h(x)}{\sqrt{1 - x^2}} dx$. This gives a straightforward way to solve for each $A_n$ and thereby re-express the scaled sine function in terms of the Chebyshev basis.

I compute the first 11 terms of the series with the following Mathematica code. I omit solving for $A_1, A_3, A_5, A_7$, and $A_9$ because symmetry requires they be zero. In order to keep the function's domain as $(-1, 1)$, I scale the parameter to $\sin(x)$ by $\pi$. This way, we get coefficients for a function that computes one full cycle of $\sin(\pi x)$. I'll undo the scaling of $x$ later.

Solve for the Chebyshev coefficients of the scaled sine function. WorkingPrecision needs to be raised for NIntegrate to converge (or use Integrate, if you're patient).
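The Mathematica itself isn't reproduced here, but as a rough sketch of the same projection (my own code and names; plain f64 instead of Mathematica's adjustable working precision, so the smallest coefficients will lose a few digits), the inner products can be evaluated with Gauss-Chebyshev quadrature:

use std::f64::consts::PI;

/// The scaled sine function with its argument already mapped onto (-1, 1):
/// f(t) = sin(pi*t) / s(pi*t), where s is the odd scaling polynomial from earlier.
fn scaled_sine(t: f64) -> f64 {
    let y = PI * t;
    y.sin() / ((8.0 / (3.0 * PI)) * y - (8.0 / (3.0 * PI.powi(3))) * y.powi(3))
}

/// Chebyshev polynomial of the first kind, via T_n(cos(theta)) = cos(n*theta).
fn cheb_t(n: u32, x: f64) -> f64 {
    (n as f64 * x.acos()).cos()
}

/// A_n = <f, T_n> / <T_n, T_n>. Gauss-Chebyshev quadrature folds the 1/sqrt(1 - x^2)
/// weight into the nodes, so each inner product is just pi/N times a sum of samples.
fn coeff(n: u32, nodes: u32) -> f64 {
    let inner: f64 = (1..=nodes)
        .map(|k| {
            let x = ((2 * k - 1) as f64 * PI / (2.0 * nodes as f64)).cos();
            scaled_sine(x) * cheb_t(n, x)
        })
        .sum::<f64>() * PI / nodes as f64;
    inner / if n == 0 { PI } else { PI / 2.0 }
}

fn main() {
    // Odd-indexed coefficients come out ~0 because the scaled function is even.
    for n in [0u32, 2, 4, 6, 8, 10] {
        println!("A_{:2} = {:+.12e}", n, coeff(n, 4096));
    }
}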

Now that we have the Chebyshev coefficients, we can reconstruct an approximation to $\sin(x)$ and also undo the scaling of the $x$ parameter:

Reconstructing $\widehat{\sin}(x) = A_0 T_0(x) + A_1 T_1(x) + \ldots$
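Spelled out (my own restatement of what that step computes, using the scaling function from before), the reconstructed approximation is

$$\widehat{\sin}(x) = \left[\frac{8}{3\pi}x - \frac{8}{3\pi^3}x^3\right]\sum_{n=0}^{10} A_n T_n\!\left(\frac{x}{\pi}\right),$$

i.e. the truncated Chebyshev series of the scaled function, with both the range scaling and the $x \mapsto x/\pi$ argument scaling undone.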

A quick plot of the reconstructed sine function looks promising:

Plot of reconstructed sine approximation on $(-\pi, \pi)$.

Likewise, the plot for the relative error measurement: $(\widehat{\sin}(x) - \sin(x))/\sin(x)$.

Relative error of the sine approximation on $(-\pi, \pi)$.

The error is fairly uniform, and it reaches the target of $2^{-24}$. For completeness, the polynomial coefficients are shown below. Chebyshev functions are themselves polynomials, so expanding the function gives an ordinary polynomial.

Polynomial approximation to $\sin(x)$.

Evaluating the polynomial approximation

Given these coefficients, the polynomial can be evaluated with Horner's method (aka "nested multiplication"):

fn sine(x: f32) -> f32 {
    let coeffs = [
        0.999999999973088f32, // x
        -0.1666666663960699f32, // x^3
        0.00833333287058762f32, // x^5
        -0.0001984123883227529f32, // x^7
        2.755627491096882e-6f32, // x^9
        -2.503262029557047e-8f32, // x^11
        1.58535563425041e-10f32, // x^13
    ];
    let p13 = coeffs[6];
    let p11 = p13*x*x + coeffs[5];
    let p9  = p11*x*x + coeffs[4];
    let p7  = p9*x*x  + coeffs[3];
    let p5  = p7*x*x  + coeffs[2];
    let p3  = p5*x*x  + coeffs[1];
    let p1  = p3*x*x  + coeffs[0];
    p1*x
}

So how does this perform?

Average error (in ULP) vs. x.

Not very well, but why?

It turns out we've reached the point where the low-error approximation is only accurate when evaluated with infinite precision. When we perform the computations using 32-bit floats, we're plagued by rounding errors. You see that first coefficient: 0.999999999973088f32?

> println!("{}", 0.999999999973088f32 == 1.0f32);
true

Yup, we chose coefficients that can't even be represented in our number system. Of course the results have error. As for what we can do about it, notice where the error occurs. It only becomes excessive as $x$ approaches $\pm\pi$, at which point $\widehat{\sin}(x)$ should approach 0. But the slight error in the coefficients means it doesn't approach 0 at quite the right place. Our polynomial already has a factor of $x$ pulled out explicitly in order to create a root at $x = 0$. My instincts are to pull out a factor of $(x - \pi)$ and $(x + \pi)$ as well.

$\sin(x)$ approximation divided by $(x - \pi)(x + \pi)$.

Now if I multiplied by $(x + \pi)(x - \pi)$ in 32-bit float to undo the division, that wouldn't be quite right, because the float closest to $\pi$ isn't exactly equal to the true value of $\pi$. The result is that sin(f32::consts::PI) shouldn't be 0, so we don't want a root at (x - f32::consts::PI).

Instead, I'll make a root at $x + \hat{\pi} + \Delta$, where $\hat{\pi}$ is the closest f32 to the real $\pi$, and $\Delta$ is a small correction factor that's approximately the true $\pi$ minus the f32 $\hat{\pi}$. Because the correction factor, $\Delta$, is close to 0, the net value being added/subtracted is accurate to more decimal places. The effect is only seen when $x$ is near $\pm\pi$; otherwise the $\Delta$ offset itself gets absorbed by rounding errors. But that's OK, because it's only needed to fix errors near $\pm\pi$ in the first place.
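As an aside, here is a minimal sketch (my own helper, not from the article) of how such a two-float split of $\pi$ can be produced, along with a quick check that the correctly-rounded sine of the f32 "pi" really is a small negative number rather than 0:

fn main() {
    let pi_major = std::f64::consts::PI as f32;                     // closest f32 to pi: 3.1415927
    let pi_minor = (std::f64::consts::PI - pi_major as f64) as f32; // leftover: ~ -8.742278e-8
    println!("{} {:e}", pi_major, pi_minor);
    println!("{:e}", (std::f32::consts::PI as f64).sin());          // ~ -8.742278e-8, not 0
}

With that correction in hand, the updated evaluation becomes: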

let coeffs = [
    -0.101321183346709072589001712988183609230944236760490476f32, // x
     0.00662087952180793343258682906697112938547424931632185616f32, // x^3
    -0.000173505057912483501491115906801116298084629719204655552f32, // x^5
     2.52229235749396866288379170129828403876289663605034418e-6f32, // x^7
    -2.33177897192836082466066115718536782354224647348350113e-8f32, // x^9
     1.32913446369766718120324917415992976452154154051525892e-10f32, // x^11
];
let x2 = x*x;
let p11 = coeffs[5];
let p9  = p11*x2 + coeffs[4];
let p7  = p9*x2  + coeffs[3];
let p5  = p7*x2  + coeffs[2];
let p3  = p5*x2  + coeffs[1];
let p1  = p3*x2  + coeffs[0];
(x - 3.1415927410125732f32 + 0.00000008742277657347586f32) *
(x + 3.1415927410125732f32 - 0.00000008742277657347586f32) * p1 * x
Average error (in ULP) vs. x.

At this point, it helps to look at the error bounds, rather than just their average magnitude:

Max & min error (signed) vs. x. The global maximum error is 4.90 ULP, at x=2.6183214.

The plot above shows that the error oscillates over a large range, and rapidly. The plot was made by taking the min/max signed error over each group of 4096 adjacent floats, so we see that the signed error changes by as much as 9 ULP over the course of just 4096 adjacent floats. It doesn't seem, then, that the error could be meaningfully addressed by adding more polynomial terms or by better optimizing the existing polynomial coefficients: it's essentially noise.

Without resorting to piecewise functions (or another form of control logic), I think my options are fairly limited at this point. I could use a trick similar to the one used when subtracting $\pi$, and represent each coefficient more precisely as a sum of two floats of differing magnitude (i.e. the f32 analog of double-doubles), but at the cost of more complexity. Or I could brute-force search nearby sets of coefficients and see if there are any that just happen to be less susceptible to this noise.
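To sketch what the brute-force option involves (my own code; the article doesn't show the actual search harness), the inner loop is just an exhaustive error measurement, and the search perturbs the coefficient bit patterns by a few ULP and keeps whichever set minimizes it:

/// Worst-case error, in f32 ULP, of a sine approximation against f64 sin, measured over
/// every representable f32 in (0, pi); by odd symmetry the negative half behaves the same.
/// (Roughly 10^9 floats, so this takes a little while per candidate coefficient set.)
fn max_ulp_error(sine: impl Fn(f32) -> f32) -> (f64, f32) {
    let mut worst = (0.0f64, 0.0f32);
    let mut x = f32::MIN_POSITIVE;
    while x < std::f32::consts::PI {
        let exact = (x as f64).sin();
        // Spacing between adjacent f32s at the magnitude of the true answer.
        let ulp = (f32::from_bits((exact as f32).to_bits() + 1) - exact as f32) as f64;
        let err = (sine(x) as f64 - exact).abs() / ulp;
        if err > worst.0 {
            worst = (err, x);
        }
        x = f32::from_bits(x.to_bits() + 1); // step to the next representable float
    }
    worst
}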

So I ran the brute-forcer for 20 hours and it turned up a slightly better result (it actually got this result within the first hour). The resulting coefficients are the ones shown in the code at the top of this article. The maximum error over $(-\pi, \pi)$ is now 4.58 ULP and occurs at x=3.020473.

Max & min error vs. x for the brute-forced coefficients.

Although performance wasn't my goal here, I'll conclude with a benchmark against the standard library's implementation, which presumably handles infinities and NaNs, and performs argument reduction for numbers outside $(-\pi, \pi)$.

#![feature(test)] // test::Bencher requires a nightly toolchain
extern crate rand;
extern crate test;
use rand::{Rng, XorShiftRng};
use test::Bencher;
const TWO_PI: f32 = 2f32 * std::f32::consts::PI;

#[bench]
fn bench_my_sin(b: &mut Bencher) {
    let mut rng = XorShiftRng::new_unseeded();
    b.iter(|| {
        let x = (rng.next_f32()-0.5f32) * TWO_PI;
        bruteforce_sine(x)
    });
}

#[bench]
fn bench_f32_sin(b: &mut Bencher) {
    let mut rng = XorShiftRng::new_unseeded();
    b.iter(|| {
        let x = (rng.next_f32()-0.5f32) * TWO_PI;
        x.sin()
    });
}

$ cargo bench
test bench_f32_sin ... bench:          16 ns/iter (+/- 1)
test bench_my_sin  ... bench:           7 ns/iter (+/- 1)