Inspired by Ian Stephenson, I found a faster and more accurate approximation of $\log_2$, which in turn improves my estimates of inverse $p$-th roots. The resulting tricks work well generally for computing powers $x^p$, which is also part of computing $L_p$ norms. There's also a fast approximate logarithm and exponential, which are more broadly relevant since these operations are used extensively in practice to represent probabilities.
Update: The code samples here are out of date; minor problems have been fixed and you should get the latest version from the Google code repository. (Update 2: Google code is dead, but versions exist on github.)
A Fast Approximate Logarithm
A fast $\log_2$ approximation implies a fast approximation for natural log or any other base known at compile time, since the difference is one multiply. Therefore, consider a floating-point number $x$ and the value $I_x$ associated with interpreting the byte representation of $x$ as an integer; here $B = 127$ is the single-precision exponent bias and $L = 2^{23}$ scales the mantissa field, \[
\begin{aligned}
x &= (1 + m_x) 2^{e_x}, \\
\log_2 (x) &= e_x + \log_2 (1 + m_x) ,\\
I_x &= (e_x + B) L + m_x L, \\
I_x / L - B &= e_x + m_x, \\
\log_2 (x) &= I_x / L - B + \log_2 (1 + m_x) - m_x.
\end{aligned}
\] Thus what is needed is a way to extract the mantissa from $x$ and approximate $g (z) = \log_2 (1 + z) - z$ for $z \in [0, 1)$. I went with a rational approximation that seemed to hit a sweet spot of accuracy and speed, leading to
// WARNING: this code has been updated. Do not use this version.
// Instead, see http://code.google.com/p/fastapprox for the latest version.
inline float
fastlog2 (float x)
{
  // reinterpret the bits of x as an integer, and extract the mantissa
  // with the exponent field forced so that mx.f lies in [0.5, 1)
  union { float f; uint32_t i; } vx = { x };
  union { uint32_t i; float f; } mx = { (vx.i & 0x007FFFFF) | (0x7e << 23) };
  float y = vx.i;
  y *= 1.0 / (1 << 23);

  // rational approximation of the correction term
  return y - 124.22544637f - 1.498030302f * mx.f
           - 1.72587999f / (0.3520887068f + mx.f);
}

inline float
fastlog (float x)
{
  return 0.69314718f * fastlog2 (x);  // log (x) = log (2) * log2 (x)
}
If you want even faster and rougher, replacing $g (z)$ with its average value $\frac{-2 + \log (8)}{\log (4)} \approx 0.0573$ leads to
// WARNING: this code has been updated. Do not use this version.
// Instead, see http://code.google.com/p/fastapprox for the latest version.
inline float
fasterlog2 (float x)
{
  union { float f; uint32_t i; } vx = { x };
  float y = vx.i;
  y *= 1.0 / (1 << 23);
  return y - 126.94269504f;
}

inline float
fasterlog (float x)
{
  return 0.69314718f * fasterlog2 (x);
}
Timing and Accuracy
Timing tests are done by compiling with -O3 -finline-functions -ffast-math, on a box running 64 bit Ubuntu Lucid (so gcc 4:4.4.3-1ubuntu1 and libc 2.11.1-0ubuntu7.6). I also measured average relative accuracy for $x \in [1/100, 10]$.
fastlog2 relative accuracy = 2.09352e-05
fastlog relative accuracy = 2.09348e-05
fasterlog2 relative accuracy = 0.0130367
fasterlog relative accuracy = 0.0130367
fastlog2 million calls per second = 160.141
fastlog million calls per second = 143.552
fasterlog2 million calls per second = 218.345
fasterlog million calls per second = 210.435
log2f million calls per second = 40.8511
A Fast Approximate Exponential
Again 2 is the most convenient base. Here $2^x$ is what I want to approximate and $I_{2^x}$ is the value associated with interpreting the byte pattern of $2^x$ as an integer. \[
\begin{aligned}
2^x &= 2^{x - \lfloor x \rfloor} 2^{\lfloor x \rfloor} \\
&= (1 + 2^{x - \lfloor x \rfloor} - 1) 2^{\lfloor x \rfloor}, \\
I_{2^x} &= (\lfloor x \rfloor + B) L + (2^{x - \lfloor x \rfloor} - 1) L \\
&= L (x + B - 1 + 2^{x - \lfloor x \rfloor} - x + \lfloor x \rfloor).
\end{aligned}
\] The correction term is of the form $g (z) = 2^z - z$ for $z \in [0, 1)$, and once again I got the best bang for the buck from a rational function approximation. That leads to
// WARNING: this code has been updated. Do not use this version.
// Instead, see http://code.google.com/p/fastapprox for the latest version.
inline float
fastpow2 (float p)
{
  union { float f; uint32_t i; } vp = { p };
  int sign = (vp.i >> 31);  // 1 if p < 0, else 0
  int w = p;                // truncate toward zero
  float z = p - w + sign;   // fractional part of p in [0, 1)
  union { uint32_t i; float f; } v =
    { (1 << 23) * (p + 121.2740838f + 27.7280233f / (4.84252568f - z)
                     - 1.49012907f * z) };
  return v.f;
}

inline float
fastexp (float p)
{
  return fastpow2 (1.442695040f * p);  // exp (p) = 2^(p / log (2))
}
If you want even faster and rougher, replacing $g (z)$ with its average value $\frac{2 - \log (2)}{2 \log (2)} \approx 0.9427$ leads to
// WARNING: this code has been updated. Do not use this version.
// Instead, see http://code.google.com/p/fastapprox for the latest version.
inline float
fasterpow2 (float p)
{
  union { uint32_t i; float f; } v = { (1 << 23) * (p + 126.94269504f) };
  return v.f;
}

inline float
fasterexp (float p)
{
  return fasterpow2 (1.442695040f * p);
}
Timing and Accuracy
Timing tests are done by compiling with -O3 -finline-functions -ffast-math, on a box running 64 bit Ubuntu Lucid (so gcc 4:4.4.3-1ubuntu1 and libc 2.11.1-0ubuntu7.6). I also measured average relative accuracy for $p \in [1/20, 20]$. When I measured relative accuracy I tried each $p$ and also each $p$-th inverse root (i.e., $-1/p$).
fastpow2 relative accuracy (positive p) = 1.58868e-05
fastexp relative accuracy (positive p) = 1.60712e-05
fasterpow2 relative accuracy (positive p) = 0.0152579
fasterexp relative accuracy (positive p) = 0.0152574
fastpow2 relative accuracy (inverse root p) = 1.43517e-05
fastexp relative accuracy (inverse root p) = 1.7255e-05
fasterpow2 relative accuracy (inverse root p) = 0.013501
fasterexp relative accuracy (inverse root p) = 0.0111832
fastpow2 million calls per second = 153.561
fastexp million calls per second = 143.311
fasterpow2 million calls per second = 215.006
fasterexp million calls per second = 214.44
expf million calls per second = 4.16527
A Fast Approximate Power
The combination of a fast approximate logarithm and fast approximate exponential yields a fast approximate power.
// WARNING: this code has been updated. Do not use this version.
// Instead, see http://code.google.com/p/fastapprox for the latest version.
inline float
fastpow (float x,
         float p)
{
  return fastpow2 (p * fastlog2 (x));
}
Timing and Accuracy
Timing tests are done by compiling with -O3 -finline-functions -ffast-math, on a box running 64 bit Ubuntu Lucid (so gcc 4:4.4.3-1ubuntu1 and libc 2.11.1-0ubuntu7.6). I also measured average relative accuracy for $(x, p) \in [1/200, 5] \times [1/40, 10]$. When I measured relative accuracy I tried each $p$ and also each $p$-th inverse root (i.e., $-1/p$).
fastpow relative accuracy (positive p) = 0.000165618
fastpow relative accuracy (inverse root p) = 0.00011997
fastpow million calls per second = 58.8533
powf million calls per second = 8.44577
A Fast Approximate Inverse Root
By inverse root I mean solving $y = x^{-1/p}$ for $p \geq 1$. Isn't this the same problem as fast approximate power? Yes, but in this regime the following approach is fairly accurate and somewhat faster. The initial approximation is based upon the modified inverse square root hack, \[
I_y = -\frac{1}{p} I_x + \frac{p + 1}{p} (B - \sigma) L.
\] This is refined using approximate Halley's method on $f (y) = \log (y) + \frac{1}{p} \log (x)$, \[
\begin{aligned}
y_{n+1} &= y_n \frac{1 - \delta}{1 + \delta} \approx y_n (1 - \delta)^2, \\
\delta &= \frac{\log (2)}{2} \left(\log_2 (y_n) + \frac{1}{p} \log_2 (x)\right).
\end{aligned}
\] The $\log_2$ is computed using the fast approximate logarithm from above. Altogether this becomes
// WARNING: this code has been updated. Do not use this version.
// Instead, see http://code.google.com/p/fastapprox for the latest version.
inline float
fasterinvproot (float x,
                float p)
{
  union { float f; uint32_t i; } v = { x };
  unsigned int R = 0x5F375A84U;  // R = (3/2) (B - sigma) L
  unsigned int pR = ((2 * (p + 1)) / (3 * p)) * R;
  unsigned int sub = v.i / p;
  v.i = pR - sub;
  return v.f;
}

inline float
fastinvproot (float x,
              float p)
{
  float y = fasterinvproot (x, p);
  float log2x = fastlog2 (x);
  float log2y = fastlog2 (y);
  // 0.34657359 = log (2) / 2; err = 1 - delta
  float err = 1.0 - 0.34657359f * (log2x / p + log2y);
  return y * err * err;  // one approximate Halley step
}
Note: fastinvproot is an improved (faster and more accurate) version of fast_inverse_p_one presented in my previous blog post.
Timing and Accuracy
Timing tests are done by compiling with -O3 -finline-functions -ffast-math, on a box running 64 bit Ubuntu Lucid (so gcc 4:4.4.3-1ubuntu1 and libc 2.11.1-0ubuntu7.6). I also measured average relative accuracy for $(x, p) \in [1/200, 5] \times [1/40, 10]$. When I measured relative accuracy I tried each $p$ and also each $p$-th inverse root (i.e., $-1/p$).
fastinvproot relative accuracy (positive p) = 0.000727901
fastinvproot relative accuracy (inverse root p) = 0.00300208
fastinvproot million calls per second = 82.7019
It's a bit confusing, but the ``positive p'' case here implies taking an inverse $p$-th root, and ``inverse root p'' actually means trying to cajole this routine into computing $y = x^p$. Therefore these numbers indicate that, for reasons I don't fully understand, fastinvproot is more accurate for finding inverse roots than as a general power routine.
Caveat Emptor
In addition to being less accurate, these routines don't handle the floating-point special values like the standard routines do. Also, if you feed these routines negative inputs inappropriately, weird stuff happens.