Consider a set of $n$ values $x_i$ for $i = 1, 2, \ldots, n$. The arithmetic mean $\mu_n$ of this collection of values is defined as: $$ \mu_n = \displaystyle\frac{1}{n}\sum_{i=1}^n x_i \label{post_c34d06f4f4de2375658ed41f70177d59_mean} $$ The simplicity of equation \eqref{post_c34d06f4f4de2375658ed41f70177d59_mean} hides an important issue: computing the sum of all values $x_i$ numerically is not a good idea since the rounding errors inherent in floating-point arithmetic can degrade the accuracy of the computed mean. This happens because as we sum the values $x_i$, the partial sum can become very large, so adding another value $x_i$ to it may amount to adding a small number to a large number. Given the finite precision involved in each of these additions, each computed partial sum may become less and less accurate; if this is the case, the accuracy of the computed mean value will suffer as well.
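To see the problem in isolation, here is a minimal sketch (separate from the experiment described later) using NumPy's single-precision type: once an accumulator reaches $2^{24} = 16777216$, the point beyond which consecutive integers can no longer all be represented exactly in float32, adding $1.0$ to it no longer changes its value.

```python
import numpy

big = numpy.float32(2 ** 24)  # 16777216.0
one = numpy.float32(1.0)

# the small addend is rounded away entirely
print(big + one == big)  # True

# a running float32 sum that has reached 2**24 stops growing altogether
total = numpy.float32(2 ** 24)
for _ in range(1000):
    total += one
print(total)  # still 16777216.0
```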
Notice, however, that equation \eqref{post_c34d06f4f4de2375658ed41f70177d59_mean} can be written as: $$ \mu_n = \displaystyle\frac{1}{n}\sum_{i=1}^n x_i = \frac{1}{n}\left(x_n + \sum_{i=1}^{n-1} x_i\right) = \frac{1}{n}\left(x_n + (n-1)\mu_{n-1}\right) $$ where $\mu_{n-1}$ is the arithmetic mean of the first $(n-1)$ values $x_1, x_2, \ldots, x_{n-1}$: $$ \mu_{n-1} = \displaystyle\frac{1}{n-1}\sum_{i=1}^{n-1} x_i $$ Since $\frac{1}{n}\left(x_n + (n-1)\mu_{n-1}\right) = \frac{x_n}{n} + \mu_{n-1} - \frac{\mu_{n-1}}{n}$, we therefore have that: $$ \boxed{ \displaystyle\mu_n = \mu_{n-1} + \frac{1}{n}(x_n - \mu_{n-1}) } \label{post_c34d06f4f4de2375658ed41f70177d59_mean_stable} $$ Equation \eqref{post_c34d06f4f4de2375658ed41f70177d59_mean_stable} gives us a recursive formula for computing $\mu_n$ from the values of $\mu_{n-1}$ and $x_n$. This means we must compute $\mu_{n-1}$ before computing $\mu_n$, which in turn requires computing $\mu_{n-2}$ first, and so on. Therefore, to compute $\mu_n$, we need to compute all of $\mu_1, \mu_2, \ldots, \mu_{n-1}$ first. This technique is a bit more expensive than directly using \eqref{post_c34d06f4f4de2375658ed41f70177d59_mean} since it performs more arithmetic operations, but the overall time complexity is still $O(n)$.
Why is the recursive formula \eqref{post_c34d06f4f4de2375658ed41f70177d59_mean_stable} better than the sum formula \eqref{post_c34d06f4f4de2375658ed41f70177d59_mean}? The reason is simple: the recursive formula avoids arithmetic operations that mix very large and very small numbers. Its only weakness is that the factor $1/n$ can make the correction term very small compared to $\mu_{n-1}$ when $n$ is very large, but in practice this is much less of a problem than the accuracy issues discussed above.
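Before moving to a concrete experiment, here is a minimal sketch of how \eqref{post_c34d06f4f4de2375658ed41f70177d59_mean_stable} translates into code (the function name `running_mean` is illustrative and not taken from the script further below): the mean is updated one value at a time, and each update only involves quantities of roughly the same magnitude as the data itself.

```python
import numpy

def running_mean(xs):
    # mu_n = mu_{n-1} + (x_n - mu_{n-1}) / n, computed in single precision
    mean = numpy.float32(0.0)
    for n, x in enumerate(xs, start=1):
        mean += (numpy.float32(x) - mean) / numpy.float32(n)
    return mean

print(running_mean([1.0, 2.0, 3.0, 4.0]))  # 2.5
```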
To exemplify, suppose we throw a six-faced dice $n$ times and compute the mean value of the face which lands upwards (the dice need not be fair). Assume that each face $k = 1, 2, \ldots, 6$ lands upwards $n_k$ times. For this particular example, the exact mean value of the top face can be computed directly: $$ \mu^e_n = \displaystyle\frac{1}{n}\sum_{i=1}^n x_i = \frac{1}{n}\sum_{k=1}^6 k n_k \label{post_c34d06f4f4de2375658ed41f70177d59_dice_mean} $$ where $x_i$ is the result of the $i$-th throw. Denoting the mean values computed using equations \eqref{post_c34d06f4f4de2375658ed41f70177d59_mean} and \eqref{post_c34d06f4f4de2375658ed41f70177d59_mean_stable} as $\mu_n^s$ and $\mu_n^r$ (for "sum" and "recursive") respectively, we can see which one is more accurate by comparing their values with the exact value $\mu^e_n$. In what follows, we use single-precision floating-point numbers to make the effects of finite-precision arithmetic more visible; the same behavior occurs with double-precision numbers, although larger values of $n$ may be necessary for the effects to become significant. Table 1 shows simulation results obtained for different sets of values $(n_1, n_2, \ldots, n_6)$.
$n\;(\times 10^6)$ | $(n_1, n_2, n_3, n_4, n_5, n_6)\;(\times 10^6)$ | $\mu_n^s$ | $\mu_n^r$ | $\mu_n^e$ |
---|---|---|---|---|
$6$ | $(1,1,1,1,1,1)$ | $3.4856$ | $3.4997$ | $3.5000$ |
$15$ | $(4,2,1,4,1,3)$ | $3.2263$ | $3.2376$ | $3.3333$ |
$18$ | $(3,3,3,3,3,3)$ | $3.4037$ | $3.5002$ | $3.5000$ |
$20$ | $(20,0,0,0,0,0)$ | $0.8389$ | $1.0000$ | $1.0000$ |
Table 1: Mean values $\mu_n^s$ and $\mu_n^r$ computed using equations \eqref{post_c34d06f4f4de2375658ed41f70177d59_mean} and \eqref{post_c34d06f4f4de2375658ed41f70177d59_mean_stable} respectively, and exact mean values $\mu_n^e$ computed using equation \eqref{post_c34d06f4f4de2375658ed41f70177d59_dice_mean}. All values of $n$ and $n_k$ for $k = 1, 2, \ldots, 6$ are shown divided by $10^6$. Notice how the values of $\mu_n^r$ are significantly better than those of $\mu_n^s$.
For completeness, here is the Python (version 3) script used for computing $\mu_n^s$, $\mu_n^r$ and $\mu_n^e$:
```python
import random

import numpy

# dice face values
face = [1, 2, 3, 4, 5, 6]

# number of times each dice face falls upwards
n_face = [4000000, 2000000, 1000000, 4000000, 1000000, 3000000]

# simulate n dice throws (with counts for each face given by n_face)
values = []
for i in range(0, 6):
    values += [face[i]] * n_face[i]
random.seed(0)
random.shuffle(values)

# mean computed using the sum formula
mean = numpy.sum(values, dtype=numpy.float32) / numpy.float32(len(values))
print("mu^s: %.4f" % mean)

# mean computed using the recursive formula
mean = numpy.float32(0.0)
n = 1
for x in values:
    mean += (numpy.float32(x) - mean) / numpy.float32(n)
    n += 1
print("mu^r: %.4f" % mean)

# exact mean value (up to float32 precision)
mean = numpy.float32(numpy.dot(face, n_face) / len(values))
print("mu^e: %.4f" % mean)
```