The sample mean is a sub-optimal estimate of the population mean

It may be surprising that the sample mean is not usually the best estimate of the population mean. It’s well known that when the data is Gaussian, the sample mean is the UMVUE, or the uniformly minimum variance unbiased estimator. The UMVUE has the smallest variance among all unbiased estimators for any choice of the unknown parameter \( \theta \). The sample mean is always unbiased (when the population mean exists) but isn’t generally minimum variance, and in some cases its variance can be much higher than optimal. This general result was first discovered by Kagan, Linnik and Rao (1965). The derivation here follows Székely et al. (1986).

Let \(F\) be a mean-zero, finite variance distribution function, and let \(F_{\theta}(x) = F(x-\theta)\). The parameter \(\theta\) is known as the location parameter, and in this setup it is the same as the mean of the distribution. We denote the density by \( f(x) = F'(x) \).

An Example

Suppose that the base distribution is \(\text{Uniform(-1,1)}\), so that \( f(x) = \frac{1}{2} 1 \left\{x\in[-1,1]\right\}\). We have \(N\) independent and identically distributed samples \(X_1,\ldots,X_N\) from the distribution. Consider the estimator

\( X^* = \frac{1}{2}(X_{(1)}+X_{(N)}),\)

that is, the average of the smallest and largest points in the sample. This is called the midrange. \(X^*\) may be shown to have variance

\( \text{var}(X^*) = \frac{2}{(N+1)(N+2)}, \)

while the sample mean \( \bar{X} \) has variance

\( \text{var}(\bar{X}) = \frac{1}{3N}, \)

which is substantially larger when \(N\) is large. \(X^*\) has variance of order \( O(N^{-2}) \), while the sample mean has variance of order \( O(N^{-1}) \), so the midrange improves on the sample mean by a factor that grows linearly in \(N\).
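
The two variance formulas can be checked with a quick Monte Carlo simulation. The following is a minimal sketch, assuming NumPy is available; the function name `simulate_variances`, the sample size, the number of trials, and the seed are illustrative choices, not part of the original derivation.

```python
import numpy as np

def simulate_variances(n, trials=20000, seed=0):
    """Monte Carlo variances of the sample mean and the midrange
    for n i.i.d. draws from Uniform(-1, 1)."""
    rng = np.random.default_rng(seed)
    # Each row is one simulated sample of size n.
    x = rng.uniform(-1.0, 1.0, size=(trials, n))
    mean_est = x.mean(axis=1)
    midrange_est = 0.5 * (x.min(axis=1) + x.max(axis=1))
    return mean_est.var(), midrange_est.var()

n = 50
var_mean, var_mid = simulate_variances(n)

# Compare the simulated variances with the closed-form expressions.
print(f"sample mean: simulated {var_mean:.5f}, theory {1 / (3 * n):.5f}")
print(f"midrange:    simulated {var_mid:.5f}, theory {2 / ((n + 1) * (n + 2)):.5f}")
```

With \(N = 50\) the theoretical variances are \(1/150\) for the sample mean and \(2/2652\) for the midrange, so the simulated values should differ by roughly a factor of nine.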

The main result

We will now construct a statistic which is asymptotically UMVUE: it is asymptotically unbiased and minimum variance among all estimators of the population mean for data arising from a particular location family.

Consider the estimator

\( X^* = \sum_{i=1}^N a_{i,N} X_{(i)}, \)

where \(X_{(i)}\) is the \(i\)-th order statistic of the sample, and \(a_{i,N}\) are weights which depend on the distribution.
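
An estimator of this form is a weighted sum of the sorted sample, and both the sample mean and the midrange are special cases. The following is a minimal sketch, assuming NumPy; the helper name `l_estimate` and the toy sample are hypothetical and chosen only for illustration.

```python
import numpy as np

def l_estimate(sample, weights):
    """Weighted sum of order statistics: sum_i weights[i] * X_(i+1)."""
    ordered = np.sort(np.asarray(sample, dtype=float))
    weights = np.asarray(weights, dtype=float)
    if ordered.shape != weights.shape:
        raise ValueError("need exactly one weight per observation")
    return float(ordered @ weights)

sample = [0.3, -0.8, 0.5, 0.1]  # toy data, for illustration only
n = len(sample)

# Equal weights 1/N recover the sample mean.
mean_weights = np.full(n, 1.0 / n)

# Weight 1/2 on the smallest and largest points recovers the midrange.
midrange_weights = np.zeros(n)
midrange_weights[[0, -1]] = 0.5

print(l_estimate(sample, mean_weights))
print(l_estimate(sample, midrange_weights))
```

The only thing that changes between estimators is the weight vector, which is why the result below is stated in terms of the weights \(a_{i,N}\).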

Suppose the distribution function \(F\) has three derivatives, let \(f(x) = F'(x)\), and define the functional

\[ a(F(x)) = - \left(\left( A + Bx \right)\left(\log f(x)\right)' \right)', \]

where

\(\begin{align} A &= \frac{\mu_2}{\mu_0\mu_2 - \mu_1^2}, & B &= \frac{\mu_1}{\mu_0\mu_2 - \mu_1^2}, \end{align}\)

and

\( \begin{align} \mu_0 &= \intop \frac{f'(x)^2}{f(x)} dx, & \mu_1 &= \intop x\frac{f'(x)^2}{f(x)} dx,\end{align}\)

\( \mu_2 = \intop x^2\frac{f'(x)^2}{f(x)} dx - 1, \)

then with the choice \(a_{i,N} = a(i/N)/N \) (that is, \(a\) evaluated at the point where \(F(x) = i/N\)), \(X^*\) is asymptotically UMVUE.
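
As a sanity check on these formulas (a worked example added for illustration, not taken from the cited derivation), take \(F\) to be the standard Gaussian. Then \( \left(\log f(x)\right)' = -x \) and \( f'(x)^2/f(x) = x^2 f(x) \), so the three integrals reduce to Gaussian moments:

\[ \mu_0 = \intop x^2 f(x)\, dx = 1, \qquad \mu_1 = \intop x^3 f(x)\, dx = 0, \qquad \mu_2 = \intop x^4 f(x)\, dx - 1 = 2. \]

This gives \( A = \frac{2}{1\cdot 2 - 0} = 1 \) and \( B = 0 \), and therefore

\[ a(F(x)) = -\left( (1 + 0\cdot x)(-x) \right)' = 1. \]

The weights are then \( a_{i,N} = 1/N \) for every \(i\), so \(X^*\) reduces to the sample mean, consistent with the sample mean being the UMVUE for Gaussian data.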