  • An anti-central limit theorem

    The basic version of the central limit theorem says that for an i.i.d. sample from a distribution with mean \(\mu\) and finite variance \(\sigma^2\), the re-scaled sample mean converges in distribution to a normal distribution:

    \[ \sqrt{N} (\bar{X} - \mu) \overset{D}{\rightarrow} Normal(0,\sigma^2). \]

    Basically, the CLT says that the error of the sample mean as an estimate of the location parameter decreases at a rate of \(O(N^{-1/2})\) with the sample size. The CLT is what makes statistical inference possible. Some parts of the above statement may be relaxed, but if you get rid of the assumption of finite variance, things can get weird.

    The Cauchy distribution with location \(\mu\) and scale \(\sigma\) is a distribution on \((-\infty,\infty)\) with density

    \[ f(x) = \frac{1}{\sigma\pi\left(1+\left(\frac{x-\mu}{\sigma}\right)^2\right)}. \]

    It is well known that the sample mean \(\bar{X}\) of an i.i.d. sample \(X_1,\ldots,X_N\) from a Cauchy distribution has the same law as a single observation, \(\bar{X}\overset{D}{=} X_1\). And so the sample mean does not improve as an estimate of the location as the number of samples increases.

    A surprising fact is that there are distributions for which the sample mean gets worse as an estimate of the location as the number of samples increases; the simulation sketch after this excerpt illustrates this alongside the Gaussian and Cauchy cases.

    Let \(X\sim Normal(0,1)\), and let \(Y=\frac{1}{X^2}\). Applying a change of variables we find that \(Y\) has distribution function

    Read on →
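
    As a supplement to this excerpt, here is a minimal simulation sketch, assuming NumPy (the seed, replication count, and sample sizes are arbitrary choices). It tracks the interquartile range (IQR) of the sample mean across replications for Normal(0,1) data, for Cauchy data, and for the post's \(Y = 1/X^2\) example.

    ```python
    # Contrast how the spread of the sample mean behaves as N grows for:
    #   Normal(0,1)                 -- spread shrinks like N^{-1/2} (the CLT case),
    #   Cauchy(0,1)                 -- spread stays flat (mean has the law of one draw),
    #   Y = 1/X^2, X ~ Normal(0,1)  -- the post's "anti-CLT" example.
    import numpy as np

    rng = np.random.default_rng(0)
    n_reps = 1000                      # Monte Carlo replications per sample size

    def iqr(a):
        """Interquartile range: a spread measure robust to heavy tails."""
        q75, q25 = np.percentile(a, [75, 25])
        return q75 - q25

    print(f"{'N':>6} {'Normal IQR':>12} {'Cauchy IQR':>12} {'1/X^2 IQR':>12}")
    for n in [10, 100, 1_000, 10_000]:
        x = rng.standard_normal((n_reps, n))
        normal_means = x.mean(axis=1)
        cauchy_means = rng.standard_cauchy((n_reps, n)).mean(axis=1)
        inv_sq_means = (1.0 / x**2).mean(axis=1)
        print(f"{n:>6} {iqr(normal_means):>12.4f} "
              f"{iqr(cauchy_means):>12.3f} {iqr(inv_sq_means):>12.1f}")
    ```

    The Normal column should shrink roughly by a factor of \(\sqrt{10}\) per row, the Cauchy column should hover around a constant, and the \(1/X^2\) column should grow.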

  • A random variable is not uniquely determined by its moments

    The moments of a random variable \(X\) are given by \(\mathbb{E}[X^n]\), for all integers \(n\geq 1\). One fascinating fact I first learned while studying distribution theory is that the moment sequence does not always uniquely determine a distribution.

    Consider \(X\sim logNormal(0,1)\), that is, \(\log(X)\) follows a standard normal distribution, and let the density of \(X\) be \(f(x)\). The moments of \(X\) exist and have the closed form

    \[ m_i := \mathbb{E}[X^i] = e^{i^2/2}. \]

    Consider the density

    \[ f_a(x) = f(x)(1+\sin(2\pi\log(x))), \,\,\, x > 0. \]

    Then \(f\) and \(f_a\) have the same moments; a numerical check of this claim follows the excerpt below. To prove it, it's sufficient to show that for each \(i=1,2,\ldots\),

    \[ \int_{0}^\infty x^i f(x) \sin(2\pi\log(x))\, dx = 0. \]

    Read on →
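
    As a supplement to this excerpt, here is a minimal numerical check of the claim, assuming NumPy (the grid range and resolution are arbitrary choices). Substituting \(s = \log(x)\) turns \(f(x)\,dx\) into the standard normal density \(\varphi(s)\,ds\), so the \(i\)-th moments of \(f\) and \(f_a\) become \(\int e^{is}\varphi(s)\,ds\) and \(\int e^{is}\varphi(s)(1+\sin(2\pi s))\,ds\).

    ```python
    # Numerically compare the moments of f (logNormal(0,1)) and of the
    # perturbed density f_a(x) = f(x) * (1 + sin(2*pi*log(x))), working in
    # s = log(x) so that x^i f(x) dx becomes exp(i*s) * phi(s) ds.
    import numpy as np

    s = np.linspace(-40.0, 60.0, 2_000_001)   # fine grid covering all the mass
    ds = s[1] - s[0]

    print(f"{'i':>2} {'moment of f':>14} {'moment of f_a':>14} {'exp(i^2/2)':>14}")
    for i in range(1, 6):
        # exp(i*s) * phi(s), exponentiated in one step to avoid overflow
        weight = np.exp(i * s - 0.5 * s**2) / np.sqrt(2.0 * np.pi)
        moment_f = np.sum(weight) * ds                 # Riemann-sum approximation
        moment_fa = np.sum(weight * (1.0 + np.sin(2.0 * np.pi * s))) * ds
        print(f"{i:>2} {moment_f:>14.4f} {moment_fa:>14.4f} "
              f"{np.exp(i**2 / 2):>14.4f}")
    ```

    Both numerically computed moment columns should agree with the closed form \(e^{i^2/2}\) up to discretization error.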

  • The sample mean is a sub-optimal estimate of the population mean

    It may be surprising that the sample mean is not usually the best estimate of the population mean. It's well known that when the data is Gaussian, the sample mean is the UMVUE, or uniformly minimum variance unbiased estimator: the estimator with the smallest variance among all unbiased estimators, for any choice of the unknown parameter \( \theta \). The sample mean is always unbiased (when the population mean exists) but isn't generally minimum variance, and in some cases it can have a much higher variance than the optimal estimator. This general result was first discovered by Kagan, Linnik and Rao (1965). The derivation below comes from Székely et al. (1986).

    Let \(F\) be a mean-zero, finite-variance distribution function, and let \(F_{\theta}(x) = F(x-\theta)\). The parameter \(\theta\) is known as the location parameter, and in this setup it is the same as the mean of the distribution. We denote the density by \( f(x) = F'(x) \).

    An Example

    Suppose that the base distribution is \(\text{Uniform}(-1,1)\), so that \( f(x) = \frac{1}{2}\, 1\left\{x\in[-1,1]\right\}\). We have \(N\) independent and identically distributed samples \(X_1,\ldots,X_N\) from this distribution. Consider the estimator

    \[ X^* = \frac{1}{2}(X_{(1)}+X_{(N)}), \]

    that is, the average of the smallest and largest points in the sample. This estimator is called the midrange; a simulation comparing it with the sample mean follows the excerpt below. \(X^*\) may be shown to have variance

    Read on →
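
    As a supplement to this excerpt, here is a minimal simulation sketch, assuming NumPy (the seed, replication count, and sample sizes are arbitrary choices). It compares the Monte Carlo variance of the sample mean, whose exact variance here is \(\frac{1}{3N}\), with that of the midrange \(X^*\).

    ```python
    # Compare the sample mean and the midrange X* = (X_(1) + X_(N)) / 2 as
    # estimators of the location of Uniform(theta - 1, theta + 1) data,
    # taking theta = 0 without loss of generality.
    import numpy as np

    rng = np.random.default_rng(0)
    n_reps = 10_000                    # Monte Carlo replications per sample size

    print(f"{'N':>5} {'var(mean)':>12} {'1/(3N)':>12} {'var(midrange)':>14}")
    for n in [5, 10, 50, 100, 500]:
        x = rng.uniform(-1.0, 1.0, size=(n_reps, n))
        sample_mean = x.mean(axis=1)
        midrange = 0.5 * (x.min(axis=1) + x.max(axis=1))
        print(f"{n:>5} {sample_mean.var():>12.2e} {1 / (3 * n):>12.2e} "
              f"{midrange.var():>14.2e}")
    ```

    The midrange column should fall off much faster than \(\frac{1}{3N}\) as \(N\) grows, which is the sense in which the sample mean is sub-optimal in this example.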