The basic version of the central limit theorem says that for an i.i.d. sample from a distribution with mean \(\mu\) and finite variance \(\sigma^2\), the re-scaled sample mean converges in distribution to a normal distribution with mean \(0\) and variance \(\sigma^2\):

\[ \sqrt{N} (\bar{X} - \mu) \overset{D}{\rightarrow} Normal(0,\sigma^2). \]

Basically, the CLT says that the error of the sample mean as an estimate of the location parameter decreases at a rate of \(O(N^{-1/2})\) in the sample size. The CLT is what makes statistical inference possible. Some parts of the above statement can be relaxed, but if you drop the assumption of finite variance, things can get weird.
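
To see the rate in action, here is a minimal simulation sketch (assuming numpy is available; the sample sizes and replication count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Mean absolute error of the sample mean of N standard normal draws,
# averaged over many replications, at a few sample sizes.
for n in [100, 400, 1600, 6400]:
    means = rng.normal(size=(2_000, n)).mean(axis=1)
    print(n, np.abs(means).mean())
```

Each quadrupling of \(N\) should roughly halve the printed error, consistent with the \(O(N^{-1/2})\) rate.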

The Cauchy distribution with location \(\mu\) and scale \(\sigma\) is a distribution on \((-\infty,\infty)\) with density

\[ f(x) = \frac{1}{\sigma\pi\left(1+\left(\frac{x-\mu}{\sigma}\right)^2\right)}. \]

It is well-known that the sample mean \(\bar{X}\) of an i.i.d. sample \(X_1,\ldots,X_N\) from a Cauchy distribution has the same law as a single sample, \(\bar{X}\overset{D}{=} X_1\). And so the sample mean does not improve as an estimate of the location as the number of samples increases.
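
A quick Monte Carlo sketch (numpy again) makes this concrete: the quartiles of \(\bar{X}\) stay put as \(N\) grows, matching those of a single draw (\(\pm 1\) and \(0\) for the standard Cauchy):

```python
import numpy as np

rng = np.random.default_rng(0)

# Quartiles of the Cauchy sample mean at several N, versus a single draw.
for n in [10, 100, 10_000]:
    means = rng.standard_cauchy(size=(2_000, n)).mean(axis=1)
    print(n, np.percentile(means, [25, 50, 75]))

print("single draw:", np.percentile(rng.standard_cauchy(100_000), [25, 50, 75]))
```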

A surprising fact is there are distributions where the sample mean gets worse as an estimate of location as the number of samples increases.

Let \(X\sim Normal(0,1)\), and let \(Y=\frac{1}{X^2}\). Applying a change of variables (for \(y>0\), \(P(Y\leq y) = P(X^2 \geq 1/y) = P(|X| \geq 1/\sqrt{y})\)), we find that \(Y\) has distribution function

\[ F(y) = 2\left(1-\Phi\left(\frac{1}{\sqrt{y}}\right)\right), \]

where \(\Phi\) is the standard normal c.d.f.
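
The formula is easy to sanity-check by simulation; here is a sketch using numpy and scipy (\(\Phi\) is `scipy.stats.norm.cdf`):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Empirical CDF of Y = 1/X**2 versus F(y) = 2*(1 - Phi(1/sqrt(y))).
y_samples = 1.0 / rng.normal(size=1_000_000) ** 2
for y in [0.5, 1.0, 5.0, 25.0]:
    print(y, (y_samples <= y).mean(), 2.0 * (1.0 - norm.cdf(1.0 / np.sqrt(y))))
```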

Define \(F_a(y) = F(y/a)\) to be the scale family of distribution functions for \(F\). Consider i.i.d. \(Y_1,Y_2\sim F=F_1\). It is possible to verify that the convolution satisfies \(Y_1+Y_2 \sim F_4\) (I will leave that as an exercise).
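
If you would rather not do the convolution integral, here is a Monte Carlo sketch of that exercise, checking the empirical CDF of \(Y_1+Y_2\) against \(F_4(y)=F(y/4)\):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def F(y):
    # Distribution function of Y = 1/X**2 derived above.
    return 2.0 * (1.0 - norm.cdf(1.0 / np.sqrt(y)))

# If Y1 + Y2 ~ F_4, then P(Y1 + Y2 <= y) should equal F(y/4).
y1 = 1.0 / rng.normal(size=1_000_000) ** 2
y2 = 1.0 / rng.normal(size=1_000_000) ** 2
for y in [1.0, 4.0, 20.0, 100.0]:
    print(y, ((y1 + y2) <= y).mean(), F(y / 4.0))
```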

What does this tell us? Consider the probability

\[ P\left(\frac{Y_1+Y_2}{4} \leq y\right) = 2\left(1 - \Phi\left(\frac{2}{\sqrt{4y}}\right)\right) = F_1(y). \]

Thus, the sample mean of two observations, \((Y_1+Y_2)/2\), has the same law as twice a single observation, \(2Y_1\).

In general, the sample mean has the same law as a single sample scaled by the number of samples:

\[ \bar{Y} \overset{D}{=} N Y_1. \]
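
Because \(Y\) has an infinite mean, comparing averages directly is uninformative, but comparing quantiles of \(\bar{Y}/N\) with quantiles of a single draw shows the scaling. A simulation sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Quartiles of Ybar/N should match the quartiles of a single draw Y1 for any N.
for n in [10, 100, 1000]:
    ybar = (1.0 / rng.normal(size=(2_000, n)) ** 2).mean(axis=1)
    print(n, np.percentile(ybar / n, [25, 50, 75]))

print("Y1:", np.percentile(1.0 / rng.normal(size=100_000) ** 2, [25, 50, 75]))
```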

Instead of decreasing, the error of the sample mean in estimating the location increases at a rate of \(O(N)\) with the number of samples!

Postscript: This distribution is called the Lévy distribution, and the unusual behavior of the sample mean (for the Cauchy as well) is described by the generalized central limit theorem. The Lévy, Cauchy, and Gaussian distributions are all stable distributions, which means each family is closed under averaging, up to location and scale. For the Gaussian family, the law of the sample mean follows \(\bar{X}\overset{D}{=} N^{-1/2} X_1\).