Here is the curious story of a one-time alternative to the accepted notion of statistical optimality. Today, when we talk about decision theory, we think of the risk: the expected loss of a particular decision rule. However, at one point in the history of statistics, there was another candidate. Pitman closeness makes a lot of sense conceptually, and it generated quite a bit of interest in past decades. However, it can lead you to some strange conclusions, and so it has not stood the test of time.
Statistical decision theory begins by considering an observation \(x\) drawn from a distribution \(F(x\mid \theta)\) parametrized by \(\theta\), a decision rule \(\delta\) which is a measurable function of the data \(x\), and a loss function \(L(\theta,\delta(x))\), which measures the loss from taking action \(\delta(x)\) when the true parameter is \(\theta\). The risk is defined as
\[ R(\theta,\delta) = \mathbb{E}_F[ L(\theta,\delta(X))], \]
which measures the expected loss averaging over the distribution \(F\). Decision-theoretic concepts of optimality are defined with respect to the risk. For example, a decision rule \(\delta^*\) is minimax if it minimizes the maximum risk over a class of decision rules \(\mathcal{D}\): for all \(\delta\in\mathcal{D}\),
\[ \max_\theta R(\theta,\delta^*) \leq \max_\theta R(\theta,\delta). \]
In general there could be multiple minimax decision rules.
Another desirable property of a decision rule \(\delta^*\) is admissibility, which says that no decision rule \(\delta\) dominates \(\delta^*\). A decision rule \(\delta\) dominates \(\delta^*\) if
\[ R(\theta,\delta) \leq R(\theta,\delta^*) \]
for all \(\theta\), with strict inequality for some \(\theta\). Admissibility is desirable but not sufficient for a good estimator. For example, a constant estimator is usually admissible (it attains the minimal possible risk when the parameter equals that constant). An admissible rule need not be minimax, nor a minimax rule admissible. Thus a decision rule that is both admissible and minimax should be held in high regard.
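To make the risk concrete, here is a minimal Monte Carlo sketch (the function name and parameters are my own, for illustration) estimating the squared-error risk of the sample mean for \(Normal(\theta,1)\) data; the exact value is \(1/N\), constant in \(\theta\):

```python
import random

def risk_sample_mean(theta, N, reps=100_000, seed=0):
    """Monte Carlo estimate of R(theta, mean) = E[(Xbar - theta)^2]
    for X_1,...,X_N i.i.d. Normal(theta, 1). The exact risk is 1/N."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xbar = sum(rng.gauss(theta, 1.0) for _ in range(N)) / N
        total += (xbar - theta) ** 2
    return total / reps

# The risk does not depend on theta: both values are roughly 1/N = 0.1.
print(risk_sample_mean(0.0, 10))
print(risk_sample_mean(5.0, 10))
```

The constant-in-\(\theta\) risk is exactly why the sample mean is minimax here: its maximum risk over \(\theta\) equals its risk everywhere.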
Pitman Closeness
An interesting alternative to comparing estimators by risk was proposed by Pitman (1937). A decision rule \(\delta_1\) Pitman dominates \(\delta_2\), denoted \(\delta_1 \overset{P}{\succ}\delta_2\), if for all \(\theta\),
\[ P(L(\theta,\delta_1) \leq L(\theta,\delta_2)) > \frac{1}{2}. \]
Pitman domination says simply that it is more probable than not that one decision rule incurs smaller loss than the other. This criterion appears to have some advantages over risk. First, it considers the entire distribution of the loss \(L(\theta,\delta)\), while the risk is only its expectation. Second, and crucially, it involves the joint distribution of the pair \(\{L(\theta,\delta_1),L(\theta,\delta_2)\}\). At first glance, this looks like a good way to compare decision rules.
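The definition is easy to probe by simulation. Here is a generic sketch (the function and its arguments are my own naming, not from any reference) that estimates the Pitman comparison probability under squared-error loss, where comparing losses reduces to comparing distances to \(\theta\):

```python
import random

def pitman_prob(delta1, delta2, theta, sample, reps=100_000, seed=0):
    """Estimate P(|delta1(x) - theta| <= |delta2(x) - theta|) by Monte Carlo:
    how often delta1 is at least as close to theta as delta2."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(reps):
        x = sample(rng)
        if abs(delta1(x) - theta) <= abs(delta2(x) - theta):
            wins += 1
    return wins / reps

# Toy check at theta = 0: halving a single Normal(0, 1) draw is always
# at least as close to 0 as the draw itself, so the estimate is 1.0.
p = pitman_prob(lambda x: x / 2, lambda x: x,
                theta=0.0, sample=lambda rng: rng.gauss(0.0, 1.0))
print(p)  # 1.0 (ties at x = 0 have probability zero)
```

Note the halving rule only beats \(x\) at this particular \(\theta\); Pitman domination requires the probability to exceed \(1/2\) at every \(\theta\), and halving fails badly when \(\theta\) is far from 0.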
A decision rule which Pitman dominates the normal sample mean
Consider an i.i.d. sample \(X_1,\ldots,X_N\) from a univariate normal distribution, \(Normal(\theta,1)\). It is well known that the sample mean \(\bar{X}\) is unbiased, UMVUE, minimax and admissible (in terms of squared-error loss). The univariate normal means problem is about as cut-and-dried as problems in statistics get, and any good decision-theoretic framework should tell you to use the sample mean. Weirdly enough, there is an estimator which Pitman dominates the sample mean! This example comes from Efron, 1975.
Define \(X^*\) by
\[ X^* = \bar{X} - \Delta(\bar{X}), \]
where \(\Delta\) is an odd function, which for \(x\geq 0\) takes the value
\[ \Delta(x) = \frac{1}{2\sqrt{N}} \min \left\{\sqrt{N}x, \Phi(-\sqrt{N}x)\right\}. \]
Then \(X^* \overset{P}{\succ} \bar{X}\).
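A Monte Carlo sketch of this construction is below (the function names are my own; \(\bar{X}\sim Normal(\theta, 1/N)\) is simulated directly, and losses are compared via distance to \(\theta\)). Efron's result says the probability exceeds \(1/2\) at every \(\theta\); for \(\theta\) far from 0 the margin above \(1/2\) becomes tiny, so the simulation is most convincing at moderate \(\theta\):

```python
import math
import random

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def shrinkage(x, N):
    """Efron's Delta: odd in x; for x >= 0 it equals
    min(sqrt(N)*x, Phi(-sqrt(N)*x)) / (2*sqrt(N))."""
    s = math.sqrt(N)
    mag = min(s * abs(x), Phi(-s * abs(x))) / (2.0 * s)
    return math.copysign(mag, x)

def pitman_prob_vs_mean(theta, N, reps=200_000, seed=0):
    """Estimate P(|X* - theta| <= |Xbar - theta|), X* = Xbar - Delta(Xbar)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(reps):
        xbar = rng.gauss(theta, 1.0 / math.sqrt(N))
        xstar = xbar - shrinkage(xbar, N)
        if abs(xstar - theta) <= abs(xbar - theta):
            wins += 1
    return wins / reps

# All three estimates exceed 1/2, consistent with X* Pitman dominating Xbar.
for theta in (0.0, 0.5, 1.0):
    print(theta, pitman_prob_vs_mean(theta, N=4))
```

Intuitively, the shrinkage is tiny when \(|\bar{X}|\) is large, so the rule loses very little when \(\theta\) is far from 0, yet it picks up a slight edge whenever \(\bar{X}\) lands on the far side of 0 from \(\theta\).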
What’s going on? Observe that \(X^*\) is a function of \(\bar{X}\) which shrinks it towards zero. This sounds unintuitive, since the rule dominates the sample mean for any \(\theta\), even when \(\theta\) is very far from 0! But there is a similar phenomenon in risk-based decision theory, known as the Stein effect, in which you can construct an estimator which dominates a minimax estimator by shrinking it towards zero. In a monumental work, Charles Stein proved that this effect occurs when estimating a normal mean vector of dimension at least three under squared-error loss. The Stein estimator shook the field of statistics when it was discovered, and spawned a cottage industry of theoretical work on shrinkage estimators (it also laid the foundations for the modern use of regularization, which is less appreciated by most practitioners). But under risk, no estimator dominates the sample mean in the univariate case. The peculiar thing about Pitman domination is that you get a Stein-type effect even when estimating a single normal mean. Even in the simplest possible estimation problem, we end up with a paradoxical result.
Pitman Closeness is not transitive
The next strange property of Pitman closeness is that it is not transitive. In general, it does not induce an ordering over a set of decision rules, and as a consequence there may be no Pitman-dominant decision rule. This example comes from Robert, 2007.
Let \(x \sim Uniform(-0.9\theta, 1.1\theta)\) for \(\theta > 0\). Consider the decision rules \(\delta_0(x) = x\), \(\delta_1(x) = 0.9|x|\), and \(\delta_2(x) = 3.2|x|\). Then \(\delta_0 \overset{P}{\succ}\delta_1\), \(\delta_1 \overset{P}{\succ}\delta_2\), and \(\delta_2 \overset{P}{\succ}\delta_0\).
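The cycle is easy to verify by simulation. Here is a sketch (function and variable names are my own, with \(\theta = 1\) fixed and losses compared via distance to \(\theta\)) estimating all three pairwise probabilities:

```python
import random

def win_prob(d1, d2, theta=1.0, reps=500_000, seed=0):
    """Estimate P(|d1(x) - theta| <= |d2(x) - theta|)
    for x ~ Uniform(-0.9*theta, 1.1*theta)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(reps):
        x = rng.uniform(-0.9 * theta, 1.1 * theta)
        if abs(d1(x) - theta) <= abs(d2(x) - theta):
            wins += 1
    return wins / reps

d0 = lambda x: x
d1 = lambda x: 0.9 * abs(x)
d2 = lambda x: 3.2 * abs(x)

# Each probability exceeds 1/2, so the three rules form a cycle:
print(win_prob(d0, d1))  # d0 beats d1 (narrowly, around 0.53)
print(win_prob(d1, d2))  # d1 beats d2 (narrowly, around 0.51)
print(win_prob(d2, d0))  # d2 beats d0 (comfortably, around 0.69)
```

The first two margins are small, which is exactly what makes the intransitivity so easy to miss if you only compare one pair of rules.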
Conclusion
Pitman closeness is an interesting idea, and it was the focus of a fair amount of research at one time; there was even a book devoted to it. But today it is a little-known curiosity. Pitman closeness begins with eminently reasonable foundations, but it leads to too many conclusions that go against what we want from a reasonable decision theory. The risk-based approach to decision theory, despite its own paradoxes, remains the accepted benchmark in statistics.