In this post, I overview **the optimal non-parametric approach to quantile confidence intervals**. We will discuss the theoretical background as well as an efficient algorithm implemented in Python.

Consider an ordered iid sample \(x^{(1)} < x^{(2)} < \ldots < x^{(n)}\) from a distribution \(X\), and let \(q\) be the \(p\)-th quantile, that is \(\mathbf{P}\{X\leqslant q\}=p\). Using our samples we want to build a possibly short confidence interval with satisfying coverage, that is to solve the optimization problem

\begin{align}

\begin{aligned}

\mathrm{minimize} && r-\ell \\

&& \mathbf{P}\{ x^{(\ell)} \leqslant q \leqslant x^{(r)} \} \geqslant 1-\alpha

\end{aligned}

\end{align}

We start with establishing explicit formulas for the coverage probability. First,

$$

\mathbf{P}\{ x^{(k)} \leqslant q \} = \mathbf{P}\{ \mathsf{Binom}(n,p) \geqslant k\},

$$

because \(x^{(k)} \leqslant q\) means that at least \(k\) samples are smaller or equal to \(q\). Accordingly,

$$

\mathbf{P}\{ x^{(k)} < q \} = \mathbf{P}\{ \mathsf{Binom}(n,p^{-}) \geqslant k\},

$$

with \(p^{-} = \mathbf{P}\{X < q\} \). Therefore, we obtain

$$

\mathbf{P}\{ x^{(\ell)} \leqslant q \leqslant x^{(r)} \} = \mathbf{P}\{ \mathsf{Binom}(n,p) \geqslant \ell\} – \mathbf{P}\{ \mathsf{Binom}(n,p^{-}) \geqslant r\},

$$

which reduces to \(\mathbf{P}\{ x^{(\ell)} \leqslant q \leqslant x^{(r)} \} = \mathbf{P}\{ \ell\leqslant \mathsf{Binom}(n,p) \leqslant r-1\} \) if \(q\) is a discontinuity point (this has been known for strictly increasing continuous CDFs ^{1}). Since

$$

1-\mathbf{P}\{ x^{(\ell)} \leqslant q \leqslant x^{(r)} \} = \mathbf{P}\{ \mathsf{Binom}(n,p) \leqslant \ell-1\} + \mathbf{P}\{ \mathsf{Binom}(n,p^{-}) \geqslant r\},

$$

the smallest interval with coverage at least \(1-\alpha\) is determined by the optimization task

\begin{align}

\begin{aligned}

\min_{\ell\leqslant r}&& r-\ell \\

&& \mathbf{P}\{ \mathsf{Binom}(n,p) \leqslant \ell-1\} + \mathbf{P}\{ \mathsf{Binom}(n,p^{-}) \geqslant r\} \geqslant \alpha

\end{aligned} && (*)

\end{align}

which can be solved under the necessary condition that

\begin{align}

(1-p)^{n} + (p^{-})^n \leqslant \alpha.

\end{align}

Introducing \(r :=n -r + 1\) we obtain the more convenient/symmetric formulation

\begin{align}

\begin{aligned}

\max_{\ell+r\leqslant n+1}&& \ell + r \\

&& \mathbf{P}\{ \mathsf{Binom}(n,p) \leqslant \ell-1\} + \mathbf{P}\{ \mathsf{Binom}(n,1-p^{-}) \leqslant r-1\} \leqslant \alpha

\end{aligned}

\end{align}

and use this formulation to develop an algorithm in Python

```
from scipy import stats
def quantile_ci(x,p,alpha,p_minus=None,):
# solve related binomial covering problem
n = len(x)
if not p_minus:
p_minus = p
ks = np.arange(n+1)
probs_l = stats.binom(n,p).pmf(ks)
probs_r = stats.binom(n,1-p_minus).pmf(ks)
i_l = 0
i_r = 0
total = probs_l[i_l] + probs_r[i_r]
while (i_l + i_r < n) and (total <= alpha):
p_l = probs_l[i_l+1]
p_r = probs_r[i_r+1]
if p_l <= p_r:
total += p_l
if total <= alpha:
i_l += 1
else:
total += p_r
if total < alpha:
i_r += 1
# convert to original indices
i_l = i_l+1
i_r = (n+1)-(i_r+1)
return (i_l,i_r)
```

As an example, consider times to failure in hours for a set of pentode radio valves ^{2}:

```
import numpy as np
x = [46.9,
47.2,
49.1,
56.5,
56.8,
59.2,
59.9,
63.2,
63.3,
63.4,
63.7,
64.1,
67.1,
67.7,
73.3,
78.5]
p = 0.75
alpha = 0.1
i_l, i_r = quantile_ci(x,p,alpha)
np.sort(x)[[i_l-1,i_r-1]] # [63.4, 78.5]
```

- 1.Zieliński R, Zieliński W. Best exact nonparametric confidence intervals for quantiles.
*Statistics*. Published online February 2005:67-71. doi:10.1080/02331880412331329854 - 2.Conover WJ.
*Practical Nonparametric Statistics*. John Wiley & Sons; 1999.