In this post, I overview the optimal non-parametric approach to quantile confidence intervals. We will discuss the theoretical background as well as an efficient algorithm implemented in Python.
Consider an ordered iid sample \(x^{(1)} < x^{(2)} < \ldots < x^{(n)}\) from a distribution \(X\), and let \(q\) be the \(p\)-th quantile, that is \(\mathbf{P}\{X\leqslant q\}=p\). Using our samples we want to build a possibly short confidence interval with satisfying coverage, that is to solve the optimization problem
\begin{align}
\begin{aligned}
\mathrm{minimize} && r-\ell \\
&& \mathbf{P}\{ x^{(\ell)} \leqslant q \leqslant x^{(r)} \} \geqslant 1-\alpha
\end{aligned}
\end{align}
We start with establishing explicit formulas for the coverage probability. First,
$$
\mathbf{P}\{ x^{(k)} \leqslant q \} = \mathbf{P}\{ \mathsf{Binom}(n,p) \geqslant k\},
$$
because \(x^{(k)} \leqslant q\) means that at least \(k\) samples are smaller or equal to \(q\). Accordingly,
$$
\mathbf{P}\{ x^{(k)} < q \} = \mathbf{P}\{ \mathsf{Binom}(n,p^{-}) \geqslant k\},
$$
with \(p^{-} = \mathbf{P}\{X < q\} \). Therefore, we obtain
$$
\mathbf{P}\{ x^{(\ell)} \leqslant q \leqslant x^{(r)} \} = \mathbf{P}\{ \mathsf{Binom}(n,p) \geqslant \ell\} – \mathbf{P}\{ \mathsf{Binom}(n,p^{-}) \geqslant r\},
$$
which reduces to \(\mathbf{P}\{ x^{(\ell)} \leqslant q \leqslant x^{(r)} \} = \mathbf{P}\{ \ell\leqslant \mathsf{Binom}(n,p) \leqslant r-1\} \) if \(q\) is a discontinuity point (this has been known for strictly increasing continuous CDFs 1). Since
$$
1-\mathbf{P}\{ x^{(\ell)} \leqslant q \leqslant x^{(r)} \} = \mathbf{P}\{ \mathsf{Binom}(n,p) \leqslant \ell-1\} + \mathbf{P}\{ \mathsf{Binom}(n,p^{-}) \geqslant r\},
$$
the smallest interval with coverage at least \(1-\alpha\) is determined by the optimization task
\begin{align}
\begin{aligned}
\min_{\ell\leqslant r}&& r-\ell \\
&& \mathbf{P}\{ \mathsf{Binom}(n,p) \leqslant \ell-1\} + \mathbf{P}\{ \mathsf{Binom}(n,p^{-}) \geqslant r\} \geqslant \alpha
\end{aligned} && (*)
\end{align}
which can be solved under the necessary condition that
\begin{align}
(1-p)^{n} + (p^{-})^n \leqslant \alpha.
\end{align}
Introducing \(r :=n -r + 1\) we obtain the more convenient/symmetric formulation
\begin{align}
\begin{aligned}
\max_{\ell+r\leqslant n+1}&& \ell + r \\
&& \mathbf{P}\{ \mathsf{Binom}(n,p) \leqslant \ell-1\} + \mathbf{P}\{ \mathsf{Binom}(n,1-p^{-}) \leqslant r-1\} \leqslant \alpha
\end{aligned}
\end{align}
and use this formulation to develop an algorithm in Python
from scipy import stats
def quantile_ci(x,p,alpha,p_minus=None,):
# solve related binomial covering problem
n = len(x)
if not p_minus:
p_minus = p
ks = np.arange(n+1)
probs_l = stats.binom(n,p).pmf(ks)
probs_r = stats.binom(n,1-p_minus).pmf(ks)
i_l = 0
i_r = 0
total = probs_l[i_l] + probs_r[i_r]
while (i_l + i_r < n) and (total <= alpha):
p_l = probs_l[i_l+1]
p_r = probs_r[i_r+1]
if p_l <= p_r:
total += p_l
if total <= alpha:
i_l += 1
else:
total += p_r
if total < alpha:
i_r += 1
# convert to original indices
i_l = i_l+1
i_r = (n+1)-(i_r+1)
return (i_l,i_r)As an example, consider times to failure in hours for a set of pentode radio valves 2:
import numpy as np
x = [46.9,
47.2,
49.1,
56.5,
56.8,
59.2,
59.9,
63.2,
63.3,
63.4,
63.7,
64.1,
67.1,
67.7,
73.3,
78.5]
p = 0.75
alpha = 0.1
i_l, i_r = quantile_ci(x,p,alpha)
np.sort(x)[[i_l-1,i_r-1]] # [63.4, 78.5]- 1.Zieliński R, Zieliński W. Best exact nonparametric confidence intervals for quantiles. Statistics. Published online February 2005:67-71. doi:10.1080/02331880412331329854
- 2.Conover WJ. Practical Nonparametric Statistics. John Wiley & Sons; 1999.
