Correct Confidence Intervals for Quantiles

In this post, I overview the optimal non-parametric approach to quantile confidence intervals. We will discuss the theoretical background as well as an efficient algorithm implemented in Python.

Consider an ordered iid sample \(x^{(1)} < x^{(2)} < \ldots < x^{(n)}\) from a distribution \(X\), and let \(q\) be the \(p\)-th quantile, that is \(\mathbf{P}\{X\leqslant q\}=p\). Using our samples we want to build a possibly short confidence interval with satisfying coverage, that is to solve the optimization problem
\begin{align}
\begin{aligned}
\mathrm{minimize} && r-\ell \\
&& \mathbf{P}\{ x^{(\ell)} \leqslant q \leqslant x^{(r)} \} \geqslant 1-\alpha
\end{aligned}
\end{align}

We start with establishing explicit formulas for the coverage probability. First,

$$
\mathbf{P}\{ x^{(k)} \leqslant q \} = \mathbf{P}\{ \mathsf{Binom}(n,p) \geqslant k\},
$$

because \(x^{(k)} \leqslant q\) means that at least \(k\) samples are smaller or equal to \(q\). Accordingly,
$$
\mathbf{P}\{ x^{(k)} < q \} = \mathbf{P}\{ \mathsf{Binom}(n,p^{-}) \geqslant k\},
$$
with \(p^{-} = \mathbf{P}\{X < q\} \). Therefore, we obtain
$$
\mathbf{P}\{ x^{(\ell)} \leqslant q \leqslant x^{(r)} \} = \mathbf{P}\{ \mathsf{Binom}(n,p) \geqslant \ell\} – \mathbf{P}\{ \mathsf{Binom}(n,p^{-}) \geqslant r\},
$$
which reduces to \(\mathbf{P}\{ x^{(\ell)} \leqslant q \leqslant x^{(r)} \} = \mathbf{P}\{ \ell\leqslant \mathsf{Binom}(n,p) \leqslant r-1\} \) if \(q\) is a discontinuity point (this has been known for strictly increasing continuous CDFs ​1​). Since
$$
1-\mathbf{P}\{ x^{(\ell)} \leqslant q \leqslant x^{(r)} \} = \mathbf{P}\{ \mathsf{Binom}(n,p) \leqslant \ell-1\} + \mathbf{P}\{ \mathsf{Binom}(n,p^{-}) \geqslant r\},
$$
the smallest interval with coverage at least \(1-\alpha\) is determined by the optimization task
\begin{align}
\begin{aligned}
\min_{\ell\leqslant r}&& r-\ell \\
&& \mathbf{P}\{ \mathsf{Binom}(n,p) \leqslant \ell-1\} + \mathbf{P}\{ \mathsf{Binom}(n,p^{-}) \geqslant r\} \geqslant \alpha
\end{aligned} && (*)
\end{align}
which can be solved under the necessary condition that
\begin{align}
(1-p)^{n} + (p^{-})^n \leqslant \alpha.
\end{align}

Introducing \(r :=n -r + 1\) we obtain the more convenient/symmetric formulation
\begin{align}
\begin{aligned}
\max_{\ell+r\leqslant n+1}&& \ell + r \\
&& \mathbf{P}\{ \mathsf{Binom}(n,p) \leqslant \ell-1\} + \mathbf{P}\{ \mathsf{Binom}(n,1-p^{-}) \leqslant r-1\} \leqslant \alpha
\end{aligned}
\end{align}
and use this formulation to develop an algorithm in Python

from scipy import stats

def quantile_ci(x,p,alpha,p_minus=None,):
  # solve related binomial covering problem
  n = len(x)
  if not p_minus:
    p_minus = p

  ks = np.arange(n+1)
  probs_l = stats.binom(n,p).pmf(ks)
  probs_r = stats.binom(n,1-p_minus).pmf(ks)

  i_l = 0
  i_r = 0

  total = probs_l[i_l] + probs_r[i_r]

  while (i_l + i_r < n) and (total <= alpha):
    p_l = probs_l[i_l+1]
    p_r = probs_r[i_r+1]
    if p_l <= p_r:
      total += p_l
      if total <= alpha:
        i_l += 1
    else:
      total += p_r
      if total < alpha:
        i_r += 1

  # convert to original indices
  i_l = i_l+1
  i_r = (n+1)-(i_r+1)

  return (i_l,i_r)

As an example, consider times to failure in hours for a set of pentode radio valves ​2​:

import numpy as np

x = [46.9,
    47.2,
    49.1,
    56.5,
    56.8,
    59.2,
    59.9,
    63.2,
    63.3,
    63.4,
    63.7,
    64.1,
    67.1,
    67.7,
    73.3,
    78.5]

p = 0.75
alpha = 0.1
i_l, i_r = quantile_ci(x,p,alpha)
np.sort(x)[[i_l-1,i_r-1]] # [63.4, 78.5]


  1. 1.
    Zieliński R, Zieliński W. Best exact nonparametric confidence intervals for quantiles. Statistics. Published online February 2005:67-71. doi:10.1080/02331880412331329854
  2. 2.
    Conover WJ. Practical Nonparametric Statistics. John Wiley & Sons; 1999.

Published by mskorski

Scientist, Consultant, Learning Enthusiast

Leave a comment

Your email address will not be published. Required fields are marked *