For many problems in mathematics and theoretical computer science, it is of interest to estimate the concentration of sums of heterogeneous Boolean random variables. This note presents some new results obtained through majorization theory, with computer-aided proofs in SymPy.
Results
The Poisson Binomial distribution \(Y\sim \mathrm{PB}(p) \) is defined for \(p=(p_1,\ldots, p_n)\) as $$ Y = \sum_{i=1}^n X_i,\quad X_i \sim \mathrm{Bern}(p_i), $$ where the \(X_i\) are independent.
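Concretely, the distribution of \(Y\) is the convolution of the \(n\) Bernoulli factors, so its probability mass function can be computed by convolving them one at a time. Here is a minimal sketch (the helper name pb_pmf is ours, for illustration):

```python
import numpy as np

def pb_pmf(p):
    """Return the pmf of PB(p): pmf[k] = P{Y = k} for k = 0, ..., n."""
    pmf = np.array([1.0])                     # distribution of the empty sum
    for pi in p:
        pmf = np.convolve(pmf, [1 - pi, pi])  # add an independent Bern(pi)
    return pmf

print(pb_pmf([0.2, 0.5, 0.9]))  # pmf over {0, 1, 2, 3}
```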
We are interested in the concentration properties of \(Y\) and their dependence on \( p=(p_1,\ldots,p_n)\). To (partly) answer this question, we show that for any convex and non-decreasing function \(f\), the \(f\)-expectation of the PB distribution is Schur-concave with respect to the parameters: $$ \mathbf{E} f(\mathrm{PB}(p) ) \leq \mathbf{E} f(\mathrm{PB}(p') ),\quad p' \prec p, $$ where \(p' \prec p\) denotes vector majorization (\(p\) majorizes \(p'\)). In particular, since the constant vector \((\bar{p},\ldots,\bar{p})\) is majorized by \(p\), this means that $$ \mathbf{E} f(\mathrm{PB}(p) ) \leq \mathbf{E}f(\mathrm{Binom}(n,\bar{p})),\quad \bar{p} = \frac{1}{n}\sum_{i=1}^{n} p_i, $$ so the Binomial case is the “most dispersed” one. As for related work, see the recent survey (Tang and Tang, 2023), in particular the Schur-concavity of the tails. The result above is more general: by focusing on functional expectations, it yields various concentration bounds and applies to setups with limited dependence, such as \(k\)-wise independence. Indeed, by using \(f(x) = \max\{x,0\}^d\) and assuming that every \(d\) of the \(X_i\) are independent, we can improve previous concentration bounds. Numerically optimal results may be computed with recent formulas for binomial moments (Skorski, 2024).
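As a quick numerical sanity check of the last inequality (with illustrative values, reusing the pb_pmf helper sketched above):

```python
import numpy as np

p = np.array([0.1, 0.4, 0.7, 0.9])
f = lambda y: np.maximum(y, 0.0) ** 3        # convex and non-decreasing
ks = np.arange(len(p) + 1)                   # support {0, ..., n}

lhs = f(ks) @ pb_pmf(p)                      # E f(PB(p))
rhs = f(ks) @ pb_pmf([p.mean()] * len(p))    # E f(Binom(n, mean(p)))
print(lhs, rhs, lhs <= rhs)                  # expect lhs <= rhs
```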
Proof Sketch
To prove the claim, we check the Schur-Ostrowski criterion, namely that $$ \Delta_{i,j}\triangleq \left(p_i - p_j\right) \left(\frac{\partial \mathbf{E} f(\mathrm{PB}(p) ) }{ \partial p_i} - \frac{\partial \mathbf{E} f(\mathrm{PB}(p) ) }{ \partial p_j}\right) \leq 0. $$ Due to the independence of the summands, $$\mathbf{E}f(Y) = \sum_{y}\mathbf{P}\{Y-X_i-X_j= y\}\, \mathbf{E}_{X_i,X_j}[f(X_i + X_j + y)],$$ so it suffices to test the criterion for any pair of variables, while treating the sum of the remaining variables \(Y-X_i-X_j\) as fixed. Since the convexity and monotonicity of \(f\) are invariant under translation, we can work with \(\mathbf{E} f(X_i+ X_j)\) directly. Without loss of generality, we can test this for \(i=1, j=2\), as the setup is symmetric under relabeling the variables. Now, let’s implement this in Python!
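Below is a minimal SymPy sketch of this two-variable check, assuming independent Bernoulli factors; the symbol names are illustrative:

```python
import sympy as sp

p1, p2 = sp.symbols('p1 p2', nonnegative=True)
f = sp.Function('f')

# E f(X1 + X2) for independent X1 ~ Bern(p1), X2 ~ Bern(p2)
Ef = ((1 - p1) * (1 - p2) * f(0)
      + (p1 * (1 - p2) + (1 - p1) * p2) * f(1)
      + p1 * p2 * f(2))

# Schur-Ostrowski expression for the pair (i, j) = (1, 2)
Delta = sp.expand((p1 - p2) * (sp.diff(Ef, p1) - sp.diff(Ef, p2)))
print(Delta)
```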
We obtained $$ \Delta_{i,j}=\left(p_{1} p_{2} - p_{1} - p_{2}^{2}\right) f(2) + \left(- 2 p_{1} p_{2} + 3 p_{1} + 2 p_{2}^{2} - p_{2} - 1\right) f(1) + \left(p_{1} p_{2} - 2 p_{1} - p_{2}^{2} + p_{2} + 1\right) f(0). $$ To simplify this expression and make use of the remaining assumptions, let us introduce \(\delta_{convex} = \frac{f(2)+f(0)}{2}-f(1)\) and \(\delta_{monotone} = f(1)-f(0)\), both non-negative. Further calculations in SymPy then give $$ \Delta_{i,j} = \left(p_1 - p_2 - 1\right)\delta_{monotone} - 2\left(p_1(1-p_2) + p_2^2\right)\delta_{convex}, $$ where both coefficients are non-positive for \(p_1, p_2 \in [0,1]\), which proves \(\Delta_{i,j} \leq 0\).
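The substitution \(f(1) = f(0) + \delta_{monotone}\), \(f(2) = f(0) + 2\delta_{monotone} + 2\delta_{convex}\) behind this simplification can be checked in SymPy as follows (a minimal sketch; symbol names are illustrative):

```python
import sympy as sp

p1, p2, f0, d_mon, d_con = sp.symbols('p1 p2 f0 delta_mon delta_con')

# Express f(1) and f(2) through f(0) and the two non-negative deltas
f1 = f0 + d_mon
f2 = f0 + 2 * d_mon + 2 * d_con

# The Delta obtained above, with the substitution applied
Delta = ((p1 * p2 - p1 - p2**2) * f2
         + (-2 * p1 * p2 + 3 * p1 + 2 * p2**2 - p2 - 1) * f1
         + (p1 * p2 - 2 * p1 - p2**2 + p2 + 1) * f0)

# f0 cancels; both remaining coefficients are <= 0 on [0, 1]^2
print(sp.collect(sp.expand(Delta), [d_mon, d_con]))
# delta_mon*(p1 - p2 - 1) + delta_con*(2*p1*p2 - 2*p1 - 2*p2**2)
```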
Skorski, M. (2024) ‘Handy formulas for binomial moments’, Modern Stochastics: Theory and Applications, pp. 1–15. Available at: https://doi.org/10.15559/24-VMSTA260.
Tang, W. and Tang, F. (2023) ‘The Poisson Binomial Distribution – Old & New’, Statistical Science, 38(1). Available at: https://doi.org/10.1214/22-STS852.