The zero-truncated Poisson distribution is useful for modeling certain demographic data, such as the sizes of households and families [1]. This post illustrates this on historical data from the United Kingdom and demonstrates how to assess the goodness of fit in Python.
A zero-truncated Poisson random variable \(Y\) is defined by
\begin{align}
\mathbf{P}\{Y=x\} = \mathbf{P}\{ X = x | X > 0\}, \quad X\sim \mathrm{Poiss}(\lambda) .
\end{align}
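For \(x \ge 1\) the event \(\{X = x\}\) is contained in \(\{X > 0\}\), so the conditional probability is just the Poisson probability rescaled by \(\mathbf{P}\{X>0\} = 1 - e^{-\lambda}\). For the same reason, since \(X = 0\) contributes nothing to the mean, \(\mathbf{E}[Y] = \mathbf{E}[X]/\mathbf{P}\{X>0\}\):

\begin{align}
\mathbf{P}\{Y=x\} &= \frac{\mathbf{P}\{X=x\}}{\mathbf{P}\{X>0\}} = \frac{\lambda^x e^{-\lambda}}{x!\,(1-e^{-\lambda})}, \quad x = 1, 2, \ldots, \\
\mathbf{E}[Y] &= \frac{\mathbf{E}[X]}{\mathbf{P}\{X>0\}} = \frac{\lambda}{1-e^{-\lambda}} .
\end{align}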
Conditioning on \(X > 0\) removes the zero count, which makes the distribution a natural candidate for count data that start at 1. For example, consider historical UK data on household sizes from a demographic research study [2].
The source data can be read into Python and visualized as follows:
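```python
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

N = 98  # total used to turn the published percentages into counts
# Percentages of households of size 1, 2, ..., 12 and "13 or more", converted to counts
counts = np.array([5.7, 14.2, 16.5, 15.8, 14.7, 11.8, 8.0, 5.4, 3.1, 1.9, 1.1, 0.7, 1.1]) / 100 * N
# Left-closed bins [1, 2), [2, 3), ..., [12, 13) and an open-ended last bin [13, inf)
bins = pd.IntervalIndex.from_breaks(list(np.arange(1, 13 + 1)) + [np.inf], closed="left", name="household size")
counts = pd.Series(data=counts, name="count", index=bins)

# Bar chart of the counts per household-size bin
fig, ax = plt.subplots(figsize=(12, 4))
sns.barplot(counts.reset_index(), x="household size", y="count", ax=ax)
ax.tick_params(axis='x', labelrotation=45)
```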
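The value \(\lambda = 5\) used below is a hypothesis. As a rough cross-check (not necessarily how that value was chosen here), one can match means: a zero-truncated Poisson variable has mean \(\lambda / (1 - e^{-\lambda})\), and for ungrouped data setting this equal to the sample mean is exactly the maximum-likelihood equation. The sketch below assumes that the open-ended "13 or more" bin can be represented by the value 13, so the estimate is only approximate; `ztp_mean` and `lam_hat` are illustrative names.

```python
import numpy as np
from scipy.optimize import brentq

# Percentages of households of size 1, 2, ..., 12 and "13 or more" (same data as above)
percent = np.array([5.7, 14.2, 16.5, 15.8, 14.7, 11.8, 8.0, 5.4, 3.1, 1.9, 1.1, 0.7, 1.1])
# Assumption: the open-ended "13 or more" bin is represented by the value 13
sizes = np.arange(1, 14)
sample_mean = np.sum(sizes * percent) / percent.sum()


def ztp_mean(lam):
    """Mean of a zero-truncated Poisson variable: lam / (1 - exp(-lam))."""
    return lam / -np.expm1(-lam)  # -expm1(-lam) equals 1 - exp(-lam), computed accurately


# ztp_mean increases from 1 (as lam -> 0) without bound, so any sample mean above 1 has a unique root
lam_hat = brentq(lambda lam: ztp_mean(lam) - sample_mean, 1e-6, 50.0)
print(f"sample mean = {sample_mean:.3f}, estimated lambda = {lam_hat:.3f}")
```

For this table the estimate comes out slightly below 5 (roughly 4.7), close to the hypothesized value.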
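Now we hypothesize a zero-truncated Poisson distribution with \(\lambda = 5\) and test the quality of the fit with a chi-square goodness-of-fit test, adjusting its degrees of freedom for the parameter \(\lambda\) (`ddof=1` below). The chi-square technique boils down to binning the reference distribution into finitely many categories that match the bins of the empirical data (here household sizes 1 to 12 and 13 or more); it is explained in detail in the book accompanying the R package EnvStats [3]. We obtain a fairly high p-value (about 0.35), so the test gives no evidence against the hypothesized distribution, i.e. the fit is acceptable: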
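```python
from scipy import stats

# Poisson(5) probabilities for x = 0, ..., 99 (the mass above 99 is negligible)
pmf = pd.Series({i: stats.poisson(5).pmf(i) for i in range(0, 100)})
# Bin them like the data; x = 0 lies outside every bin and is dropped
pmf = pmf.groupby(pd.cut(pmf.index, counts.index), observed=False).sum()
# Renormalize, i.e. condition on X > 0
pmf = pmf / pmf.sum()
counts_expected = pmf * N
# ddof=1: the p-value uses k - 1 - 1 = 11 degrees of freedom (one removed for lambda)
stats.chisquare(counts, counts_expected, ddof=1)
# statistic=12.157245195139515, pvalue=0.35194047158664454
```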
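The adjusted test can also be written out by hand, which makes the binning explicit. The sketch below is an illustrative re-implementation under the same assumptions (\(\lambda = 5\), \(N = 98\), bins 1 to 12 and 13 or more); `ztp_pmf` is a hypothetical helper name. Instead of `pd.cut`, it uses the closed form \(\mathbf{P}\{Y=x\} = \mathbf{P}\{X=x\}/\mathbf{P}\{X>0\}\), puts the remaining mass \(\mathbf{P}\{X>12\}/\mathbf{P}\{X>0\}\) into the last bin, and computes Pearson's statistic \(\sum_i (O_i - E_i)^2 / E_i\) with \(k - 1 - \mathrm{ddof}\) degrees of freedom, which is how `scipy.stats.chisquare` evaluates its p-value.

```python
import numpy as np
from scipy import stats

N = 98
lam = 5.0
# Observed counts for household sizes 1..12 and "13 or more"
observed = np.array([5.7, 14.2, 16.5, 15.8, 14.7, 11.8, 8.0, 5.4, 3.1, 1.9, 1.1, 0.7, 1.1]) / 100 * N


def ztp_pmf(x, lam):
    """Zero-truncated Poisson pmf P{Y=x} = P{X=x} / P{X>0}, valid for x >= 1."""
    return stats.poisson.pmf(x, lam) / stats.poisson.sf(0, lam)  # sf(0) = P{X > 0}


# Bin probabilities: sizes 1..12, then the tail P{Y >= 13} = P{X > 12} / P{X > 0}
tail = stats.poisson.sf(12, lam) / stats.poisson.sf(0, lam)
probs = np.append(ztp_pmf(np.arange(1, 13), lam), tail)
expected = probs * N

# Pearson statistic; p-value from a chi-square law with k - 1 - ddof degrees of freedom
ddof = 1
statistic = np.sum((observed - expected) ** 2 / expected)
dof = len(observed) - 1 - ddof
pvalue = stats.chi2.sf(statistic, dof)
print(f"statistic={statistic:.4f}, pvalue={pvalue:.4f}, dof={dof}")
```

Up to floating-point rounding, this should reproduce the statistic and p-value reported above.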
References
1. Jennings V, Lloyd-Smith B, Ironmonger D. Household size and the Poisson distribution. Journal of Population Research. Published online May 1999:65-84. doi:10.1007/bf03029455
2. Laslett P. Size and structure of the household in England over three centuries. Population Studies. Published online July 1969:199-223. doi:10.1080/00324728.1969.10405278
3. Millard SP. EnvStats. Springer New York; 2013. doi:10.1007/978-1-4614-8456-1