The zero-truncated Poisson distribution is useful for modeling certain demographic data, such as the sizes of households and families ^{1}. This post illustrates this with historical data from the United Kingdom and demonstrates how to assess the goodness of fit in Python.

A random variable \(Y\) follows the zero-truncated Poisson distribution with parameter \(\lambda\) if

\begin{align}
\mathbf{P}\{Y=x\} = \mathbf{P}\{ X = x \mid X > 0\}, \quad X\sim \mathrm{Poiss}(\lambda) .
\end{align}
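
Since \(\mathbf{P}\{X > 0\} = 1 - e^{-\lambda}\), the definition gives \(\mathbf{P}\{Y=x\} = \frac{\lambda^x e^{-\lambda}}{x!\,(1 - e^{-\lambda})}\) for \(x = 1, 2, \ldots\). The following minimal sketch evaluates this with SciPy; the helper name `zero_truncated_poisson_pmf` is introduced here for illustration only and is not a SciPy function.

```python
import numpy as np
from scipy import stats

def zero_truncated_poisson_pmf(x, lam):
    """Hypothetical helper: P{Y = x} for x = 1, 2, ...; zero elsewhere."""
    x = np.asarray(x)
    base = stats.poisson(lam)
    # P{X = x | X > 0} = P{X = x} / P{X > 0}, where sf(0) = P{X > 0}
    return np.where(x >= 1, base.pmf(x) / base.sf(0), 0.0)

lam = 5
support = np.arange(0, 100)  # the mass beyond 99 is negligible for lam = 5
pmf = zero_truncated_poisson_pmf(support, lam)
total = pmf.sum()             # should be numerically 1
mean = (support * pmf).sum()  # should match lam / (1 - exp(-lam))
print(total, mean)
```

The truncation shifts the mean slightly upward, from \(\lambda\) to \(\lambda / (1 - e^{-\lambda})\), because the zero outcome no longer contributes.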

Conditioning on \(X > 0\) removes the zero count, which makes the distribution a natural candidate for count data that start from 1. For example, consider the following historical UK data from a demographic research study ^{2}.

The source data can be read into Python and visualized as follows:

```
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

N = 98  # total count used to scale the percentages below
# percentages per household size, converted to counts
counts = np.array([5.7,14.2,16.5,15.8,14.7,11.8,8.0,5.4,3.1,1.9,1.1,0.7,1.1]) / 100 * N
# left-closed bins [1,2), [2,3), ..., [12,13), [13,inf): sizes 1..12 and "13 or more"
bins = pd.IntervalIndex.from_breaks(list(np.arange(1,13+1)) + [np.inf],closed="left",name="household size")
counts = pd.Series(data=counts,name="count",index=bins)

# bar plot of the counts per household size
fig,ax = plt.subplots(figsize=(12,4))
sns.barplot(counts.reset_index(),x="household size",y="count",ax=ax)
ax.tick_params(axis='x', labelrotation=45)
```

Now we hypothesize a zero-truncated Poisson distribution with \(\lambda = 5\) and test the quality of the fit with an adapted chi-square test. The technique boils down to binning the reference distribution into finitely many categories, consistent with the bins of the empirical data, and is well explained in the documentation of R packages such as EnvStats ^{3}. We obtain a high p-value, so the test does not reject the hypothesized distribution, which is consistent with a good fit:

```
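from scipy import stats

# Poisson(5) probabilities on 0..99; the mass beyond 99 is negligible
pmf = pd.Series({i:stats.poisson(5).pmf(i) for i in range(0,100)})
# aggregate into the bins of the empirical data; 0 lies outside every bin and is dropped
pmf = pmf.groupby(pd.cut(pmf.index, counts.index)).sum()
# renormalizing conditions on X > 0, which gives the zero-truncated probabilities
pmf = pmf / pmf.sum()
counts_expected = pmf * N
# ddof=1 reduces the degrees of freedom to k - 1 - 1 for k bins
stats.chisquare(counts,counts_expected, ddof=1)
# statistic=12.157245195139515, pvalue=0.35194047158664454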
```
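
To make the test less of a black box, the statistic and p-value can be reproduced by hand. The sketch below is an illustration under the same assumptions as above (\(\lambda = 5\), `N = 98`, 13 bins), not a replacement for `stats.chisquare`; the names `observed`, `expected`, `probs` and `dof` are introduced here for clarity. It computes \(\sum_i (O_i - E_i)^2 / E_i\) and evaluates the chi-square survival function with \(13 - 1 - 1 = 11\) degrees of freedom.

```python
import numpy as np
from scipy import stats

N = 98
lam = 5
# observed counts for household sizes 1, ..., 12 and "13 or more" (same data as above)
percentages = np.array([5.7, 14.2, 16.5, 15.8, 14.7, 11.8, 8.0, 5.4, 3.1, 1.9, 1.1, 0.7, 1.1])
observed = percentages / 100 * N

base = stats.poisson(lam)
sizes = np.arange(1, 13)
# zero-truncated probabilities: sizes 1..12, then the tail P{X >= 13} = sf(12),
# all divided by P{X > 0} = sf(0)
probs = np.append(base.pmf(sizes), base.sf(12)) / base.sf(0)
expected = probs * N

# Pearson chi-square statistic
statistic = np.sum((observed - expected) ** 2 / expected)
# degrees of freedom: k - 1 - ddof, with k = 13 bins and ddof = 1
dof = len(observed) - 1 - 1
pvalue = stats.chi2.sf(statistic, dof)
print(statistic, pvalue)
```

Up to floating-point rounding, these values should agree with the output of `stats.chisquare` above. The largest contributions to the statistic come from the smallest household sizes and the open-ended last bin, where the expected counts are small.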

### References

- 1. Jennings V, Lloyd-Smith B, Ironmonger D. Household size and the Poisson distribution. *Journal of Population Research*. Published online May 1999:65-84. doi:10.1007/bf03029455
- 2. Laslett P. Size and structure of the household in England over three centuries. *Population Studies*. Published online July 1969:199-223. doi:10.1080/00324728.1969.10405278
- 3. Millard SP. *EnvStats*. Springer New York; 2013. doi:10.1007/978-1-4614-8456-1