Demographic Data with Zero-Truncated Poisson

The zero-truncated Poisson distribution is useful for modeling certain demographic data, such as the sizes of households and families [1]. This post illustrates this on historical data from the United Kingdom and demonstrates how to assess goodness of fit in Python.

A zero-truncated Poisson random variable \(Y\) is defined by
\begin{align}
\mathbf{P}\{Y=x\} = \mathbf{P}\{ X = x \mid X > 0\}, \quad x = 1, 2, \ldots, \quad X\sim \mathrm{Poiss}(\lambda) .
\end{align}
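To make the definition concrete, here is a minimal Python sketch of the resulting probability mass function, \(\mathbf{P}\{Y=x\} = \frac{\lambda^x e^{-\lambda}}{x!\,(1-e^{-\lambda})}\) for \(x \geq 1\). The helper name `zero_truncated_poisson_pmf` is an illustrative choice, not part of SciPy; it only rescales `scipy.stats.poisson` by the probability of a positive count.

```python
import numpy as np
from scipy import stats


def zero_truncated_poisson_pmf(x, lam):
    """Hypothetical helper: P{Y = x} = P{X = x} / P{X > 0} for x >= 1, else 0."""
    x = np.asarray(x)
    # sf(0) is P{X > 0} = 1 - exp(-lam), the normalizing constant
    pmf = stats.poisson.pmf(x, lam) / stats.poisson.sf(0, lam)
    return np.where(x >= 1, pmf, 0.0)


lam = 5.0
support = np.arange(0, 100)  # the mass beyond 100 is negligible for lam = 5
probs = zero_truncated_poisson_pmf(support, lam)

total = probs.sum()             # should be 1 up to rounding
mean = (support * probs).sum()  # should match lam / (1 - exp(-lam))
print(total, mean, lam / (1 - np.exp(-lam)))
```

The truncation shifts the mean slightly upward, from \(\lambda\) to \(\lambda / (1 - e^{-\lambda})\), because the mass at zero is redistributed over the positive counts.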
Conditioning on \(X > 0\) removes zero counts, which makes the distribution a natural candidate for count data that start at 1. For example, consider the following historical UK data from a demographic research study [2].

The source data can be read into Python and visualized as follows:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

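# total count used to convert the percentages below into counts
N = 98

# percentage of households of size 1, 2, ..., 12 and "13 or more"
counts = np.array([5.7, 14.2, 16.5, 15.8, 14.7, 11.8, 8.0, 5.4, 3.1, 1.9, 1.1, 0.7, 1.1]) / 100 * N
# left-closed bins [1, 2), [2, 3), ..., [12, 13), [13, inf)
bins = pd.IntervalIndex.from_breaks(list(np.arange(1, 13 + 1)) + [np.inf], closed="left", name="household size")
counts = pd.Series(data=counts, name="count", index=bins)

fig, ax = plt.subplots(figsize=(12, 4))
sns.barplot(data=counts.reset_index(), x="household size", y="count", ax=ax)
ax.tick_params(axis="x", labelrotation=45)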

Now we hypothesize a zero-truncated Poisson distribution with \(\lambda = 5\) and test the quality of fit with an adapted chi-square test. The adaptation boils down to binning the reference distribution into finitely many categories, consistent with the empirical data; the technique is well explained in the documentation of R packages [3]. We obtain a high p-value, so the test does not reject the hypothesized distribution, which is consistent with a good fit:

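from scipy import stats

# Poisson(5) probabilities on 0..99 (the mass beyond 99 is negligible)
pmf = pd.Series({i: stats.poisson(5).pmf(i) for i in range(0, 100)})
# bin like the data; 0 falls outside every bin and is dropped
pmf = pmf.groupby(pd.cut(pmf.index, counts.index), observed=False).sum()
# renormalizing over the remaining bins gives the zero-truncated probabilities
pmf = pmf / pmf.sum()

counts_expected = pmf * N

# ddof=1 reduces the degrees of freedom by one, for the parameter lambda
stats.chisquare(counts, counts_expected, ddof=1)
# statistic=12.157245195139515, pvalue=0.35194047158664454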
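To make explicit what the test above computes, the following sketch rebuilds the binned zero-truncated probabilities in closed form and evaluates the statistic by hand. It is an illustration rather than the original analysis: the variable names are illustrative, and the open-ended last bin uses the Poisson survival function instead of a truncated sum.

```python
import numpy as np
from scipy import stats

N = 98
lam = 5.0  # the rate hypothesized in the text

# observed counts for household sizes 1..12 and "13 or more"
observed = np.array([5.7, 14.2, 16.5, 15.8, 14.7, 11.8, 8.0, 5.4, 3.1, 1.9, 1.1, 0.7, 1.1]) / 100 * N

# zero-truncated Poisson probabilities for the same 13 bins
sizes = np.arange(1, 13)
p_positive = stats.poisson.sf(0, lam)   # P{X > 0}
tail = stats.poisson.sf(12, lam)        # P{X >= 13}, the open-ended last bin
probs = np.append(stats.poisson.pmf(sizes, lam), tail) / p_positive
expected = probs * observed.sum()

# Pearson chi-square statistic and its p-value
statistic = np.sum((observed - expected) ** 2 / expected)
dof = len(observed) - 1 - 1             # bins - 1 - ddof, with ddof = 1
p_value = stats.chi2.sf(statistic, dof)
print(statistic, p_value, dof)
```

The result should agree with `stats.chisquare(observed, expected, ddof=1)`. Printing `expected` also shows that the expected counts in the last few bins fall below 1, where the chi-square approximation is usually considered rough, so the p-value is best read as indicative.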

References

  1. Jennings V, Lloyd-Smith B, Ironmonger D. Household size and the Poisson distribution. Journal of Population Research. Published online May 1999:65-84. doi:10.1007/bf03029455
  2. Laslett P. Size and structure of the household in England over three centuries. Population Studies. Published online July 1969:199-223. doi:10.1080/00324728.1969.10405278
  3. Millard SP. EnvStats. Springer New York; 2013. doi:10.1007/978-1-4614-8456-1

Published by mskorski

Scientist, Consultant, Learning Enthusiast
