Evaluating Bayesian Models with Posterior Dispersion Indices

Authors: Alp Kucukelbir, Yixin Wang, David M. Blei

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show how a PDI identifies patterns of model mismatch in three real data examples: voting preferences, supermarket shopping, and population genetics." and "Section 3 presents an empirical study of model mismatch in three real-world examples: voting preferences, supermarket shopping, and population genetics."
Researcher Affiliation | Academia | "Columbia University, New York City, USA."
Pseudocode | Yes | "Algorithm 1: Calculating WAPDI." (a hedged sketch of this computation appears after the table)
Open Source Code | Yes | "Figure 3. PDI and log predictive accuracy of each president under a mixture of three negative binomials model. Presidents sorted by PDI. The closer to zero, the better. (Code in supplement.)" and "Figure 4. Not all predictive probabilities are created equal. The translucent curves are the two likelihoods multiplied by the posterior (cropped). The posterior predictives p(x1 | x) and p(x2 | x) for each datapoint is the area under the curve. While both datapoints have the same predictive accuracy, the likelihood for x2 has higher variance under the posterior; x2 is more sensitive to the spread of the posterior than x1. WAPDI captures this effect. (Code in supplement.)"
Open Datasets | Yes | "In 1988, CBS conducted a U.S. nation-wide survey of voting preferences." for voting data, "Market research firm IRi hosts an anonymized dataset of customer shopping behavior at U.S. supermarkets (Bronnenberg et al., 2008)." for supermarket data, and "We study a dataset of N = 324 individuals from four geographic locations and focus on L = 13,928 locations on the genome." for population genetics.
Dataset Splits | No | The paper discusses cross-validation in the context of WAIC, stating "WAIC = 1/N Σ_n log μ(n) + σ²_log(n). WAIC measures generalization error; it asymptotically equates to leave-one-out cross validation." However, it does not provide specific train/validation/test dataset splits or a cross-validation setup for its own experiments in the empirical study section.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments are mentioned in the paper.
Software Dependencies | No | The paper mentions using 'Stan' and 'automatic differentiation variational inference (ADVI)' but does not provide specific version numbers for these or any other software libraries, compilers, or operating systems.
Experiment Setup | Yes | "We default to S = 1000 in our experiments." and "Set the prior to match the mean and variance of the data (Robbins, 1964). Choose an uninformative prior. Three mixtures make sense: two for the typical trends and one for the rest." and "A 20-dimensional HPF model discovers intuitive trends." and "Figure 5 shows how these individuals mix K = 3 ancestral populations."
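
The Pseudocode and Dataset Splits rows above quote Algorithm 1 (Calculating WAPDI) and the WAIC expression, both of which reduce to Monte Carlo summaries of per-datapoint log likelihoods under posterior samples: log μ(n), the log of the posterior mean of p(x_n | θ), and σ²_log(n), the posterior variance of log p(x_n | θ), with WAPDI(n) defined in the paper as their ratio σ²_log(n) / log μ(n). The sketch below is a minimal illustration of that computation under stated assumptions; it is not the authors' code (which, per the Figure 3 and 4 captions, ships in the paper's supplement). The NumPy dependency, the function name wapdi, and the (S, N) array log_lik of log-likelihood evaluations are choices made here purely for illustration.

# Minimal sketch, assuming an (S, N) NumPy array log_lik that holds
# log p(x_n | theta_s) for S approximate-posterior samples (e.g., from ADVI
# in Stan) and N datapoints. Names and the toy example are illustrative only.
import numpy as np


def wapdi(log_lik):
    """Per-datapoint WAPDI(n) = sigma^2_log(n) / log mu(n)."""
    S = log_lik.shape[0]
    # log mu(n): log of the Monte Carlo estimate of E_post[p(x_n | theta)],
    # computed stably via log-sum-exp over the S samples.
    log_mu = np.logaddexp.reduce(log_lik, axis=0) - np.log(S)
    # sigma^2_log(n): sample variance of log p(x_n | theta) over the draws.
    var_log = log_lik.var(axis=0, ddof=1)
    return var_log / log_mu


# Toy usage with the paper's default of S = 1000 samples and a few synthetic
# datapoints; the numbers are random noise and only demonstrate the shapes.
rng = np.random.default_rng(0)
fake_log_lik = rng.normal(loc=-2.0, scale=0.3, size=(1000, 5))
print(wapdi(fake_log_lik))  # closer to zero is better, per the Figure 3 caption

The log_mu and var_log terms are the same two summaries that appear in the WAIC expression quoted in the Dataset Splits row, and the variance term is what makes WAPDI sensitive to the effect described in the Figure 4 caption: two datapoints with equal predictive accuracy but different likelihood spread under the posterior receive different WAPDI values.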