Robust Probabilistic Modeling with Bayesian Data Reweighting
Authors: Yixin Wang, Alp Kucukelbir, David M. Blei
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 3 presents these intuitions in full detail, along with theoretical corroboration. In Section 3, we study four models under various forms of mismatch with reality, including missing modeling assumptions, misspecified nonlinearities, and skewed data. Section 4 presents a recommendation system example, where we improve on predictive performance and identify atypical film enthusiasts in the Movielens 1M dataset. |
| Researcher Affiliation | Academia | Yixin Wang¹, Alp Kucukelbir¹, David M. Blei¹. ¹Columbia University, New York City, USA. Correspondence to: Yixin Wang <yixin.wang@columbia.edu>. |
| Pseudocode | No | The paper describes the steps of its proposed reweighted probabilistic model (RPM) and discusses inference algorithms such as NUTS and ADVI, but it does not contain any structured pseudocode or algorithm blocks. (A hedged code sketch of the reweighted construction follows this table.) |
| Open Source Code | No | The paper does not contain any statement about releasing source code or provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | We use the MovieLens 1M dataset, which contains one million movie ratings from 6,000 users on 4,000 movies. |
| Dataset Splits | No | The paper refers to 'held-out data', 'held-out log likelihood', a 'training dataset', and 'test data', but it does not give split percentages or sample counts for training, validation, and test sets, so the data partitioning cannot be reproduced exactly. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions using 'automated inference in Stan' (Carpenter et al., 2015) and leveraging 'variational inference (Kucukelbir et al., 2017)'. However, it does not provide specific version numbers for Stan or any other software libraries or dependencies used, which is necessary for reproducibility. |
| Experiment Setup | Yes | "We let the shape parameter a scale with the data size N such that N/a = 10^3; this encodes a mild attitude towards unit weights." (Section 3.1) and "We model each measurement using a Poisson likelihood ℓ(y_n | β) = Poisson(β) and posit a Gamma prior on the rate p_β(β) = Gam(a = 2, b = 0.5)." (Section 3.1) and "Posit a prior on the slope as p_β(β) = N(0, 10) and assume a Beta(0.1, 0.01) prior on the weights." (Section 3.2) and "We simulate three clusters from two-dimensional skew-normal distributions and fit a GMM with maximum K = 30." (Section 3.5) and "We place iid Gamma(1, 0.001) priors on the preferences and attributes. Here, we have the option of reweighting users or items. We focus on users and place a Beta(100, 1) prior on their weights." (Section 4) |
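
As referenced in the Pseudocode row, the RPM construction tempers each likelihood term by a latent weight with a Beta prior. The following is a minimal sketch, not the authors' code, of the unnormalized log joint for the Poisson example quoted under Experiment Setup; the default Beta hyperparameters and their pairing with this particular example are assumptions lifted from the quoted snippets.

```python
import numpy as np
from scipy import stats

def rpm_log_joint(beta, w, y, a_beta=2.0, b_beta=0.5, a_w=0.1, b_w=0.01):
    """Unnormalized log joint of a reweighted Poisson model:
    log p(beta) + sum_n [ w_n * log Poisson(y_n | beta) + log p(w_n) ].
    Hyperparameter defaults are taken from the quoted Section 3 setups;
    pairing them with this example is an assumption.
    """
    y, w = np.asarray(y), np.asarray(w)
    log_p_beta = stats.gamma.logpdf(beta, a=a_beta, scale=1.0 / b_beta)  # Gam(a=2, rate b=0.5)
    weighted_ll = np.sum(w * stats.poisson.logpmf(y, mu=beta))           # each term raised to w_n
    log_p_w = np.sum(stats.beta.logpdf(w, a_w, b_w))                     # Beta prior on every weight
    return log_p_beta + weighted_ll + log_p_w
```

Evaluating this at w_n = 1 recovers the standard Poisson-Gamma joint (up to the weight-prior term), which matches the paper's framing that unit weights fall back to the original model.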
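
Since no source code is released (see Open Source Code) and the Software Dependencies row only names Stan, a reader wanting to reproduce the Section 3.1 setup with NUTS outside Stan could start from a sketch like the one below. This NumPyro port is an assumption, not the authors' implementation; the model name and the toy data in the usage comments are hypothetical.

```python
import jax.random as random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def reweighted_poisson(y, a_w=0.1, b_w=0.01):
    beta = numpyro.sample("beta", dist.Gamma(2.0, 0.5))      # Gam(a=2, b=0.5), rate parameterization
    with numpyro.plate("data", y.shape[0]):
        w = numpyro.sample("w", dist.Beta(a_w, b_w))         # one latent weight per observation
    # Raise each likelihood term to the power w_n by adding w_n * log-likelihood to the joint.
    numpyro.factor("weighted_ll", (w * dist.Poisson(beta).log_prob(y)).sum())

# Usage with hypothetical data:
# import jax.numpy as jnp
# y = jnp.array([3, 2, 4, 25])                               # last point plays the role of an outlier
# mcmc = MCMC(NUTS(reweighted_poisson), num_warmup=500, num_samples=1000)
# mcmc.run(random.PRNGKey(0), y=y)
# mcmc.print_summary()
```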