Hamiltonian Monte Carlo Inference of Marginalized Linear Mixed-Effects Models

Authors: Jinlin Lai, Justin Domke, Daniel R. Sheldon

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on a variety of real LMMs from past scientific investigations, including nine models and datasets from cognitive sciences, and find that marginalization is always beneficial. Our findings suggest that practitioners should marginalize group-level effects whenever applicable in Bayesian hierarchical inference.
Researcher Affiliation | Academia | Jinlin Lai, Justin Domke, Daniel Sheldon; Manning College of Information and Computer Sciences, University of Massachusetts Amherst; {jinlinlai,domke,sheldon}@cs.umass.edu
Pseudocode | Yes | Algorithm 1: Evaluating log p(y | Θ, u_{\i}). ... Algorithm 2: Sampling from p(u_i | Θ, y, u_{\i}). A minimal sketch of the marginalization idea appears after the table.
Open Source Code | Yes | The code is available at https://github.com/lll6924/hamiltonian_lme.git.
Open Datasets | Yes | We evaluate our approach on a variety of real LMMs from past scientific investigations, including nine models and datasets from cognitive sciences... The dataset [4] contains observations y of the number of ticks on the heads of red grouse chicks in the field.
Dataset Splits | No | The paper reports HMC sampling settings ('10,000 warm up samples for tuning, and 100,000 samples for evaluation'), but it does not define traditional training, validation, and test splits for the datasets.
Hardware Specification | Yes | Experiments are run on NVIDIA A40. ... GPU models run on an NVIDIA RTX 2080ti GPU. CPU models run on one Intel Xeon Gold 6148 processor.
Software Dependencies | No | The paper mentions 'NumPyro [5, 54]', 'default no-U-turn sampler (NUTS) [34]', and 'TensorFlow Probability [13]', but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | For the ETH instructor evaluation model, we set the maximum tree depth to 12 to overcome difficulties performing inference without marginalization in preliminary experiments. For all models, we use weakly informative priors unless specified. In general, our conclusion is insensitive to the choice of hyperparameters and priors. For all experiments, we collect 10,000 warm up samples for tuning, and 100,000 samples for evaluation.
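
The Pseudocode row refers to the paper's Algorithms 1 and 2, which evaluate the likelihood with one grouping factor's effects integrated out and then recover those effects afterwards. The following is a minimal NumPyro sketch of that idea, not the authors' implementation: a random-intercept LMM written once with explicit group-level effects and once with those effects marginalized analytically. All names here (centered_model, marginalized_model, group_sizes) are illustrative placeholders.

```python
import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist


def centered_model(x, y, group, n_groups):
    # Standard parameterization: group-level intercepts u are sampled explicitly.
    beta = numpyro.sample("beta", dist.Normal(0.0, 10.0))
    sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
    tau = numpyro.sample("tau", dist.HalfNormal(1.0))
    with numpyro.plate("groups", n_groups):
        u = numpyro.sample("u", dist.Normal(0.0, tau))
    numpyro.sample("y", dist.Normal(beta * x + u[group], sigma), obs=y)


def marginalized_model(x, y, group_sizes):
    # Marginalized parameterization: u is integrated out analytically.
    # Within each group the marginal likelihood is multivariate normal with
    # covariance sigma^2 * I + tau^2 * 1 1^T (observations sorted by group).
    beta = numpyro.sample("beta", dist.Normal(0.0, 10.0))
    sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
    tau = numpyro.sample("tau", dist.HalfNormal(1.0))
    start = 0
    for g, n in enumerate(group_sizes):
        mu_g = beta * x[start:start + n]
        cov_g = sigma ** 2 * jnp.eye(n) + tau ** 2 * jnp.ones((n, n))
        numpyro.sample(f"y_{g}", dist.MultivariateNormal(mu_g, cov_g),
                       obs=y[start:start + n])
        start += n
```

In the marginalized form, HMC only samples (beta, sigma, tau); because the model is conditionally Gaussian, the group-level effects can be recovered afterwards by drawing from their Gaussian conditional given each posterior sample, which is the role Algorithm 2 plays in the paper. The dense per-group covariance above is for clarity only; an efficient implementation would avoid forming it explicitly.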
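The sampler settings in the Experiment Setup row map directly onto NumPyro's NUTS/MCMC interface. A minimal sketch, assuming the marginalized_model function above and toy placeholder data:

```python
import jax.random as random
from numpyro.infer import MCMC, NUTS

# Toy data standing in for a real dataset: 3 groups of 20 observations each.
group_sizes = [20, 20, 20]
x = random.normal(random.PRNGKey(0), (sum(group_sizes),))
y = 2.0 * x + random.normal(random.PRNGKey(1), (sum(group_sizes),))

# max_tree_depth is raised from NumPyro's default of 10 to 12, as the paper
# reports doing for the ETH instructor evaluation model.
kernel = NUTS(marginalized_model, max_tree_depth=12)
mcmc = MCMC(kernel, num_warmup=10_000, num_samples=100_000)
mcmc.run(random.PRNGKey(2), x=x, y=y, group_sizes=group_sizes)
mcmc.print_summary()
```

Per the quoted setup, the larger tree depth applies only to the ETH instructor evaluation model; the other models use NUTS defaults, while the warm-up and evaluation sample counts are shared across all experiments.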