Foundation Posteriors for Approximate Probabilistic Inference

Authors: Mike Wu, Noah Goodman

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show the efficacy of the approach, zero-shot and fine-tuned, on a benchmark of STAN programs. In experiments, we find the foundation posterior to be capable of both zero-shot inference and variational fine-tuning: given a program from the test set, we can achieve higher quality using the foundation posterior as an initial distribution.
Researcher Affiliation | Academia | Mike Wu, Noah Goodman; Department of Computer Science, Stanford University, Stanford, CA 94305; {wumike, ngoodman}@stanford.edu
Pseudocode | No | The paper describes the Masked Language Inference (MLI) procedure in text but does not include any explicitly labeled pseudocode or algorithm blocks (an illustrative masking sketch appears after the table).
Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the methodology or provide a link to a code repository.
Open Datasets | Yes | To demonstrate the foundation posterior, we meta-amortize inference over a set of standard Stan [12] programs from Posterior DB [43], a benchmark dataset for evaluating inference algorithms [4, 5, 69, 70, 20]. [43] Måns Magnusson, Paul Bürkner, and Aki Vehtari. posteriordb: a set of posteriors for Bayesian inference and probabilistic programming. 2021. (A loading sketch for the posteriordb benchmark appears after the table.)
Dataset Splits | Yes | We build a test set with 1,000 new executions of the program not used in training and randomly mask assignments. However, now we use five of them for meta-training, and hold out the Rosenbrock program for meta-test. We hold out three programs from Posterior DB for evaluation, and optimize the foundation posterior on the remaining set. (A split-construction sketch appears after the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used to run the experiments.
Software Dependencies | No | The paper mentions software such as Stan and CmdStanPy but does not provide specific version numbers for any software dependencies used in its implementation or experiments.
Experiment Setup | Yes | While Equation 2 only masked a single token per loss, in practice we randomly mask 15% of tokens in x_mlm, and mask an increasing amount of assignments in x_inf according to a schedule: we begin at 15% but increase this masking probability throughout training to 50%, thereby increasing the difficulty of inference. Plating with minibatches of size 5 is used for all programs to fit observations within the transformer's 512-token limit. After pretraining, we optimize Equation 4 for each test program individually, varying the number of steps of fine-tuning across 0 (zero-shot), 10, 100, and 1000. (See the masking-schedule sketch after the table.)
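
For the Open Datasets row, the snippet below sketches how the Posterior DB benchmark can be browsed with the posteriordb Python client. The directory path and posterior name are placeholders, and the method names (PosteriorDatabase, posterior_names, posterior, model.code, data.values) are taken from the client's public README; treat them as assumptions to verify against the installed version rather than as the authors' pipeline.

```python
# Minimal sketch, not the authors' code: browsing Posterior DB programs with the
# posteriordb Python client. The path and posterior name below are placeholders.
from posteriordb import PosteriorDatabase

pdb = PosteriorDatabase("path/to/posterior_database")  # local clone of the posteriordb repo

print(pdb.posterior_names()[:5])                        # list a few available posteriors

posterior = pdb.posterior("eight_schools-eight_schools_centered")
stan_code = posterior.model.code("stan")                # Stan program text
data = posterior.data.values()                          # observed data as a dict
```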
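To make the Dataset Splits row concrete, here is a small sketch of building the held-out evaluation set described above: 1,000 fresh executions of a program, none reused from training, each with some latent assignments randomly masked. The function and variable names (run_program, build_test_set, NUM_TEST_EXECUTIONS) are hypothetical, and run_program stands in for executing a Stan program's generative process.

```python
# Hypothetical sketch of the held-out test-set construction described in the
# Dataset Splits row; run_program is a placeholder for one execution of a Stan
# program's generative process, recording its variable assignments.
import random

NUM_TEST_EXECUTIONS = 1_000   # "1,000 new executions of the program not used in training"
MASK = "[MASK]"

def run_program(program, seed):
    """Placeholder: draw one execution (set of variable assignments) from the program."""
    rng = random.Random(seed)
    return {"mu": rng.gauss(0, 1), "sigma": abs(rng.gauss(0, 1)), "y[1]": rng.gauss(0, 1)}

def build_test_set(program, mask_prob=0.15, seed_offset=10**6):
    """Collect fresh executions (seeds disjoint from training) and mask assignments."""
    test_set = []
    for i in range(NUM_TEST_EXECUTIONS):
        trace = run_program(program, seed=seed_offset + i)
        masked = {k: (MASK if random.random() < mask_prob else v) for k, v in trace.items()}
        test_set.append((trace, masked))
    return test_set
```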
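Finally, the masking behavior referenced in the Pseudocode and Experiment Setup rows can be illustrated as follows: 15% of tokens are masked for the MLM loss, while the probability of masking latent assignments is annealed from 15% to 50% over training. This is a reconstruction of the described behavior, not the authors' implementation; the linear schedule shape and all function names are assumptions.

```python
# Hypothetical sketch (not the authors' code) of the two masking schemes described
# in the Experiment Setup row. Token and assignment representations are simplified.
import random

MASK = "[MASK]"

def mask_mlm_tokens(tokens, p=0.15, rng=random):
    """Mask a fixed 15% of tokens for the masked-language-modeling loss."""
    return [MASK if rng.random() < p else t for t in tokens]

def inference_mask_prob(step, total_steps, start=0.15, end=0.50):
    """Assignment-masking probability, annealed from 15% to 50% over training.
    The linear shape of the schedule is an assumption."""
    frac = min(step / max(total_steps, 1), 1.0)
    return start + frac * (end - start)

def mask_assignments(assignments, step, total_steps, rng=random):
    """Mask variable assignments in an execution trace with the scheduled probability."""
    p = inference_mask_prob(step, total_steps)
    return {name: (MASK if rng.random() < p else value)
            for name, value in assignments.items()}

if __name__ == "__main__":
    tokens = "data { int N ; } parameters { real mu ; }".split()
    assignments = {"mu": 0.3, "sigma": 1.2, "y[1]": -0.7}
    print(mask_mlm_tokens(tokens))
    print(mask_assignments(assignments, step=500, total_steps=1000))
```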