Foundation Posteriors for Approximate Probabilistic Inference

Authors: Mike Wu, Noah Goodman

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show the efficacy of the approach, zero-shot and fine-tuned, on a benchmark of STAN programs. In experiments, we find the foundation posterior to be capable of both zero-shot inference and variational fine-tuning: given a program from the test set, we can achieve higher quality using the foundation posterior as an initial distribution.
Researcher Affiliation | Academia | Mike Wu, Noah Goodman; Department of Computer Science, Stanford University, Stanford, CA 94305; {wumike, ngoodman}@stanford.edu
Pseudocode | No | The paper describes the Masked Language Inference (MLI) procedure in text but does not include any explicitly labeled pseudocode or algorithm blocks (an illustrative masking sketch appears after the table).
Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the methodology or provide a link to a code repository.
Open Datasets | Yes | To demonstrate the foundation posterior, we meta-amortize inference over a set of standard Stan [12] programs from Posterior DB [43], a benchmark dataset for evaluating inference algorithms [4, 5, 69, 70, 20]. [43] Måns Magnusson, Paul Bürkner, and Aki Vehtari. posteriordb: a set of posteriors for Bayesian inference and probabilistic programming. 2021. (A loading sketch for the posteriordb benchmark appears after the table.)
Dataset Splits | Yes | We build a test set with 1,000 new executions of the program not used in training and randomly mask assignments. However, now we use five of them for meta-training, and hold out the Rosenbrock program for meta-test. We hold out three programs from Posterior DB for evaluation, and optimize the foundation posterior on the remaining set. (A split-construction sketch appears after the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used to run the experiments.
Software Dependencies | No | The paper mentions software such as Stan and CmdStanPy but does not provide specific version numbers for any software dependencies used in its implementation or experiments.
Experiment Setup | Yes | While Equation 2 only masked a single token per loss, in practice we randomly mask 15% of tokens in x_mlm, and mask an increasing amount of assignments in x_inf according to a schedule: we begin at 15% but increase this masking probability throughout training to 50%, thereby increasing the difficulty of inference. Plating with minibatches of size 5 is used for all programs to fit observations within the transformer's 512-token limit. After pretraining, we optimize Equation 4 for each test program individually, varying the number of steps of fine-tuning across 0 (zero-shot), 10, 100, and 1000. (See the masking-schedule sketch after the table.)
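
For the Open Datasets row, the snippet below sketches how the Posterior DB benchmark can be browsed with the posteriordb Python client. The directory path and posterior name are placeholders, and the method names (PosteriorDatabase, posterior_names, posterior, model.code, data.values) are taken from the client's public README; treat them as assumptions to verify against the installed version rather than as the authors' pipeline.

```python
# Minimal sketch, not the authors' code: browsing Posterior DB programs with the
# posteriordb Python client. The path and posterior name below are placeholders.
from posteriordb import PosteriorDatabase

pdb = PosteriorDatabase("path/to/posterior_database")  # local clone of the posteriordb repo

print(pdb.posterior_names()[:5])                        # list a few available posteriors

posterior = pdb.posterior("eight_schools-eight_schools_centered")
stan_code = posterior.model.code("stan")                # Stan program text
data = posterior.data.values()                          # observed data as a dict
```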
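To make the Dataset Splits row concrete, here is a small sketch of building the held-out evaluation set described above: 1,000 fresh executions of a program, none reused from training, each with some latent assignments randomly masked. The function and variable names (run_program, build_test_set, NUM_TEST_EXECUTIONS) are hypothetical, and run_program stands in for executing a Stan program's generative process.

```python
# Hypothetical sketch of the held-out test-set construction described in the
# Dataset Splits row; run_program is a placeholder for one execution of a Stan
# program's generative process, recording its variable assignments.
import random

NUM_TEST_EXECUTIONS = 1_000   # "1,000 new executions of the program not used in training"
MASK = "[MASK]"

def run_program(program, seed):
    """Placeholder: draw one execution (set of variable assignments) from the program."""
    rng = random.Random(seed)
    return {"mu": rng.gauss(0, 1), "sigma": abs(rng.gauss(0, 1)), "y[1]": rng.gauss(0, 1)}

def build_test_set(program, mask_prob=0.15, seed_offset=10**6):
    """Collect fresh executions (seeds disjoint from training) and mask assignments."""
    test_set = []
    for i in range(NUM_TEST_EXECUTIONS):
        trace = run_program(program, seed=seed_offset + i)
        masked = {k: (MASK if random.random() < mask_prob else v) for k, v in trace.items()}
        test_set.append((trace, masked))
    return test_set
```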
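Finally, the masking behavior referenced in the Pseudocode and Experiment Setup rows can be illustrated as follows: 15% of tokens are masked for the MLM loss, while the probability of masking latent assignments is annealed from 15% to 50% over training. This is a reconstruction of the described behavior, not the authors' implementation; the linear schedule shape and all function names are assumptions.

```python
# Hypothetical sketch (not the authors' code) of the two masking schemes described
# in the Experiment Setup row. Token and assignment representations are simplified.
import random

MASK = "[MASK]"

def mask_mlm_tokens(tokens, p=0.15, rng=random):
    """Mask a fixed 15% of tokens for the masked-language-modeling loss."""
    return [MASK if rng.random() < p else t for t in tokens]

def inference_mask_prob(step, total_steps, start=0.15, end=0.50):
    """Assignment-masking probability, annealed from 15% to 50% over training.
    The linear shape of the schedule is an assumption."""
    frac = min(step / max(total_steps, 1), 1.0)
    return start + frac * (end - start)

def mask_assignments(assignments, step, total_steps, rng=random):
    """Mask variable assignments in an execution trace with the scheduled probability."""
    p = inference_mask_prob(step, total_steps)
    return {name: (MASK if rng.random() < p else value)
            for name, value in assignments.items()}

if __name__ == "__main__":
    tokens = "data { int N ; } parameters { real mu ; }".split()
    assignments = {"mu": 0.3, "sigma": 1.2, "y[1]": -0.7}
    print(mask_mlm_tokens(tokens))
    print(mask_assignments(assignments, step=500, total_steps=1000))
```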