Foundation Posteriors for Approximate Probabilistic Inference
Authors: Mike Wu, Noah Goodman
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show the efficacy of the approach, zero-shot and fine-tuned, on a benchmark of STAN programs. In experiments, we find the foundation posterior to be capable of both zero-shot inference and variational fine-tuning: given a program from the test set, we can achieve higher quality using the foundation posterior as an initial distribution. |
| Researcher Affiliation | Academia | Mike Wu, Noah Goodman Department of Computer Science Stanford University Stanford, CA 94305 {wumike, ngoodman}@stanford.edu |
| Pseudocode | No | The paper describes the Masked Language Inference (MLI) procedure in text but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the methodology or provide a link to a code repository. |
| Open Datasets | Yes | To demonstrate the foundation posterior, we meta-amortize inference over a set of standard Stan [12] programs from Posterior DB [43], a benchmark dataset for evaluating inference algorithms [4, 5, 69, 70, 20]. [43] Måns Magnusson, Paul Bürkner, and Aki Vehtari. posteriordb: a set of posteriors for Bayesian inference and probabilistic programming. 2021. |
| Dataset Splits | Yes | We build a test set with 1,000 new executions of the program not used in training and randomly mask assignments. However, now we use five of them for meta-training, and hold out the Rosenbrock program for meta-test. We hold out three programs from Posterior DB for evaluation, and optimize the foundation posterior on the remaining set. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used to run the experiments. |
| Software Dependencies | No | The paper mentions software like 'STAN programs' and 'CmdStanPy' but does not provide specific version numbers for any software dependencies used in their implementation or experiments. (A hedged CmdStanPy usage sketch follows the table.) |
| Experiment Setup | Yes | While Equation 2 only masked a single token per loss, in practice we randomly mask 15% of tokens in x_mlm, and mask an increasing amount of assignments in x_inf according to a schedule: we begin at 15% but increase this masking probability throughout training to 50%, thereby increasing the difficulty of inference. Plating with minibatches of size 5 is used for all programs to fit observations within the transformer's 512 token limit. After pretraining, we optimize Equation 4 for each test program individually, varying the number of steps of fine-tuning across 0 (zero-shot), 10, 100, and 1000. (A hedged sketch of this masking schedule follows the table.) |
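
The Open Datasets and Software Dependencies rows reference the Posterior DB benchmark of Stan programs and the CmdStanPy interface, without versions or run commands. As a point of reference, the sketch below shows how a single Stan program from such a benchmark might be compiled and sampled with CmdStanPy to obtain reference posterior draws; the file names, sampler settings, and seed are assumptions for illustration, not details reported in the paper.

```python
# Hedged sketch: compiling and sampling one Stan program with CmdStanPy.
# The file paths ("eight_schools.stan", "eight_schools.json") and the
# sampler settings below are placeholders, not values from the paper.
from cmdstanpy import CmdStanModel

# Compile the Stan program (CmdStanPy shells out to CmdStan).
model = CmdStanModel(stan_file="eight_schools.stan")

# Draw reference posterior samples for the program's observed data.
fit = model.sample(
    data="eight_schools.json",  # data file in CmdStan JSON format
    chains=4,
    iter_warmup=1000,
    iter_sampling=1000,
    seed=1,
)

# Posterior draws as a pandas DataFrame, e.g. for comparison against
# an amortized (foundation) posterior.
draws = fit.draws_pd()
print(draws.describe())
```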
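The Experiment Setup row describes a fixed 15% token-masking rate for the MLM input x_mlm and an assignment-masking rate for x_inf that ramps from 15% to 50% during training. The snippet below is a minimal sketch of how such a schedule and random masking could be implemented; the linear ramp, the [MASK] placeholder, the toy token sequences, and the function names are assumptions rather than details from the paper.

```python
import random

MASK_TOKEN = "[MASK]"  # placeholder symbol; the paper's vocabulary is not specified

def inference_mask_rate(step, total_steps, start=0.15, end=0.50):
    """Ramp the assignment-masking probability from 15% to 50% over training.

    A linear ramp is assumed here; the paper only states that the rate
    increases from 15% to 50% as training proceeds.
    """
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return start + (end - start) * frac

def random_mask(tokens, rate, rng=random):
    """Replace each token with MASK_TOKEN independently with probability `rate`."""
    return [MASK_TOKEN if rng.random() < rate else t for t in tokens]

# Example usage with toy token sequences (not real program executions):
x_mlm = ["mu", "=", "0.3", ";", "y", "~", "normal", "(", "mu", ",", "1", ")"]
x_inf = ["mu", "=", "0.3", "sigma", "=", "1.2"]

masked_mlm = random_mask(x_mlm, rate=0.15)  # fixed 15% rate for the MLM loss
masked_inf = random_mask(                   # scheduled rate for the inference loss
    x_inf, inference_mask_rate(step=500, total_steps=1000)
)
print(masked_mlm)
print(masked_inf)
```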