Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Structured Voronoi Sampling
Authors: Afra Amini, Li Du, Ryan Cotterell
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In an experimental setup where the reference distribution is known, we show that the empirical distribution of SVS samples is closer to the reference distribution compared to alternative sampling schemes. |
| Researcher Affiliation | Academia | Afra Amini1 Li Du2 Ryan Cotterell1 1ETH Zürich 2Johns Hopkins University EMAIL EMAIL |
| Pseudocode | Yes | Algorithm 1 HMC, Algorithm 2 Langevin Dynamics, Algorithm 3 MUCOLA, Algorithm 4 Structured Voronoi Sampling, Algorithm 5 REFRACTREFLECT, Algorithm 6 Find Discontinuity |
| Open Source Code | Yes | https://github.com/AfraAmini/svs |
| Open Datasets | Yes | The underlying LM is a finetuned GPT-2 on E2E dataset [34]; see App. G for dataset statistics. This dataset is made available under the CC BY-SA 4.0 license. |
| Dataset Splits | Yes | Table 2: Number of restaurant reviews in each split and food type. train 2929... valid 1489... test 492... |
| Hardware Specification | Yes | All experiments are done on a single A100-40GB GPU. All classifiers are trained and tested on a single gtx_1080_ti GPU with approximately 2 hours of total computational budget. |
| Software Dependencies | No | The paper mentions using a 'gpt2 checkpoint from the Huggingface library [47]' but does not specify version numbers for the Huggingface library itself or other core software dependencies like Python or PyTorch, which would be necessary for full reproducibility. |
| Experiment Setup | Yes | Hyperparameters for each experiment are reported in Table 4. Following prior work [46], in algorithms based on Langevin dynamics, we apply an exponential decay to the step size by decreasing it to 0.05 after 500 steps. In all settings, we take 500 burn-in steps. |
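
The step-size decay and burn-in described in the Experiment Setup row can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a geometric (exponential) interpolation from an initial step size to 0.05 over 500 steps, held constant afterwards. The names `step_size_schedule` and `drop_burn_in` are hypothetical, and the initial step size of 1.0 is a placeholder, since the actual values are reported in Table 4 of the paper and are not reproduced here.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def step_size_schedule(
    initial: float, final: float = 0.05, decay_steps: int = 500
) -> Callable[[int], float]:
    """Build a schedule mapping a step index to a step size.

    Assumption: the decay is geometric, going from `initial` at step 0
    to `final` at step `decay_steps`, and staying at `final` afterwards.
    """
    if initial <= 0 or final <= 0 or decay_steps <= 0:
        raise ValueError("initial, final and decay_steps must be positive")
    # Per-step multiplicative factor so that initial * rate**decay_steps == final.
    rate = (final / initial) ** (1.0 / decay_steps)

    def schedule(step: int) -> float:
        # Clamp the exponent so the step size stops decaying after decay_steps.
        return initial * rate ** min(step, decay_steps)

    return schedule


def drop_burn_in(samples: Sequence[T], burn_in: int = 500) -> List[T]:
    """Discard the first `burn_in` samples of a chain."""
    return list(samples[burn_in:])


# Hypothetical usage: 1.0 is a placeholder initial step size, not a value from the paper.
schedule = step_size_schedule(initial=1.0)
chain = list(range(600))       # stand-in for 600 sampler iterations
kept = drop_burn_in(chain)     # keeps the 100 iterations after burn-in
```

Whether the decay in the paper is geometric or follows another form is not stated in the quoted text, so the functional form above should be checked against the released code before reuse.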