Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Amortized Bayesian Experimental Design for Decision-Making
Authors: Daolang Huang, Yujia Guo, Luigi Acerbi, Samuel Kaski
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the performance of our method across several tasks, showing that it can deliver informative designs and facilitate accurate decision-making. |
| Researcher Affiliation | Academia | Daolang Huang Aalto University EMAIL Yujia Guo Aalto University EMAIL Luigi Acerbi University of Helsinki EMAIL Samuel Kaski Aalto University University of Manchester EMAIL |
| Pseudocode | Yes | Algorithm 1 Transformer Neural Decision Processes (TNDP) |
| Open Source Code | Yes | The code to reproduce our experiments is available at https://github.com/huangdaolang/amortized-decision-aware-bed. |
| Open Datasets | Yes | We use the synthetic dataset from Filstroff et al. (2024), the details of the data generating process can be found in Appendix F. |
| Dataset Splits | No | The paper mentions training, but does not explicitly state the train/validation/test split percentages or counts. For example, in Section 6.3, it states: "All results are evaluated on a predefined test set, ensuring that TNDP does not encounter these test sets during training." However, a specific validation split is not detailed. |
| Hardware Specification | Yes | All experiments are evaluated on an Intel Core i7-12700K CPU. ... Throughout this paper, we carried out all experiments, including baseline model computations and preliminary experiments not included in the final paper, on a GPU cluster featuring a combination of Tesla P100, Tesla V100, and Tesla A100 GPUs. ... For each experiment, it takes around 10 GPU hours on a Tesla V100 GPU with 32GB memory to reproduce the result, with an average memory consumption of 8 GB. |
| Software Dependencies | Yes | We utilize the official Transformer Encoder layer of PyTorch (Paszke et al., 2019) (https://pytorch.org) for our transformer architecture. |
| Experiment Setup | Yes | For all experiments, we use the same configuration to train our model. We set the initial learning rate to 5e-4 and employ the cosine annealing learning rate scheduler. The number of training epochs is set to 50,000 for top-k tasks and 100,000 for other tasks, and the batch size is 16. For the REINFORCE, we use a discount factor of α = 0.99. |
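
The hyperparameters quoted in the Experiment Setup row can be illustrated with a minimal, standard-library-only sketch. It is not the authors' training code. The function names `cosine_lr` and `discounted_returns` are hypothetical. The sketch also assumes a minimum learning rate of 0, a cosine period equal to the full training length, and a standard reward-to-go use of the 0.99 discount factor. None of these details are stated in the quoted text.

```python
import math

# Values quoted in the Experiment Setup row.
INITIAL_LR = 5e-4
BATCH_SIZE = 16
DISCOUNT = 0.99          # the paper's alpha for REINFORCE
STEPS_TOP_K = 50_000     # training length for top-k tasks
STEPS_OTHER = 100_000    # training length for other tasks


def cosine_lr(step, total_steps, base_lr=INITIAL_LR, min_lr=0.0):
    """Closed-form cosine annealing from base_lr down to min_lr.

    Assumption: min_lr = 0 and a single cosine period spanning all of
    training; the quoted setup does not specify either.
    """
    step = min(max(step, 0), total_steps)  # clamp to the valid range
    cosine = math.cos(math.pi * step / total_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)


def discounted_returns(rewards, discount=DISCOUNT):
    """Discounted reward-to-go for each step of one trajectory.

    Assumption: the discount factor is applied in the usual
    G_t = r_t + discount * G_{t+1} form.
    """
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + discount * running
        returns.append(running)
    returns.reverse()
    return returns


if __name__ == "__main__":
    for step in (0, STEPS_TOP_K // 2, STEPS_TOP_K):
        print(step, cosine_lr(step, STEPS_TOP_K))
    print(discounted_returns([1.0, 1.0, 1.0]))
```

In a PyTorch setup, the same schedule would normally come from the built-in cosine annealing scheduler rather than a hand-written function. The closed form is shown here only to make the decay explicit.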