Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

ALINE: Joint Amortization for Bayesian Inference and Active Data Acquisition

Authors: Daolang Huang, Xinyi Wen, Ayush Bharti, Samuel Kaski, Luigi Acerbi

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical results on regression-based active learning, classical Bayesian experimental design benchmarks, and a psychometric model with selectively targeted parameters demonstrate that ALINE delivers both instant and accurate inference along with efficient selection of informative points.
Researcher Affiliation	Academia	ELLIS Institute Finland; Department of Computer Science, Aalto University, Finland; Department of Computer Science, University of Helsinki, Finland; Department of Computer Science, University of Manchester, UK
Pseudocode	Yes	Algorithm 1 ALINE Training Procedure
Open Source Code	Yes	The code to reproduce our experiments is available at: https://github.com/huangdaolang/aline.
Open Datasets	Yes	For the active learning task, ALINE is trained on a diverse collection of fully synthetic functions drawn from Gaussian Process (GP) [62] priors (see Appendix C.1.1 for details). We test ALINE on two classical BED tasks: Location Finding [66] and Constant Elasticity of Substitution (CES) [3]... Our final experiment involves the psychometric modeling task [72]... We conduct a new set of experiments on actively exploring hyperparameter performance landscapes... We use high-dimensional, real-world tasks from the HPO-B benchmark [2], evaluating on rpart (6D), svm (8D), ranger (9D), and xgboost (16D) datasets.
Dataset Splits	No	All active learning experiments are evaluated with a candidate query pool of 500 points. Each experimental run begins with an initial context set consisting of a single data point. The target set size for predictive tasks is set to 100. For HPO-B, the goal is to build a surrogate model that accurately predicts performance on a larger, held-out set of target configurations.
Hardware Specification	Yes	All experiments presented in this work, encompassing model development, hyperparameter optimization, baseline evaluations, and preliminary analyses, are performed on a GPU cluster equipped with AMD MI250X GPUs.
Software Dependencies	No	The core code base is built using PyTorch (https://pytorch.org/, License: modified BSD license). For the Gaussian Process (GP) based baselines, we utilize Scikit-learn [56] (https://scikit-learn.org/, License: modified BSD license).
Experiment Setup	Yes	ALINE is trained using the AdamW optimizer with a weight decay of 0.01. The initial learning rate is set to 0.001 and decays according to a cosine annealing schedule. For both 1D and 2D input scenarios, ALINE is trained for 2 × 10^5 epochs using a batch size of 200. The discount factor γ for the policy gradient loss is set to 1. In the Location Finding task, the number of sequential design steps, T, is set to 30. For the CES task, each experimental run consists of T = 10 design steps. ALINE is trained for 2 × 10^5 epochs with a batch size of 200.
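The Open Datasets row states that ALINE's active-learning training data are synthetic functions drawn from Gaussian Process priors, with details deferred to Appendix C.1.1 (not reproduced here). As a minimal sketch of what drawing one such function involves, the code below samples a zero-mean GP with a squared-exponential (RBF) kernel on 1D inputs via a Cholesky factorization. The kernel choice, lengthscale, variance and jitter are illustrative assumptions, not the paper's settings, and the helper names are hypothetical.

```python
import math
import random


def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    # Squared-exponential (RBF) covariance between two scalar inputs.
    # Assumption: the paper's actual kernel family and hyperparameters may differ.
    return variance * math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)


def cholesky(a):
    # Plain Cholesky factorization A = L L^T for a symmetric positive-definite matrix.
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_gp_prior(xs, lengthscale=1.0, variance=1.0, jitter=1e-6, rng=None):
    # Draw one function f ~ GP(0, k) evaluated at the 1D inputs xs.
    rng = rng or random.Random()
    n = len(xs)
    # Small diagonal jitter keeps the covariance numerically positive-definite.
    cov = [[rbf_kernel(xs[i], xs[j], lengthscale, variance) + (jitter if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # f = L z has covariance L L^T = K, i.e. the GP prior at xs.
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
```

For example, `sample_gp_prior([i / 5 for i in range(-10, 11)], rng=random.Random(0))` returns one random function evaluated on a 21-point grid over [-2, 2]. Repeating such draws gives a stream of fresh training tasks, which is the property that makes fully synthetic GP data convenient for amortized training.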
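The Dataset Splits row describes the evaluation protocol: a 500-point candidate query pool, an initial context of a single observed point, and 100 target points. The sketch below shows one way to set up such an episode and move a queried point from the pool into the context. The helper names (`Episode`, `make_episode`, `query`) are hypothetical, and drawing the three sets as disjoint random subsets of a larger index set is an assumption; the source does not say how they are sampled.

```python
import random
from dataclasses import dataclass


@dataclass
class Episode:
    context: list  # indices already observed (starts with a single point)
    pool: list     # candidate query indices (500 points)
    target: list   # indices whose predictions are evaluated (100 points)


def make_episode(n_points, pool_size=500, context_size=1, target_size=100, rng=None):
    # Hypothetical split: disjoint random subsets of range(n_points).
    rng = rng or random.Random()
    needed = pool_size + context_size + target_size
    if n_points < needed:
        raise ValueError(f"need at least {needed} points, got {n_points}")
    idx = rng.sample(range(n_points), needed)
    return Episode(
        context=idx[:context_size],
        pool=idx[context_size:context_size + pool_size],
        target=idx[context_size + pool_size:],
    )


def query(episode, pool_position):
    # Move the chosen candidate out of the pool and into the context set.
    chosen = episode.pool.pop(pool_position)
    episode.context.append(chosen)
    return chosen
```

In an actual run, the pool position passed to `query` would come from the acquisition policy (ALINE's or a baseline's); the sketch deliberately leaves that choice to the caller.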
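The Experiment Setup row lists the optimizer and schedule hyperparameters. The sketch below collects them in a hypothetical `TrainConfig`, computes the cosine-annealed learning rate, and shows what a discount factor of γ = 1 means for policy-gradient returns: each step's return is simply the sum of all remaining rewards. It assumes each reported epoch corresponds to one optimizer step and that the schedule decays to zero; neither is stated in the source. In PyTorch, the optimizer and schedule would roughly correspond to `torch.optim.AdamW(params, lr=1e-3, weight_decay=0.01)` paired with `torch.optim.lr_scheduler.CosineAnnealingLR`, with `T_max` set to the total number of steps.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values taken from the Experiment Setup row; field names are hypothetical.
    lr: float = 1e-3
    weight_decay: float = 0.01  # AdamW decoupled weight decay
    batch_size: int = 200
    num_steps: int = 200_000    # reported as 2 x 10^5 "epochs"; assumed one step each
    gamma: float = 1.0          # policy-gradient discount factor
    lr_min: float = 0.0         # assumption: cosine schedule decays to zero


def cosine_lr(step, cfg):
    # Cosine annealing: lr_min + 0.5 * (lr - lr_min) * (1 + cos(pi * step / T)).
    t = min(step, cfg.num_steps)
    return cfg.lr_min + 0.5 * (cfg.lr - cfg.lr_min) * (
        1.0 + math.cos(math.pi * t / cfg.num_steps)
    )


def returns_to_go(rewards, gamma):
    # G_t = sum_{k >= t} gamma^(k - t) * r_k; with gamma = 1 this is a plain suffix sum.
    out = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        out[t] = running
    return out
```

With these settings the learning rate starts at 0.001, is 0.0005 halfway through training, and reaches 0 at the final step, while `returns_to_go([1.0, 2.0, 3.0], 1.0)` gives `[6.0, 5.0, 3.0]`. Setting γ = 1 treats every step of a T-step design episode (T = 30 for Location Finding, T = 10 for CES) as equally important to the final outcome.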