Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Stochastic Encodings for Active Feature Acquisition
Authors: Alexander Luke Ian Norcliffe, Changhee Lee, Fergus Imrie, Mihaela van der Schaar, Pietro Liò
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluation on a large range of synthetic and real datasets demonstrates that our approach reliably outperforms a diverse set of baselines. We evaluate SEFA on multiple synthetic and real-world datasets, including cancer classification tasks. Comparing against various state-of-the-art AFA baselines, we see that SEFA consistently outperforms these methods. Extensive ablations further demonstrate each novel design choice is required for the best performance. |
| Researcher Affiliation | Academia | 1 Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom 2 Department of Artificial Intelligence, Korea University, Seoul, Korea 3 Department of Statistics, University of Oxford, Oxford, United Kingdom 4 Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, United Kingdom. |
| Pseudocode | Yes | We also provide pseudo-code for the loss calculation for batch size 1 in Algorithm 1. ... We provide pseudo-code for scoring a single feature in Algorithm 2 |
| Open Source Code | Yes | The code for our method and experiments is available at https://github.com/a-norcliffe/SEFA. |
| Open Datasets | Yes | Bank Marketing. The Bank Marketing dataset (Moro et al., 2014) can be found at: https://archive.ics.uci.edu/dataset/222/bank+marketing. California Housing. The California Housing dataset is obtained through Scikit-Learn (Pedregosa et al., 2011) https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html. MiniBooNE. The data was obtained from https://archive.ics.uci.edu/dataset/199/miniboone+particle+identification. MNIST and Fashion MNIST are image classification datasets... METABRIC. The Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) database ... The data was accessed at https://www.kaggle.com/datasets/raghadalharbi/breast-cancer-gene-expression-profiles-metabric. TCGA. The Cancer Genome Atlas (TCGA)... The data was accessed at https://www.cancer.gov/ccg/research/genome-sequencing/tcga. |
| Dataset Splits | Yes | The train set is size 60,000, and the validation and test sets are both size 10,000. ... We use an 80:10:10 split, giving train, validation, and test sizes of 36,168, 4,521, and 4,522. ... We use an 80:10:10 split, giving train, validation, and test sizes of 16,512, 2,064, and 2,064. ... For both datasets, we split the provided train set into a train set with size 50,000 and validation set with size 10,000. We use the provided test sets, each with size 10,000. ... We use an 80:10:10 split, resulting in train, validation, and test sizes of 1,518, 189, and 191. ... We then removed subjects with more than 10% missing features and used an 80:10:10 split. This gave train, validation, and test sizes of 6,327, 790, and 792. |
| Hardware Specification | Yes | All experiments were run on an Nvidia Quadro RTX 8000 GPU. The specifications can be found at https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/quadro-product-literature/quadro-rtx-8000-us-nvidia-946977-r1-web.pdf. |
| Software Dependencies | No | All models were implemented using PyTorch (Paszke et al., 2017); code is available at https://github.com/a-norcliffe/SEFA. ... The Adam optimizer (Kingma & Ba, 2015) ... Batch Normalization (Ioffe & Szegedy, 2015). ... The California Housing dataset is obtained through Scikit-Learn (Pedregosa et al., 2011). |
| Experiment Setup | Yes | We train all models using the Adam optimizer (Kingma & Ba, 2015); the learning rate and batch size are hyperparameters that are tuned using a validation set. All methods (except for Opportunistic RL, which uses its original implementation) use a learning rate scheduler that multiplies the learning rate by 0.2 when there have been a set number of epochs without validation metric improvement (the patience, which is also tuned). ... We prevent overfitting during training by tracking a validation metric every epoch and using the model parameters that produce the best value. ... For every model, initial hyperparameter tuning was conducted by finding ranges for each hyperparameter that produced strong acquisition performance on the synthetic datasets. ... The nine configurations for each method are provided in Tables 8, 9, 10, 11, 12, 13, 14, 15 and 16. We give the selected hyperparameter configurations for each dataset in Table 17. |
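The Dataset Splits row reports several 80:10:10 train/validation/test splits (e.g. California Housing: 16,512 / 2,064 / 2,064 from 20,640 samples). A minimal sketch of such a split, not taken from the paper's repository and with an arbitrary seed, could look like:

```python
import numpy as np

def split_80_10_10(n_samples, seed=0):
    """Shuffle indices and split them 80:10:10 into train/validation/test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(0.8 * n_samples)  # 80% for training
    n_val = int(0.1 * n_samples)    # 10% for validation
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]    # remainder (~10%) for testing
    return train, val, test

# California Housing has 20,640 samples; an 80:10:10 split
# reproduces the reported sizes 16,512 / 2,064 / 2,064.
train, val, test = split_80_10_10(20640)
print(len(train), len(val), len(test))  # → 16512 2064 2064
```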
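The scheduler described in the Experiment Setup row (multiply the learning rate by 0.2 after a tuned number of epochs, the patience, without validation improvement) matches the behaviour of PyTorch's `torch.optim.lr_scheduler.ReduceLROnPlateau`. The paper's actual implementation is in its repository; the logic alone can be sketched in plain Python, with the class name and default values being illustrative assumptions:

```python
class PlateauScheduler:
    """Multiply the learning rate by `factor` once `patience`
    consecutive epochs pass without validation-metric improvement
    (a lower metric value counts as an improvement)."""

    def __init__(self, lr, factor=0.2, patience=5):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_metric):
        """Call once per epoch with the validation metric; returns the lr."""
        if val_metric < self.best:
            self.best = val_metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr *= self.factor  # decay as described in the paper
                self.bad_epochs = 0
        return self.lr

# With patience 2, the learning rate decays only after the
# third consecutive epoch without improvement.
sched = PlateauScheduler(lr=0.01, factor=0.2, patience=2)
for metric in [1.0, 1.0, 1.0, 1.0]:
    lr = sched.step(metric)
print(lr)  # → 0.002 (one decay after three non-improving epochs)
```

The same row also describes keeping the model parameters that achieve the best validation value each epoch; in PyTorch that is typically done by saving a copy of `model.state_dict()` whenever the metric improves and reloading it after training.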