Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Autoencoding Random Forests

Authors: Binh Vu, Jan Kapar, Marvin Wright, David Watson

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments illustrate the ease and utility of our method in a wide range of settings, including tabular, image, and genomic data.
Researcher Affiliation	Academia	(1) King's College London; (2) Leibniz Institute for Prevention Research and Epidemiology BIPS; (3) University of Bremen
Pseudocode	Yes	Alg. 1 provides an overview of the greedy leaf assignment procedure.
Open Source Code	Yes	Code for reproducing all results is available online (footnote 4). The method is implemented as a package in R (see previous footnote), as well as in Python (footnote 5). Footnote 4: https://github.com/bips-hb/RFAE. Footnote 5: https://github.com/binhducvu/RFAE_py.
Open Datasets	Yes	As a preliminary proof of concept, we visualize the embeddings of a RF classifier with 200 trees as it trains on a subset of the MNIST dataset [30]... For the compression reconstruction benchmark, we use 20 datasets sourced from the UCI Machine Learning Repository [33], OpenML [94], Kaggle, and the R palmerpenguins [57] package.
Dataset Splits	Yes	For each dataset, we take ten different bootstrap samples, to form ten training sets, and use the remaining out-of-bag data as the testing set.
Hardware Specification	Yes	To meet these computational demands, we run these experiments from a high-performance computing partition, with 12 AMD EPYC 7282 CPUs, 64GB RAM, and an NVIDIA A30 graphics card. These high-performance computing units were used as part of King's College London's CREATE HPC [67]. These experiments are conducted on a laptop with Intel Core i5-10300H CPU, NVIDIA GeForce GTX 1650 (4GB), and 24GB DDR4 RAM. These were run on an HPC unit with an AMD Threadripper 3960X (24 cores, 48 threads) CPU and 256GB RAM, with no GPU used for the experiment.
Software Dependencies	No	RFs are trained either using the ranger package [101], or the arf package [99], which also returns a trained forest of class ranger. Truncated eigendecompositions are computed using RSpectra. Memory-efficient methods for sparse matrices are made possible with the Matrix package. We use the RANN package for fast k-NN regression with kd-trees. For standard lasso, we use the glmnet package [36], and Exclusive Lasso for the exclusive variant [19].
Experiment Setup	Yes	Autoencoder: We use an MLP-based autoencoder for this benchmark... Our structure is then: Input(d_X) → Dense(d_X − (d_X − d_Z)·1/3) → Dense(d_X − (d_X − d_Z)·2/3) → Latent(d_Z) → Dense(d_X − (d_X − d_Z)·2/3) → Dense(d_X − (d_X − d_Z)·1/3) → Output(d_X). If d_X = 8 and d_Z = 2, this rule gives the network 8 → 6 → 4 → 2 → 4 → 6 → 8. For hyperparameters, we use common defaults: epochs = 50, optimizer = Adam. We use ReLU activations at the hidden layers, a sigmoid for the output, and a random 10% validation set from the training data. Variational Autoencoder: Similar to the autoencoder, we use an MLP-based variational autoencoder. For comparison, we mimic the autoencoder's architecture, activation function, and defaults, only changing epochs = 100 and adding batch_size = 32. We also impose a β coefficient inspired by [53], at a value of 0.1. For RFAE, we use the unsupervised ARF algorithm [99] with 500 trees and set k = 20 for decoding.
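The layer-width rule quoted in the Experiment Setup row can be checked with a small helper. This is a minimal sketch, not the authors' code: the function name `mlp_autoencoder_widths` is hypothetical, and rounding non-integer widths to the nearest integer is our assumption (the quoted 8 → 6 → 4 → 2 example divides evenly, so the source does not say how rounding is handled).

```python
def mlp_autoencoder_widths(d_x: int, d_z: int) -> list[int]:
    """Hypothetical helper: layer widths of the symmetric MLP autoencoder.

    Hidden widths step from d_x down to d_z in thirds:
    d_x - (d_x - d_z) * 1/3, then d_x - (d_x - d_z) * 2/3.
    Rounding to the nearest integer is an assumption, not from the source.
    """
    if not 0 < d_z < d_x:
        raise ValueError("expected 0 < d_z < d_x")
    # Encoder: input, two hidden layers, latent layer.
    h1 = round(d_x - (d_x - d_z) * 1 / 3)
    h2 = round(d_x - (d_x - d_z) * 2 / 3)
    encoder = [d_x, h1, h2, d_z]
    # Decoder mirrors the encoder, sharing the latent layer (walk back from h2).
    decoder = encoder[-2::-1]
    return encoder + decoder


print(mlp_autoencoder_widths(8, 2))
```

With d_X = 8 and d_Z = 2 this reproduces the 8 → 6 → 4 → 2 → 4 → 6 → 8 network described in the row; for other sizes the rounding choice can change hidden widths by one unit.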
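The Dataset Splits row describes ten bootstrap training sets per dataset, each tested on its out-of-bag rows. A minimal standard-library sketch of that scheme follows; the function name `bootstrap_oob_splits` and the choice of drawing n rows with replacement for a dataset of n rows are our assumptions (the usual bootstrap convention), not details confirmed by the source.

```python
import random


def bootstrap_oob_splits(n: int, n_splits: int = 10, seed: int = 0):
    """Hypothetical helper: bootstrap train indices with out-of-bag test indices.

    Each training set draws n row indices with replacement (assumed size);
    the rows never drawn form the out-of-bag (OOB) test set.
    """
    rng = random.Random(seed)
    splits = []
    for _ in range(n_splits):
        # Sample n row indices with replacement; duplicates are expected.
        train = [rng.randrange(n) for _ in range(n)]
        in_bag = set(train)
        # Rows that were never sampled become the test set.
        test = [i for i in range(n) if i not in in_bag]
        splits.append((train, test))
    return splits


for train, test in bootstrap_oob_splits(100, n_splits=3):
    print(len(train), len(test))
```

On average roughly 36.8% of rows (about 1/e) end up out of bag, so each test set is a sizeable, disjoint hold-out while every training set keeps the original sample size.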