Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Random Forest Autoencoders for Guided Representation Learning
Authors: Adrien Aumon, Shuang Ni, Myriam Lizotte, Guy Wolf, Kevin J. Moon, Jake Rhodes
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we show that RF-AE outperforms existing approaches in embedding new data while preserving the local and global structure of the important features for the underlying classification task. We assessed the trade-off between SIA and k-NN classification accuracy achieved by RF-AE against several baseline methods across 20 datasets spanning diverse domains. |
| Researcher Affiliation | Academia | ¹Université de Montréal; Mila Quebec AI Institute; ²Utah State University; ³Brigham Young University |
| Pseudocode | Yes | Algorithm S1: Feature-wise data perturbation with random sampling |
| Open Source Code | Yes | Our code is available at https://github.com/JakeSRhodesLab/RF-AE. |
| Open Datasets | Yes | Sign MNIST (A-K) [71], MNIST (test subset) [72], Fashion MNIST (test subset) [73], GTZAN (3-second version) [74], and USPS [75] were obtained from Kaggle. ... BloodMNIST and OrganCMNIST (MedMNIST family [76, 77]) were obtained from Zenodo. All other datasets are publicly available from the UCI Machine Learning Repository. |
| Dataset Splits | Yes | Training and OOS embeddings were generated using an 80/20 stratified train/test split, except for Isolet, Landsat Satellite, Optical Digits, USPS, HAR, OrganCMNIST, and BloodMNIST, where predefined splits were used. |
| Hardware Specification | Yes | For models requiring GPU acceleration, we used: 1 GPU with at least 40 GB of memory (e.g., NVIDIA A100 40GB, H100 80GB, or equivalent), 1 CPU with 128 GB of RAM. |
| Software Dependencies | No | RF-AE: Implemented in PyTorch. ... PCA, NCA, and PLS-DA: Implemented using the scikit-learn library [70]. |
| Experiment Setup | Yes | RF-AE: Implemented in PyTorch. The encoder f consisted of three hidden layers with sizes 800, 400, and 100... Training was performed using the AdamW optimizer [78] with a learning rate of $10^{-3}$, batch size of 512, weight decay of $10^{-5}$, and 200 epochs without early stopping. We set the default λ and N to 0.01 and $0.1 N_{\text{train}}$, respectively. |
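
The 80/20 stratified train/test split quoted in the Dataset Splits row can be illustrated with a minimal sketch. It uses only the Python standard library. The function name `stratified_split`, the seed, and the toy labels are illustrative assumptions, not taken from the paper or its repository, and the authors' code may well use a library routine instead.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), holding out test_fraction of each class.

    Hypothetical helper for illustration only; not the authors' implementation.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Hold out the same fraction of every class, rounded to whole samples.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (assumed): three classes with 50, 30, and 20 samples.
labels = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
train_idx, test_idx = stratified_split(labels)
```

Each class contributes the same fraction of its samples to the held-out set, so class proportions match between the two partitions. In practice, scikit-learn's `train_test_split` with the `stratify` argument gives the same kind of split.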