Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Training the Untrainable: Introducing Inductive Bias via Representational Alignment

Authors: Vighnesh Subramaniam, David Mayo, Colin Conwell, Tomaso A Poggio, Boris Katz, Brian Cheung, Andrei Barbu

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate that architectures which traditionally are considered to be ill-suited for a task can be trained using inductive biases from another architecture. ... We show that guidance prevents FCN overfitting on ImageNet, narrows the vanilla RNN-Transformer gap, boosts plain CNNs toward ResNet accuracy, and aids Transformers on RNN-favored tasks. ... We design several settings with different target and guide networks to thoroughly test our approach. We include a range of image and sequence modeling tasks.
Researcher Affiliation	Academia	(1) MIT CSAIL, CBMM; (2) Department of Cognitive Science, Johns Hopkins University
Pseudocode	Yes	Algorithm 1 Guidance: Guide Network Representational Alignment
Open Source Code	Yes	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We upload code associated with all experiments in this paper. Furthermore, we also describe all architectures in detail, include all hyperparameters used in the paper such as batch size, learning rate, and optimizer, and cover number of training steps. See our appendices or section 4 where we cover our training setting. We run all experiments with open-source datasets that are widely available or can be easily generated.
Open Datasets	Yes	Finally, we consider a language modeling task using the WikiText-103 dataset [54] where models must predict the next token given some context. ... For an image-based task, we focus on image classification and use the ImageNet-1K dataset [19] for training and testing.
Dataset Splits	Yes	We generate a total of 100,000 examples, training on 80,000 examples, validating on 10,000 examples, and testing on 10,000 examples. ... This uses the train, validation and testing splits defined by the WikiText dataset and for all experiments, we use a context length of 50. ... We use the splits defined by the dataset.
Hardware Specification	Yes	The training experiments in this paper were completed across 4 H100s and 4 A100 GPUs for 3 weeks in total.
Software Dependencies	No	The paper mentions specific optimizers (AdamW [52], Adam [43]) but does not provide version numbers for any libraries, programming languages, or other key software components used for implementation.
Experiment Setup	Yes	For all sequence modeling tasks, i.e., copy-paste, parity, and language modeling, we use AdamW [52]. For language modeling, we also incorporate gradient clipping due to unstable training with long sequences. When training networks for image classification using ImageNet-1K, we use the Adam [43] optimizer. To ensure consistency of comparisons across learning curves, we use a consistent batch size of 256. ... After choosing the optimal learning rate, we then train all networks and settings for 100 epochs with 5 random seeds to compute error bars.
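
The split sizes and training settings quoted in the table can be captured in a short sketch. This is an illustration only and is not the authors' code: the names `ExperimentSetup` and `split_indices`, the shuffling step, and the seed value are assumptions introduced here, while the numbers (100,000 examples split 80,000/10,000/10,000, batch size 256, 100 epochs, 5 seeds, AdamW for sequence tasks, Adam for ImageNet-1K) come from the quoted responses.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSetup:
    """Hypothetical container for the settings quoted in the table."""
    batch_size: int = 256             # "consistent batch size of 256"
    epochs: int = 100                 # "train ... for 100 epochs"
    num_seeds: int = 5                # "5 random seeds to compute error bars"
    sequence_optimizer: str = "AdamW"  # copy-paste, parity, language modeling
    image_optimizer: str = "Adam"      # ImageNet-1K classification


def split_indices(n_total=100_000, n_train=80_000, n_val=10_000, seed=0):
    """Split example indices into train/validation/test sets.

    Shuffling before splitting and the seed value are assumptions;
    the quoted text only reports the split sizes.
    """
    rng = random.Random(seed)
    indices = list(range(n_total))
    rng.shuffle(indices)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remaining 10,000 examples
    return train, val, test


if __name__ == "__main__":
    setup = ExperimentSetup()
    train, val, test = split_indices()
    print(len(train), len(val), len(test))
    print(setup)
```

Fixing the seed makes the split itself repeatable, which is separate from the 5 training seeds the paper reports for error bars.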