Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization
Authors: Sang Michael Xie, Tengyu Ma, Percy Liang
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove for two-layer ReLU networks that composed fine-tuning significantly reduces the complexity of the predictor, thus improving generalization. Empirically, we show that composed fine-tuning improves over standard fine-tuning on two pseudocode-to-code translation datasets (3% and 6% relative). |
| Researcher Affiliation | Academia | Department of Computer Science, Stanford University. Correspondence to: Sang Michael Xie <EMAIL>. |
| Pseudocode | No | The paper describes the proposed method and objective function mathematically but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | All data and code for reproducing the experiments are on our CodaLab worksheet and GitHub repository. |
| Open Datasets | Yes | We evaluate composed fine-tuning on two pseudocode-to-code datasets, SANSTYPE and SPOC (Kulal et al., 2019)... All data and code for reproducing the experiments are on our CodaLab worksheet and GitHub repository. |
| Dataset Splits | Yes | Out of the 6200 labeled examples (62 characters × 100 fonts), we split randomly into 2500 training examples, 100 validation examples, and 3600 test examples. |
| Hardware Specification | No | The paper does not explicitly mention specific hardware details such as GPU or CPU models, memory, or specific cloud computing instances used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'Transformers' as a model architecture, but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific library versions). |
| Experiment Setup | Yes | In all models, we use weight decay, dropout, attention dropout, and ReLU dropout as regularization and use λ = 1 to balance between fitting the composed and direct objectives. During inference, we use greedy decoding for simplicity... |
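
The Dataset Splits row quotes a random split of 6200 labeled examples into 2500 training, 100 validation, and 3600 test examples. A minimal sketch of how such a split could be reproduced over example indices is shown here. The function name `split_indices`, the fixed seed, and the index-based representation are illustrative assumptions; the quoted text does not say how the authors implemented or seeded their split.

```python
import random


def split_indices(n_total=6200, n_train=2500, n_val=100, n_test=3600, seed=0):
    """Randomly partition range(n_total) into train/validation/test index lists.

    The sizes default to the counts quoted in the table; the seed is an
    assumption made only so that the sketch is repeatable.
    """
    if n_train + n_val + n_test != n_total:
        raise ValueError("split sizes must sum to the total number of examples")

    indices = list(range(n_total))
    # A private Random instance avoids touching the global random state.
    random.Random(seed).shuffle(indices)

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices()
print(len(train_idx), len(val_idx), len(test_idx))
```

Fixing the seed keeps the three subsets identical across runs, which matters most for the small validation set used for model selection.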