Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization
Authors: Sang Michael Xie, Tengyu Ma, Percy Liang
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove for two-layer ReLU networks that composed fine-tuning significantly reduces the complexity of the predictor, thus improving generalization. Empirically, we show that composed fine-tuning improves over standard fine-tuning on two pseudocode-to-code translation datasets (3% and 6% relative). |
| Researcher Affiliation | Academia | 1Department of Computer Science, Stanford University. Correspondence to: Sang Michael Xie <xie@cs.stanford.edu>. |
| Pseudocode | No | The paper describes the proposed method and objective function mathematically but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | All data and code for reproducing the experiments are on our CodaLab worksheet and GitHub repository. |
| Open Datasets | Yes | We evaluate composed fine-tuning on two pseudocode-to-code datasets, SANSTYPE and SPOC (Kulal et al., 2019)... All data and code for reproducing the experiments are on our CodaLab worksheet and GitHub repository. |
| Dataset Splits | Yes | Out of the 6200 labeled examples (62 characters × 100 fonts), we split randomly into 2500 training examples, 100 validation examples, and 3600 test examples. (See the split sketch after this table.) |
| Hardware Specification | No | The paper does not explicitly mention specific hardware details such as GPU or CPU models, memory, or specific cloud computing instances used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'Transformers' as a model architecture, but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific library versions). |
| Experiment Setup | Yes | In all models, we use weight decay, dropout, attention dropout, and ReLU dropout as regularization and use λ = 1 to balance between fitting the composed and direct objectives. During inference, we use greedy decoding for simplicity... (See the objective sketch after this table.) |
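
The "Research Type" and "Experiment Setup" rows above describe the core idea: a pre-trained denoiser is frozen and composed with the base predictor, and training balances the composed and direct losses with λ = 1. Below is a minimal sketch of one such training step, assuming a PyTorch setup with toy continuous outputs; the names `base_model`, `denoiser`, and `lambda_weight`, and the placement of λ on the direct term, are illustrative assumptions rather than the paper's exact formulation (the paper's models are Transformers for pseudocode-to-code translation).

```python
import torch

def composed_fine_tuning_step(base_model, denoiser, loss_fn, optimizer,
                              x, y, lambda_weight=1.0):
    """One composed fine-tuning step (illustrative sketch).

    `denoiser` is a pre-trained denoising autoencoder that stays frozen;
    only `base_model` (the base predictor) receives gradient updates.
    """
    # Freeze the pre-trained denoiser so its weights are never updated.
    for p in denoiser.parameters():
        p.requires_grad_(False)

    optimizer.zero_grad()

    direct_out = base_model(x)           # direct prediction f(x)
    composed_out = denoiser(direct_out)  # composed prediction: denoiser applied to f(x)

    # Balance the composed and direct objectives; the paper reports lambda = 1,
    # so the exact placement of the weight does not matter in that setting.
    loss = loss_fn(composed_out, y) + lambda_weight * loss_fn(direct_out, y)
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with linear stand-ins for the base predictor and denoiser.
base_model = torch.nn.Linear(8, 8)
denoiser = torch.nn.Linear(8, 8)
optimizer = torch.optim.Adam(base_model.parameters(), lr=1e-3)
x, y = torch.randn(4, 8), torch.randn(4, 8)
composed_fine_tuning_step(base_model, denoiser, torch.nn.MSELoss(), optimizer, x, y)
```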
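
The "Dataset Splits" row reports a random split of the 6200 labeled font-rendering examples into 2500 train / 100 validation / 3600 test. A minimal sketch of such a random split follows; the seed and variable names are illustrative, not taken from the paper's code.

```python
import random

def split_examples(examples, n_train=2500, n_val=100, n_test=3600, seed=0):
    """Randomly split a list of labeled examples into train/val/test."""
    assert len(examples) == n_train + n_val + n_test  # 6200 = 62 characters x 100 fonts
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```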