Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization

Authors: Sang Michael Xie, Tengyu Ma, Percy Liang

ICML 2021

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | We prove for two-layer ReLU networks that composed fine-tuning significantly reduces the complexity of the predictor, thus improving generalization. Empirically, we show that composed fine-tuning improves over standard fine-tuning on two pseudocode-to-code translation datasets (3% and 6% relative).
Researcher Affiliation | Academia | Department of Computer Science, Stanford University. Correspondence to: Sang Michael Xie <xie@cs.stanford.edu>.
Pseudocode | No | The paper describes the proposed method and objective function mathematically but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | All data and code for reproducing the experiments are on our CodaLab worksheet and GitHub repository.
Open Datasets | Yes | We evaluate composed fine-tuning on two pseudocode-to-code datasets, SANSTYPE and SPOC (Kulal et al., 2019)... All data and code for reproducing the experiments are on our CodaLab worksheet and GitHub repository.
Dataset Splits | Yes | Out of the 6200 labeled examples (62 characters × 100 fonts), we split randomly into 2500 training examples, 100 validation examples, and 3600 test examples.
Hardware Specification | No | The paper does not explicitly mention specific hardware details such as GPU or CPU models, memory, or specific cloud computing instances used for running the experiments.
Software Dependencies | No | The paper mentions using 'Transformers' as a model architecture, but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific library versions).
Experiment Setup | Yes | In all models, we use weight decay, dropout, attention dropout, and ReLU dropout as regularization and use λ = 1 to balance between fitting the composed and direct objectives. During inference, we use greedy decoding for simplicity... (a minimal sketch of this composed objective follows the table)
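
The rows above refer to the composed objective without reproducing it, so the following is a minimal sketch of composed fine-tuning in the continuous-output (regression) setting used for the paper's theory; it is an illustration under stated assumptions, not the authors' implementation. The module names (base, denoiser), the dimension d, and the helper composed_fine_tuning_step are hypothetical; the paper's pseudocode-to-code experiments use Transformer sequence models rather than the small MLPs shown here.

    # Sketch of composed fine-tuning: a trainable base predictor f is composed with a
    # frozen pre-trained denoiser Pi, and f is fine-tuned against both the composed
    # loss on Pi(f(x)) and the direct loss on f(x), balanced by lambda (= 1 in the paper).
    import torch
    import torch.nn as nn

    d = 32  # hypothetical input/output dimension

    base = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, d))      # trainable predictor f
    denoiser = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, d))  # pre-trained denoiser Pi

    # Freeze the pre-trained denoising autoencoder: only the base predictor is fine-tuned.
    for p in denoiser.parameters():
        p.requires_grad = False

    lam = 1.0  # lambda balancing the composed and direct objectives
    loss_fn = nn.MSELoss()
    opt = torch.optim.Adam(base.parameters(), lr=1e-3)

    def composed_fine_tuning_step(x, y):
        """One update of the base predictor against the composed + direct losses."""
        draft = base(x)             # intermediate prediction f(x)
        composed = denoiser(draft)  # final prediction Pi(f(x)) through the frozen denoiser
        loss = loss_fn(composed, y) + lam * loss_fn(draft, y)
        opt.zero_grad()
        loss.backward()             # gradients flow through Pi into f, but Pi is never updated
        opt.step()
        return loss.item()

    # Toy usage with random data.
    x = torch.randn(16, d)
    y = torch.randn(16, d)
    print(composed_fine_tuning_step(x, y))

At inference time the composed prediction is simply the frozen denoiser applied to the base predictor's output; in the paper's seq2seq experiments both stages are decoded greedily.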