Cross-Modal Fine-Tuning: Align then Refine

Authors: Junhong Shen, Liam Li, Lucio M. Dery, Corey Staten, Mikhail Khodak, Graham Neubig, Ameet Talwalkar

ICML 2023

Reproducibility assessment: for each variable, the result and the supporting LLM response are listed below.

Research Type: Experimental
LLM response: "Through extensive experiments, we show that ORCA obtains state-of-the-art results on 3 benchmarks containing over 60 datasets from 12 modalities, outperforming a wide range of hand-designed, AutoML, general-purpose, and task-specific methods."

Researcher Affiliation: Collaboration
LLM response: "(1) Carnegie Mellon University, (2) Hewlett Packard Enterprise."

Pseudocode: Yes
LLM response: "Algorithm 1: Efficient approximation of OTDD using class-wise subsampling."
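
To make the idea behind Algorithm 1 concrete, here is a minimal sketch of a class-wise subsampled dataset-distance estimate. It is not the authors' implementation: it substitutes a plain feature-space optimal-transport cost (solved as an assignment problem between equal-sized subsamples) for the full label-aware OTDD cost, and the function names and defaults (per_class, repeats) are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def classwise_subsample(features, labels, per_class, rng):
    """Draw up to `per_class` examples from every class (with replacement if a class is small)."""
    idx = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        idx.append(rng.choice(members, size=per_class, replace=len(members) < per_class))
    idx = np.concatenate(idx)
    return features[idx], labels[idx]

def approx_dataset_distance(src_feats, src_labels, tgt_feats, tgt_labels,
                            per_class=50, repeats=5, seed=0):
    """Average OT cost between class-wise subsamples of two embedded datasets.

    Simplified stand-in for OTDD: only the feature-space transport cost is used,
    whereas the real OTDD also adds a label-to-label Wasserstein term.
    """
    rng = np.random.default_rng(seed)
    costs = []
    for _ in range(repeats):
        xs, _ = classwise_subsample(src_feats, src_labels, per_class, rng)
        xt, _ = classwise_subsample(tgt_feats, tgt_labels, per_class, rng)
        # Trim to equal sizes so uniform-weight OT reduces to an assignment problem.
        n = min(len(xs), len(xt))
        xs = xs[rng.permutation(len(xs))[:n]]
        xt = xt[rng.permutation(len(xt))[:n]]
        cost = cdist(xs, xt, metric="sqeuclidean")
        rows, cols = linear_sum_assignment(cost)
        costs.append(cost[rows, cols].mean())
    return float(np.mean(costs))
```

Class-wise subsampling keeps every class represented while bounding the size of the cost matrix, which is what keeps the repeated transport solves cheap.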
Open Source Code: Yes
LLM response: "Our code is made public at https://github.com/sjunhongshen/ORCA."

Open Datasets: Yes
LLM response: "Through extensive experiments, we show that ORCA obtains state-of-the-art results on 3 benchmarks containing over 60 datasets from 12 modalities... NAS-Bench-360 (Tu et al., 2022), PDEBench (Takamoto et al., 2022), and OpenML-CC18 (Vanschoren et al., 2014), which contain over 60 datasets from 12 distinct data modalities. ... We use CoNLL-2003 and CIFAR-10 as the proxy datasets, respectively."

Dataset Splits: Yes
LLM response: "For experiments, each dataset is preprocessed and split using the script available on https://github.com/rtu715/NAS-Bench-360, with the training set being used for hyperparameter tuning, embedding learning, and fine-tuning."

Hardware Specification: Yes
LLM response: "Experiments are performed on a single NVIDIA V100 GPU and managed using the Determined AI platform."

Software Dependencies: No
LLM response: The paper mentions software such as the Hugging Face transformers library, the OTDD implementation, and scikit-learn, but does not provide specific version numbers for these dependencies.
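
Because the versions are not pinned, a re-run can at least record the environment it actually used. A minimal sketch is below; the package list is an assumption (transformers and scikit-learn come from the row above, torch is an illustrative guess), and the helper name is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

def record_versions(packages=("transformers", "scikit-learn", "torch")):
    """Print pinned-style version lines for the packages a re-run actually imported."""
    for pkg in packages:
        try:
            print(f"{pkg}=={version(pkg)}")
        except PackageNotFoundError:
            print(f"{pkg}: not installed")

if __name__ == "__main__":
    record_versions()
```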
Experiment Setup: Yes
LLM response: "The configuration space for ASHA can be customized for each task. In general, the following search space is sufficient: Target sequence length: 8, 64, 512 for RoBERTa; Batch size: 4, 16, 64; Gradient clipping: -1, 1; Dropout: 0, 0.05; Optimizer: SGD, Adam, AdamW; Learning rate: 1E-2, 1E-3, 1E-4, 1E-5; Weight decay: 0, 1E-2, 1E-4."
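
The quoted search space maps directly onto a small configuration grid. Below is a minimal sketch that encodes it as a dictionary and samples random configurations; it is not the authors' ASHA setup (which is managed through the Determined AI platform), and the key names plus the reading of gradient clipping -1 as "no clipping" are assumptions.

```python
import random

# Search space quoted in the experiment-setup row above.
SEARCH_SPACE = {
    "target_seq_len": [8, 64, 512],          # for RoBERTa
    "batch_size": [4, 16, 64],
    "gradient_clipping": [-1, 1],            # assumption: -1 means no clipping
    "dropout": [0.0, 0.05],
    "optimizer": ["SGD", "Adam", "AdamW"],
    "learning_rate": [1e-2, 1e-3, 1e-4, 1e-5],
    "weight_decay": [0.0, 1e-2, 1e-4],
}

def sample_config(rng=random):
    """Draw one hyperparameter configuration uniformly at random from the grid."""
    return {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}

if __name__ == "__main__":
    for _ in range(3):
        print(sample_config())
```

In practice these sampled configurations would be handed to a successive-halving scheduler such as ASHA, which allocates more training budget to the better-performing trials.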