Cross-Modal Fine-Tuning: Align then Refine
Authors: Junhong Shen, Liam Li, Lucio M. Dery, Corey Staten, Mikhail Khodak, Graham Neubig, Ameet Talwalkar
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we show that ORCA obtains state-of-the-art results on 3 benchmarks containing over 60 datasets from 12 modalities, outperforming a wide range of hand-designed, AutoML, general-purpose, and task-specific methods. |
| Researcher Affiliation | Collaboration | 1 Carnegie Mellon University, 2 Hewlett Packard Enterprise. |
| Pseudocode | Yes | Algorithm 1: "Efficient approximation of OTDD using class-wise subsampling" (a hedged sketch of this subsampling scheme follows the table). |
| Open Source Code | Yes | Our code is made public at https://github.com/sjunhongshen/ORCA. |
| Open Datasets | Yes | Through extensive experiments, we show that ORCA obtains state-of-the-art results on 3 benchmarks containing over 60 datasets from 12 modalities...NAS-Bench-360 (Tu et al., 2022), PDEBench (Takamoto et al., 2022), and OpenML-CC18 (Vanschoren et al., 2014), which contain over 60 datasets from 12 distinct data modalities. ...We use CoNLL-2003 and CIFAR-10 as the proxy datasets, respectively. |
| Dataset Splits | Yes | For experiments, each dataset is preprocessed and split using the script available on https://github.com/rtu715/NAS-Bench-360, with the training set being used for hyperparameter tuning, embedding learning, and fine-tuning. |
| Hardware Specification | Yes | Experiments are performed on a single NVIDIA V100 GPU and managed using the Determined AI platform. |
| Software Dependencies | No | The paper mentions software like the Hugging Face transformers library, the OTDD implementation, and scikit-learn, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | The configuration space for ASHA can be customized for each task. In general, the following search space is sufficient: Target sequence length: 8, 64, 512 for RoBERTa; Batch size: 4, 16, 64; Gradient clipping: -1, 1; Dropout: 0, 0.05; Optimizer: SGD, Adam, AdamW; Learning rate: 1e-2, 1e-3, 1e-4, 1e-5; Weight decay: 0, 1e-2, 1e-4 (see the search-space sketch after the table). |
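
The Pseudocode row quotes Algorithm 1, which approximates OTDD by repeatedly subsampling a fixed number of examples per class and averaging the resulting distances. The sketch below is not the authors' released code: `compute_otdd` is a hypothetical placeholder for an exact OTDD routine (e.g., the implementation the paper cites), and the per-class sample size and repeat count are illustrative assumptions.

```python
# Minimal sketch of class-wise subsampled OTDD estimation, assuming a
# user-supplied `compute_otdd(feats_a, labels_a, feats_b, labels_b)` routine.
import numpy as np

def subsample_per_class(features, labels, n_per_class, rng):
    """Draw up to n_per_class examples from every class."""
    idx = []
    for c in np.unique(labels):
        cls_idx = np.flatnonzero(labels == c)
        take = min(n_per_class, cls_idx.size)
        idx.append(rng.choice(cls_idx, size=take, replace=False))
    idx = np.concatenate(idx)
    return features[idx], labels[idx]

def approx_otdd(src_feats, src_labels, tgt_feats, tgt_labels,
                compute_otdd, n_per_class=50, n_repeats=5, seed=0):
    """Average OTDD over several class-wise subsamples of both datasets."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_repeats):
        s_f, s_y = subsample_per_class(src_feats, src_labels, n_per_class, rng)
        t_f, t_y = subsample_per_class(tgt_feats, tgt_labels, n_per_class, rng)
        estimates.append(compute_otdd(s_f, s_y, t_f, t_y))
    return float(np.mean(estimates))
```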
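The Experiment Setup row lists the ASHA search space verbatim. The sketch below expresses that space with Ray Tune's `ASHAScheduler` as a stand-in; the paper itself runs ASHA on the Determined AI platform, and the scheduler budget values and the `train_task` trainable named in the usage comment are assumptions for illustration.

```python
# Sketch of the quoted ASHA search space, using Ray Tune as a stand-in.
from ray import tune
from ray.tune.schedulers import ASHAScheduler

search_space = {
    "target_seq_len": tune.choice([8, 64, 512]),   # RoBERTa embedder only
    "batch_size": tune.choice([4, 16, 64]),
    "grad_clip": tune.choice([-1, 1]),             # -1 = no clipping
    "dropout": tune.choice([0.0, 0.05]),
    "optimizer": tune.choice(["sgd", "adam", "adamw"]),
    "lr": tune.choice([1e-2, 1e-3, 1e-4, 1e-5]),
    "weight_decay": tune.choice([0.0, 1e-2, 1e-4]),
}

scheduler = ASHAScheduler(   # budget settings are illustrative, not from the paper
    time_attr="training_iteration",
    max_t=20,
    grace_period=1,
    reduction_factor=4,
)

# Usage (sketch): tune.run(train_task, config=search_space,
#                          scheduler=scheduler, metric="val_loss", mode="min")
```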