Cross-Modal Fine-Tuning: Align then Refine
Authors: Junhong Shen, Liam Li, Lucio M. Dery, Corey Staten, Mikhail Khodak, Graham Neubig, Ameet Talwalkar
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we show that ORCA obtains state-of-the-art results on 3 benchmarks containing over 60 datasets from 12 modalities, outperforming a wide range of hand-designed, AutoML, general-purpose, and task-specific methods. |
| Researcher Affiliation | Collaboration | 1 Carnegie Mellon University, 2 Hewlett Packard Enterprise. |
| Pseudocode | Yes | Algorithm 1: "Efficient approximation of OTDD using class-wise subsampling" (a hedged sketch of this subsampling scheme follows the table). |
| Open Source Code | Yes | Our code is made public at https://github.com/sjunhongshen/ORCA. |
| Open Datasets | Yes | Through extensive experiments, we show that ORCA obtains state-of-the-art results on 3 benchmarks containing over 60 datasets from 12 modalities...NAS-Bench-360 (Tu et al., 2022), PDEBench (Takamoto et al., 2022), and OpenML-CC18 (Vanschoren et al., 2014), which contain over 60 datasets from 12 distinct data modalities. ...We use CoNLL-2003 and CIFAR-10 as the proxy datasets, respectively. |
| Dataset Splits | Yes | For experiments, each dataset is preprocessed and split using the script available on https://github.com/rtu715/NAS-Bench-360, with the training set being used for hyperparameter tuning, embedding learning, and fine-tuning. |
| Hardware Specification | Yes | Experiments are performed on a single NVIDIA V100 GPU and managed using the Determined AI platform. |
| Software Dependencies | No | The paper mentions software like the Hugging Face transformers library, the OTDD implementation, and scikit-learn, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | The configuration space for ASHA can be customized for each task. In general, the following search space is sufficient: Target sequence length: 8, 64, 512 for RoBERTa; Batch size: 4, 16, 64; Gradient clipping: -1, 1; Dropout: 0, 0.05; Optimizer: SGD, Adam, AdamW; Learning rate: 1e-2, 1e-3, 1e-4, 1e-5; Weight decay: 0, 1e-2, 1e-4 (see the search-space sketch after the table). |
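
The Pseudocode row quotes Algorithm 1, which approximates OTDD by repeatedly subsampling a fixed number of examples per class and averaging the resulting distances. The sketch below is not the authors' released code: `compute_otdd` is a hypothetical placeholder for an exact OTDD routine (e.g., the implementation the paper cites), and the per-class sample size and repeat count are illustrative assumptions.

```python
# Minimal sketch of class-wise subsampled OTDD estimation, assuming a
# user-supplied `compute_otdd(feats_a, labels_a, feats_b, labels_b)` routine.
import numpy as np

def subsample_per_class(features, labels, n_per_class, rng):
    """Draw up to n_per_class examples from every class."""
    idx = []
    for c in np.unique(labels):
        cls_idx = np.flatnonzero(labels == c)
        take = min(n_per_class, cls_idx.size)
        idx.append(rng.choice(cls_idx, size=take, replace=False))
    idx = np.concatenate(idx)
    return features[idx], labels[idx]

def approx_otdd(src_feats, src_labels, tgt_feats, tgt_labels,
                compute_otdd, n_per_class=50, n_repeats=5, seed=0):
    """Average OTDD over several class-wise subsamples of both datasets."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_repeats):
        s_f, s_y = subsample_per_class(src_feats, src_labels, n_per_class, rng)
        t_f, t_y = subsample_per_class(tgt_feats, tgt_labels, n_per_class, rng)
        estimates.append(compute_otdd(s_f, s_y, t_f, t_y))
    return float(np.mean(estimates))
```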
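The Experiment Setup row lists the ASHA search space verbatim. The sketch below expresses that space with Ray Tune's `ASHAScheduler` as a stand-in; the paper itself runs ASHA on the Determined AI platform, and the scheduler budget values and the `train_task` trainable named in the usage comment are assumptions for illustration.

```python
# Sketch of the quoted ASHA search space, using Ray Tune as a stand-in.
from ray import tune
from ray.tune.schedulers import ASHAScheduler

search_space = {
    "target_seq_len": tune.choice([8, 64, 512]),   # RoBERTa embedder only
    "batch_size": tune.choice([4, 16, 64]),
    "grad_clip": tune.choice([-1, 1]),             # -1 = no clipping
    "dropout": tune.choice([0.0, 0.05]),
    "optimizer": tune.choice(["sgd", "adam", "adamw"]),
    "lr": tune.choice([1e-2, 1e-3, 1e-4, 1e-5]),
    "weight_decay": tune.choice([0.0, 1e-2, 1e-4]),
}

scheduler = ASHAScheduler(   # budget settings are illustrative, not from the paper
    time_attr="training_iteration",
    max_t=20,
    grace_period=1,
    reduction_factor=4,
)

# Usage (sketch): tune.run(train_task, config=search_space,
#                          scheduler=scheduler, metric="val_loss", mode="min")
```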