Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Enhancing Cross-Modal Fine-Tuning with Gradually Intermediate Modality Generation
Authors: Lincan Cai, Shuang Li, Wenxuan Ma, Jingxuan Kang, Binhui Xie, Zixun Sun, Chengwei Zhu
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Compared with hand-designed, general-purpose, task-specific, and state-of-the-art cross-modal fine-tuning approaches, PaRe demonstrates superior performance across three challenging benchmarks, encompassing more than ten modalities. |
| Researcher Affiliation | Collaboration | (1) Beijing Institute of Technology; (2) University of Illinois Urbana-Champaign; (3) Interactive Entertainment Group, Tencent. |
| Pseudocode | Yes | We summarize our PaRe in Alg. 1 in the Appendix A.1. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing code or a link to a code repository. |
| Open Datasets | Yes | For 2D classification tasks, CIFAR10 (Krizhevsky et al., 2009) and Tiny-ImageNet (Le & Yang, 2015) serve as proxy datasets. For 2D dense prediction tasks, we use VOC (Everingham et al., 2015) as a proxy dataset... For 1D tasks, CoNLL-2003 is employed as a proxy dataset. We validate PaRe for cross-modal fine-tuning on three benchmarks: NASBench-360, PDEBench and OpenML-CC18, comprising a total of 48 datasets. |
| Dataset Splits | No | The paper mentions training and test sets but does not explicitly mention validation sets or their splits. For example, "The train-test split ratio is 0.5:0.5". |
| Hardware Specification | Yes | Our experiments are conducted in a single NVIDIA RTX 4090. |
| Software Dependencies | No | We follow ORCA (Shen et al., 2023) use the Hugging Face transformers library (Wolf et al., 2019) to implement the pretrained models. |
| Experiment Setup | Yes | For other experimental settings such as learning rates, number of epochs, optimizers, we adhere to the configurations specified by ORCA. Our experiments are conducted in a single NVIDIA RTX 4090. The specific parameter settings are shown in the Table 12 and Table 13. |