Geodesic Multi-Modal Mixup for Robust Fine-Tuning
Authors: Changdae Oh, Junhyuk So, Hoyoon Byun, YongTaek Lim, Minchul Shin, Jong-June Jeon, Kyungwoo Song
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on retrieval, calibration, few- or zero-shot classification (under distribution shift), embedding arithmetic, and image captioning further show that our method provides transferable representations, enabling robust model adaptation on diverse tasks. |
| Researcher Affiliation | Academia | Changdae Oh (University of Seoul), Junhyuk So (POSTECH), Hoyoon Byun (University of Seoul), YongTaek Lim (University of Seoul), Minchul Shin (KAIST), Jong-June Jeon (University of Seoul), Kyungwoo Song (Yonsei University) |
| Pseudocode | No | The paper states that pseudocode is provided in the Supplementary Material, but the supplementary material is not included in the provided text, so it falls outside the scope of this analysis. Text: "Further details, hyperparameters selection, pseudo code, and additional results are put in Sec. A, B, and C of SM, respectively." |
| Open Source Code | Yes | Code: https://github.com/changdaeoh/multimodal-mixup |
| Open Datasets | Yes | First, we validate our method on image-text retrieval, a representative vision-language task, on Flickr30k [67] and MS COCO [70]. We consider Oxford Pets [75], SVHN [76], and CLEVR [77] for the general setting and ImageNet-1k, ImageNet V2 [78], ImageNet-A [79], ImageNet-R [80], and ImageNet-Sketch [81] for the distribution shift setting. In this section, we study whether m²-Mix can help the multi-modal representation learning for video recognition (CMU-MOSEI [83]) under modality missing. (A slerp-based sketch of geodesic mixup follows the table.) |
| Dataset Splits | No | The paper describes a "few-shot evaluation protocol: 16-shot training samples per class and inference on the entire test set" but does not specify a separate validation split or explicit proportions for validation data within the main text. Text: "Following [82, 64], we perform the tasks under a few-shot evaluation protocol: 16-shot training samples per class and inference on the entire test set." |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, memory). It only discusses software components and training settings. |
| Software Dependencies | No | The paper mentions software like "Adam optimizer", "CLIP ViT-B/32", "OpenCLIP library", "BERT [73]", and "ResNet-50 [74]", but it does not specify version numbers for these components. Text: "All methods are trained over 9 epochs with Adam optimizer (details in SM). Unless otherwise stated, we adopt CLIP ViT-B/32 as our backbone model." |
| Experiment Setup | Yes | All methods are trained over 9 epochs with Adam optimizer (details in SM). Following [82, 64], we perform the tasks under a few-shot evaluation protocol: 16-shot training samples per class and inference on the entire test set. FT (τ = 0.05). For all three methods, we train the model on MS COCO over one epoch with OpenCLIP-provided hyperparameter configuration. (A minimal sketch of this setup follows the table.) |
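
The title and the m²-Mix mentions above indicate that image and text embeddings are mixed along the geodesic of the unit hypersphere rather than by linear interpolation, but the excerpts in this report do not spell out the formula. The snippet below is therefore only a minimal sketch under that reading, using standard spherical linear interpolation (slerp) of L2-normalized embeddings; `geodesic_mixup`, the Beta-sampled ratio, and the embedding dimension are illustrative, not the authors' exact m²-Mix implementation (see the linked repository for that).

```python
import torch
import torch.nn.functional as F


def geodesic_mixup(u, v, lam, eps=1e-7):
    """Slerp-style mixup of two batches of embeddings on the unit hypersphere."""
    # Project both batches onto the unit sphere, where CLIP-style embeddings
    # are compared by cosine similarity.
    u = F.normalize(u, dim=-1)
    v = F.normalize(v, dim=-1)
    # Angle between each corresponding pair of embeddings.
    cos = (u * v).sum(dim=-1, keepdim=True).clamp(-1.0 + eps, 1.0 - eps)
    theta = torch.acos(cos)
    # Spherical linear interpolation: lam = 1 returns u, lam = 0 returns v,
    # and every intermediate point stays on the sphere (i.e., on the geodesic).
    return (torch.sin(lam * theta) * u + torch.sin((1.0 - lam) * theta) * v) / torch.sin(theta)


if __name__ == "__main__":
    img = torch.randn(8, 512)  # stand-ins for image embeddings
    txt = torch.randn(8, 512)  # stand-ins for text embeddings
    lam = torch.distributions.Beta(2.0, 2.0).sample()
    mixed = geodesic_mixup(img, txt, lam)
    print(mixed.norm(dim=-1))  # ~1.0 everywhere: mixtures remain on the sphere
```

Unlike linear mixup, the slerp form keeps the mixed embeddings at unit norm, which is what makes the mixing "geodesic" on the hypersphere.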
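
The experiment-setup row reports a CLIP ViT-B/32 backbone from the OpenCLIP library, the Adam optimizer, and FT with τ = 0.05. A plain contrastive fine-tuning step under that setup could look like the sketch below; the learning rate, device handling, and `train_step` helper are placeholders rather than the hyperparameter configuration used in the paper or shipped with OpenCLIP.

```python
import torch
import torch.nn.functional as F
import open_clip

# CLIP ViT-B/32 backbone via the OpenCLIP library, as reported in the paper.
model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-6)  # placeholder learning rate


def train_step(images, captions, tau=0.05):
    """One symmetric InfoNCE step; tau = 0.05 matches the quoted FT temperature."""
    img = F.normalize(model.encode_image(images), dim=-1)
    txt = F.normalize(model.encode_text(tokenizer(captions)), dim=-1)
    logits = img @ txt.t() / tau
    labels = torch.arange(logits.size(0))
    loss = 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The authors' proposed methods would modify this plain FT objective (e.g., by folding mixed embeddings into the contrastive loss); that integration is not reconstructed here.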