On the Comparison between Multi-modal and Single-modal Contrastive Learning
Authors: Wei Huang, Andi Han, Yongqiang Chen, Yuan Cao, Zhiqiang Xu, Taiji Suzuki
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical experiments on both synthetic and real-world datasets further consolidate our theoretical findings. |
| Researcher Affiliation | Academia | Wei Huang, RIKEN AIP (wei.huang.vr@riken.jp); Andi Han, RIKEN AIP (andi.han@riken.jp); Yongqiang Chen, The Chinese University of Hong Kong (yqchen@cse.cuhk.edu.hk); Yuan Cao, The University of Hong Kong (yuancao@hku.hk); Zhiqiang Xu, MBZUAI (zhiqiang.xu@mbzuai.ac.ae); Taiji Suzuki, University of Tokyo & RIKEN AIP (taiji@mist.i.u-tokyo.ac.jp) |
| Pseudocode | No | The paper includes mathematical equations and derivations (e.g., in Sections 3.1, 3.2, and 5), but it does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | We have also uploaded the code in the supplementary material. |
| Open Datasets | Yes | Synthetic experiments: We conduct synthetic experiments to verify the theoretical results obtained in the previous sections. We generate samples following the theoretical setups, where we set the data dimension d = 2000, number of training samples n = 100, number of test samples n_test = 200, and the hidden size of all encoders as m = 50. [...] Real-world experiments: We now extend the comparison of single-modal and multi-modal learning to realistic image data, Colored MNIST [3, 54], which is a typical benchmark studying the generalization capability under distribution shifts. (See the synthetic data-generation sketch below the table.) |
| Dataset Splits | No | The paper specifies training and test sets but does not explicitly mention or detail a separate validation split for either the synthetic or real-world experiments. For example, it states, 'number of training samples n = 100, number of test samples ntest = 200' and describes the setup for 'training set' and 'test set' for Colored MNIST, but no 'validation set'. |
| Hardware Specification | Yes | We run all the experiments on Linux servers with NVIDIA V100 graphics cards and CUDA 11.2, completing them within one hour. |
| Software Dependencies | No | The paper states 'We implement our methods using PyTorch.' and 'CUDA 11.2'. While a CUDA version is given, no version is specified for PyTorch, which is a key software dependency; therefore, complete version information is not provided. |
| Experiment Setup | Yes | We adopt gradient descent with a learning rate of 0.01 as the optimizer to train the model for 200 epochs. In the single-modal setting, µ is set to [5, 0, ..., 0]^T and ξ ∼ N(0, I) for the in-distribution data, and the augmentation vector ϵ ∼ N(0, 0.01·I). For the multi-modal setting, µ = [0, 15, 0, ..., 0]^T. In addition, for the OOD test data x_test = [ν, ζ] ∼ D_test, we set ν = [2, 0, ..., 0] and ζ ∼ N(0, I). [...] For the training set, 10% of labels will be flipped to a random class. For images with class 0 (or 1), they will be colored as red (or green) with a probability of 77.5%, and as another random color with a probability of 22.5%. (See the Colored MNIST sketch below the table.) |
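For reference, the following is a minimal sketch (not the authors' released code) of how the quoted synthetic setup could be reproduced in PyTorch. It assumes each sample concatenates a label-dependent signal patch with a Gaussian noise patch, mirroring the form x_test = [ν, ζ] reported for the OOD test data; the even split of the dimension d = 2000 into two patches, the binary ±1 labels, and all variable names are illustrative assumptions rather than details taken from the paper.

```python
import torch

# Sketch of the synthetic signal-noise data described in the table above (illustrative only).
d, n, n_test, m = 2000, 100, 200, 50   # reported data dimension, sample sizes, hidden size

# Single-modal in-distribution signal mu = [5, 0, ..., 0]^T (assumed to occupy one patch).
mu = torch.zeros(d // 2)
mu[0] = 5.0

y = torch.randint(0, 2, (n,)) * 2 - 1                 # assumed binary labels in {-1, +1}
signal = y.float().unsqueeze(1) * mu                  # label-dependent signal patch
noise = torch.randn(n, d // 2)                        # noise patch xi ~ N(0, I)
x = torch.cat([signal, noise], dim=1)                 # training sample x = [y * mu, xi]
x_aug = x + 0.1 * torch.randn(n, d)                   # augmented view: x + eps, eps ~ N(0, 0.01 I)

# OOD test data: weaker signal nu = [2, 0, ..., 0] with fresh noise zeta ~ N(0, I).
nu = torch.zeros(d // 2)
nu[0] = 2.0
y_test = torch.randint(0, 2, (n_test,)) * 2 - 1
x_test = torch.cat([y_test.float().unsqueeze(1) * nu, torch.randn(n_test, d // 2)], dim=1)
```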
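Similarly, the Colored MNIST construction quoted in the experiment setup (10% label noise, 77.5% label-correlated coloring) could be sketched as below. This is an illustrative reconstruction under the stated probabilities, not the authors' code; the `colorize` helper, the three-channel layout, and the choice of "another random color" as a different channel are assumptions.

```python
import torch

def colorize(images, labels, flip_prob=0.10, color_prob=0.775):
    """Illustrative Colored MNIST coloring: images is (n, H, W) grayscale,
    labels is (n,) with binary classes {0, 1}."""
    n = labels.shape[0]
    # Flip 10% of the labels to a random class.
    flip = torch.rand(n) < flip_prob
    noisy = torch.where(flip, torch.randint(0, 2, (n,)), labels)
    # Color class 0 red (channel 0) and class 1 green (channel 1) with probability 77.5%,
    # otherwise assign a different random channel.
    preferred = noisy
    other = (preferred + torch.randint(1, 3, (n,))) % 3
    channel = torch.where(torch.rand(n) < color_prob, preferred, other)
    colored = torch.zeros(n, 3, *images.shape[1:])
    colored[torch.arange(n), channel] = images
    return colored, noisy
```

A call such as `colorize(train_images, train_labels)` would then yield a spuriously colored training set with noisy labels, matching the distribution-shift setup the excerpt describes.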