Identifiability Results for Multimodal Contrastive Learning
Authors: Imant Daunhawer, Alice Bizeul, Emanuele Palumbo, Alexander Marx, Julia E Vogt
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically verify our identifiability results with numerical simulations and corroborate our findings on a complex multimodal dataset of image/text pairs. The goal of our experiments is to test whether contrastive learning can block-identify content in the multimodal setting, as described by Theorem 1. |
| Researcher Affiliation | Academia | Imant Daunhawer (1), Alice Bizeul (1,2), Emanuele Palumbo (1,2), Alexander Marx (1,2), Julia E. Vogt (1); (1) Department of Computer Science, ETH Zurich; (2) ETH AI Center, ETH Zurich |
| Pseudocode | No | The paper describes theoretical models and experimental setups but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is provided in our GitHub repository. |
| Open Datasets | No | The paper describes extending the Causal3DIdent and CLEVR datasets to create Multimodal3DIdent, and provides code to generate it. However, it does not provide a direct link, DOI, or explicit statement that the specific generated instance of the Multimodal3DIdent dataset used in their experiments is publicly available for download. |
| Dataset Splits | Yes | Table 2b: # Samples (train / val / test) 125,000 / 10,000 / 10,000 |
| Hardware Specification | No | The paper mentions 'Experiments were performed on the ETH Zurich Leonhard cluster' but does not provide specific details such as GPU/CPU models, processor types, or memory specifications. |
| Software Dependencies | No | The paper mentions software like Blender and the Adam optimizer, and uses architectures like ResNet-18, but it does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | Table 2a: Parameters used for the numerical simulation. Table 2b: Parameters used for Multimodal3DIdent. |