Training Transitive and Commutative Multimodal Transformers with LoReTTa

Authors: Manuel Tran, Yashin Dicente Cid, Amal Lahiani, Fabian Theis, Tingying Peng, Eldad Klaiman

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We extensively evaluate our approach on a synthetic, medical, and reinforcement learning dataset." |
| Researcher Affiliation | Collaboration | 1. Roche Diagnostics GmbH; 2. Roche Diagnostics S.L.; 3. Technical University of Munich; 4. Helmholtz Munich |
| Pseudocode | Yes | "We also publish the pseudocode and data processing pipeline." |
| Open Source Code | No | The paper mentions publishing "pseudocode and data processing pipeline" but gives no concrete access to the implementation, e.g., a repository link or an explicit statement of code release. |
| Open Datasets | Yes | "The speech dataset features about 40,000 spectrograms from AudioMNIST [31], the vision dataset comprises 70,000 images from MNIST [34], and the language dataset consists of 130,000 documents from Wine Reviews [60]." |
| Dataset Splits | No | The paper describes how datasets were constructed for specific scenarios (e.g., non-overlapping samples for the bimodal datasets, or subsets simulating missing modalities) but provides no explicit train/validation/test percentages or counts, nor a general splitting methodology for reproducibility. |
| Hardware Specification | Yes | "We trained all of our models on a single NVIDIA A100-SXM4-40GB GPU using PyTorch 2.0." |
| Software Dependencies | Yes | "We trained all of our models on a single NVIDIA A100-SXM4-40GB GPU using PyTorch 2.0." |
| Experiment Setup | Yes | "For optimization, we choose the AdamW algorithm with a learning rate of 6e-4, a weight decay factor of 0.1, and a gradient clipping of 1. The learning rate undergoes a 10-fold decay using cosine annealing and a linear warm-up during the first couple hundred steps." |
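The learning-rate schedule quoted in the last row (linear warm-up, then cosine annealing with a 10-fold decay from the 6e-4 peak) can be sketched as a stand-alone function. This is a minimal sketch, not the authors' code: the warm-up length (200 steps) and the total step count are assumptions, since the paper only says "the first couple hundred steps" and does not state the training length.

```python
import math

PEAK_LR = 6e-4           # peak learning rate, from the paper
MIN_LR = PEAK_LR / 10    # "10-fold decay" via cosine annealing
WARMUP_STEPS = 200       # assumption: "first couple hundred steps"
TOTAL_STEPS = 10_000     # assumption: not stated in the paper

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Linear warm-up from ~0 to the peak learning rate.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Cosine annealing from PEAK_LR down to MIN_LR over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * progress))
```

In a PyTorch 2.0 training loop this factor would typically be applied per step via `torch.optim.lr_scheduler.LambdaLR` on top of `torch.optim.AdamW(params, lr=6e-4, weight_decay=0.1)`, with `torch.nn.utils.clip_grad_norm_(params, 1.0)` for the gradient clipping of 1.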