Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Training Transitive and Commutative Multimodal Transformers with LoReTTa
Authors: Manuel Tran, Yashin Dicente Cid, Amal Lahiani, Fabian Theis, Tingying Peng, Eldad Klaiman
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively evaluate our approach on a synthetic, medical, and reinforcement learning dataset. |
| Researcher Affiliation | Collaboration | ¹Roche Diagnostics GmbH, ²Roche Diagnostics S.L., ³Technical University of Munich, ⁴Helmholtz Munich |
| Pseudocode | Yes | We also publish the pseudocode and data processing pipeline. |
| Open Source Code | No | The paper mentions publishing "pseudocode and data processing pipeline" but does not provide concrete access (e.g., a specific repository link or explicit statement of code release) for the implementation of its methodology. |
| Open Datasets | Yes | The speech dataset features about 40,000 spectrograms from AudioMNIST [31], the vision dataset comprises 70,000 images from MNIST [34], and the language dataset consists of 130,000 documents from Wine Reviews [60]. |
| Dataset Splits | No | The paper describes how specific datasets were constructed or split for experimental scenarios (e.g., non-overlapping samples for bimodal datasets, or subsets for simulating missing modalities) but does not provide explicit train/validation/test percentages or counts for model training or a general splitting methodology for reproducibility. |
| Hardware Specification | Yes | We trained all of our models on a single NVIDIA A100-SXM4-40GB GPU using PyTorch 2.0. |
| Software Dependencies | Yes | We trained all of our models on a single NVIDIA A100-SXM4-40GB GPU using PyTorch 2.0. |
| Experiment Setup | Yes | For optimization, we choose the AdamW algorithm with a learning rate of 6e-4, a weight decay factor of 0.1, and a gradient clipping of 1. The learning rate undergoes a 10-fold decay using cosine annealing and a linear warm-up during the first couple hundred steps. |
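
The learning-rate schedule quoted in the Experiment Setup row can be illustrated with a minimal sketch. The peak rate of 6e-4 and the 10-fold decay come from the quoted text; the warm-up length (`WARMUP_STEPS = 200`, standing in for "the first couple hundred steps") and the total step count (`TOTAL_STEPS`) are assumptions chosen only for illustration, as is the function name `lr_at`. This is not the authors' implementation.

```python
import math

PEAK_LR = 6e-4           # learning rate quoted in the Experiment Setup row
MIN_LR = PEAK_LR / 10    # "10-fold decay": final rate is one tenth of the peak
WARMUP_STEPS = 200       # assumption: "first couple hundred steps"
TOTAL_STEPS = 10_000     # assumption: total training steps are not quoted


def lr_at(step, peak_lr=PEAK_LR, min_lr=MIN_LR,
          warmup_steps=WARMUP_STEPS, total_steps=TOTAL_STEPS):
    """Return the learning rate at a zero-based training step."""
    if step < warmup_steps:
        # Linear warm-up: ramp from peak_lr / warmup_steps up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        # Hold the floor once the schedule has finished.
        return min_lr
    # Cosine annealing from peak_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


if __name__ == "__main__":
    for s in (0, 199, 200, 5_000, 10_000):
        print(s, lr_at(s))
```

In a PyTorch training loop, a function like this could be wrapped in `torch.optim.lr_scheduler.LambdaLR` (returning `lr_at(step) / PEAK_LR` as the multiplier) around `torch.optim.AdamW(..., lr=6e-4, weight_decay=0.1)`. The quoted text does not say whether the "gradient clipping of 1" is applied to the gradient norm or to individual values, so that choice is left open here.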