Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Quantifying Cross-Modality Memorization in Vision-Language Models

Authors: Yuxin Wen, Yangsibo Huang, Tom Goldstein, Ravi Kumar, Badih Ghazi, Chiyuan Zhang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We quantify factual knowledge memorization and cross-modal transferability by training models on a single modality and evaluating their performance in the other. Our results reveal that facts learned in one modality transfer to the other, but a significant gap exists between recalling information in the source and target modalities.
Researcher Affiliation	Collaboration	Yuxin Wen¹, Yangsibo Huang², Tom Goldstein¹, Ravi Kumar², Badih Ghazi², Chiyuan Zhang²; ¹University of Maryland, College Park; ²Google
Pseudocode	No	No pseudocode or algorithm blocks were found in the paper. The paper describes methodologies in paragraph form and uses diagrams to illustrate concepts.
Open Source Code	No	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: Instead of providing a copy of the data, we provide full details to reproduce the experiments, including instructions on how to generate synthetic datasets used in the experiments.
Open Datasets	Yes	We incorporate images and captions from the COCO dataset [Lin et al., 2014].
Dataset Splits	Yes	The resulting synthetic persona dataset consists of a collection of 100 unique personas. Each persona is characterized by the following elements: A set of 100 image variants for training and 1 distinct image for testing. A set of 100 textual description variants for training and 1 distinct textual description for testing.
Hardware Specification	Yes	All training is performed on a single Nvidia A100-80G GPU.
Software Dependencies	No	The paper mentions fine-tuning 'Gemma-3-4b' (a model) with 'LoRA' and 'AdamW' (algorithms). However, it does not provide version numbers for software dependencies such as the programming language, libraries (e.g., PyTorch, TensorFlow), or other frameworks used in the implementation.
Experiment Setup	Yes	During fine-tuning, we utilize LoRA [Hu et al., 2022] with a rank of r = 32, a scaling factor of α = 32, and a dropout probability of 0.05. We use AdamW [Loshchilov and Hutter, 2017] with a learning rate of 2 × 10⁻⁴ and a batch size of 16.
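The reported hyperparameters and split sizes can be collected into a small configuration sketch that also derives a few quantities implied by them. This is an illustration, not the authors' code: the class and function names (LoraSettings, TrainSettings, PersonaSplit, steps_per_epoch) are hypothetical, it assumes one training example per persona variant within a single modality, and it uses the standard LoRA convention of scaling the low-rank update by α / r. The number of epochs is not stated above, so it is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """Reported LoRA hyperparameters (class name is hypothetical)."""
    rank: int = 32        # r
    alpha: int = 32       # scaling factor α
    dropout: float = 0.05

    @property
    def scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update B·A by α / r.
        return self.alpha / self.rank


@dataclass(frozen=True)
class TrainSettings:
    """Reported optimizer settings (class name is hypothetical)."""
    learning_rate: float = 2e-4  # AdamW learning rate, 2 × 10⁻⁴
    batch_size: int = 16


@dataclass(frozen=True)
class PersonaSplit:
    """Per-modality split of the synthetic persona dataset.

    Assumption: each variant (image or text description) is one example.
    """
    num_personas: int = 100
    train_variants_per_persona: int = 100
    test_variants_per_persona: int = 1

    @property
    def train_size(self) -> int:
        return self.num_personas * self.train_variants_per_persona

    @property
    def test_size(self) -> int:
        return self.num_personas * self.test_variants_per_persona


def steps_per_epoch(split: PersonaSplit, train: TrainSettings) -> int:
    # Ceiling division: a final partial batch still costs one optimizer step.
    return -(-split.train_size // train.batch_size)


if __name__ == "__main__":
    lora, train, split = LoraSettings(), TrainSettings(), PersonaSplit()
    print(lora.scaling)                        # 1.0
    print(split.train_size, split.test_size)   # 10000 100
    print(steps_per_epoch(split, train))       # 625
```

Under these assumptions, choosing α = r makes the LoRA scaling factor exactly 1, and one pass over the 10,000 training variants of a single modality takes 625 optimizer steps at batch size 16.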