Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Efficient Multimodal Dataset Distillation via Generative Models

Authors: Zhenghao Zhao, Haoxuan Wang, Junyi Wu, Yuzhang Shang, Gaowen Liu, Yan Yan

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our method is evaluated on Flickr30K, COCO, and CC3M datasets, demonstrating superior performance and efficiency compared to existing approaches.
Researcher Affiliation	Collaboration	(1) University of Illinois Chicago; (2) University of Central Florida; (3) Cisco Research
Pseudocode	No	The paper describes the workflow of EDGE in Section 3.2 and Figure 2, and details the mathematical formulations of the losses (e.g., Equations 1, 3, 4, 5, and 6). However, it does not include a distinct section or figure explicitly labeled "Pseudocode" or "Algorithm", nor does it present structured steps in a code-like format.
Open Source Code	Yes	Our code will be made public at https://github.com/ichbill/EDGE.
Open Datasets	Yes	We evaluate our methods on multiple vision language datasets, including Flickr30K [33], COCO [22], and Conceptual Captions 3 Million (CC3M) [42].
Dataset Splits	No	The paper evaluates on Flickr30K, COCO, and CC3M datasets, stating that 'Following previous methods [55, 56], we evaluate our method on Flickr30k [33] and COCO [22] datasets for a fair comparison with existing methods.' While it details the size of the distilled datasets (e.g., 'The distilled dataset contains 500 and 1000 image-text pairs, representing 1.7% and 3.4% of the original dataset'), it does not explicitly provide the train/validation/test splits for the original Flickr30K, COCO, or CC3M datasets, nor does it provide a direct citation or reference for the exact splits used.
Hardware Specification	Yes	The experiments are conducted on NVIDIA RTX A5000 GPUs.
Software Dependencies	No	The paper mentions using NFNet and BERT-base as image and text encoders, respectively, and refers to fine-tuning a diffusion model. However, it does not explicitly list specific software dependencies such as programming languages, libraries, or frameworks with their respective version numbers (e.g., Python version, PyTorch/TensorFlow version, CUDA version).
Experiment Setup	Yes	The weight λ in Equation 6 is set to 1. For Flickr30K dataset, the learning rate is set to 1e-4, with a batch size of 8 and a total of 16 training epochs. For COCO dataset, the learning rate is set to 1e-4, with a batch size of 8 and a total of 8 training epochs. τ is the temperature parameter, which is set to 0.5 in experiments.
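
The hyperparameters quoted in the Experiment Setup row can be gathered into a small configuration object for anyone attempting a rerun. This is only a sketch: the class name `TrainConfig`, its field names, and the `CONFIGS` dictionary are hypothetical and are not taken from the paper or its repository; only the numeric values come from the quoted text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the reported training hyperparameters."""
    learning_rate: float
    batch_size: int
    epochs: int
    loss_weight: float = 1.0   # the weight lambda in Equation 6, reported as 1
    temperature: float = 0.5   # the temperature tau, reported as 0.5


# Per-dataset values as quoted in the Experiment Setup row.
CONFIGS = {
    "flickr30k": TrainConfig(learning_rate=1e-4, batch_size=8, epochs=16),
    "coco": TrainConfig(learning_rate=1e-4, batch_size=8, epochs=8),
}

if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        print(name, cfg)
```

The two datasets differ only in the number of training epochs, so the shared values (λ and τ) are modeled as defaults. No settings are reported for CC3M in the quoted text, so none are included here.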