Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension

Authors: Amaia Cardiel, Eloi Zablocki, Elias Ramzi, Oriane Siméoni, Matthieu Cord

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate LLM-wrapper on multiple datasets using different VLMs and LLMs, demonstrating significant performance improvements and highlighting the versatility of our method.
Researcher Affiliation	Collaboration	Amaia Cardiel (1, 2), Eloi Zablocki (1), Elias Ramzi (1), Oriane Siméoni (1), Matthieu Cord (1, 3); (1) Valeo.ai, (2) APTIKAL, LIG, Université Grenoble Alpes, (3) Sorbonne Université
Pseudocode	No	The paper describes the method using natural language and figures, but does not contain a dedicated pseudocode block or algorithm section.
Open Source Code	Yes	The code and the checkpoints are available at https://github.com/valeoai/LLM_wrapper.
Open Datasets	Yes	We experiment with LLM-wrapper on three classic REC datasets, RefCOCO, RefCOCO+ (Kazemzadeh et al., 2014) and RefCOCOg (Mao et al., 2016), and on Talk2Car (Deruyttere et al., 2019). Additionally, we evaluate LLM-wrapper on the recent and challenging HC-RefLoCo (Wei et al., 2024) benchmark.
Dataset Splits	Yes	Dataset statistics are given in Table 2 (split sizes; average words per query): RefCOCO (unc): 120,624 / 10,834 / 10,752; 3.5. RefCOCO+ (unc): 120,191 / 10,758 / 10,615; 3.5. RefCOCOg (umd): 80,512 / 4,896 / 9,602; 8.3. Talk2Car: 8,348 / 1,163 / 2,447; 11.0. HC-RefLoCo: 13,360 / 31,378; 84.6.
Hardware Specification	Yes	This approach makes the training efficient in terms of compute and very simple to implement in practice. ... trainable on a single 40GB-A100 GPU in less than 7 hours.
Software Dependencies	No	The paper mentions methods and tools like LoRA, Flash Attention, 4-bit quantization, Adam, and Hugging Face's supervised fine-tuning pipeline, but does not provide specific version numbers for any of these software dependencies.
Experiment Setup	Yes	We train LLM-wrapper with Adam (Kingma, 2014), with a batch size of four, until convergence. ... Unless stated otherwise, we use a learning rate of 10^-5 and a rank of r = 128 for LoRA.
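The reported training setup (Adam, batch size 4, learning rate 10^-5, LoRA rank r = 128, 4-bit quantization, a single 40 GB A100) can be summarized in a small sketch. Everything named here is hypothetical and not taken from the LLM_wrapper repository: the dataclass and helper names, the 4096 × 4096 matrix size, and the 8-billion-parameter model size are illustrative assumptions, since this page states neither which layers LoRA adapts nor the LLM's size. The first helper uses the standard LoRA parameter count: a frozen d_out × d_in weight W gets a learned low-rank update BA, where B is d_out × r and A is r × d_in. The second helper is a rough weights-only memory estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WrapperTrainingConfig:
    """Hyperparameters reported on this page; class and field names are hypothetical."""
    optimizer: str = "adam"       # Adam (Kingma, 2014)
    batch_size: int = 4           # "a batch size of four"
    learning_rate: float = 1e-5   # default learning rate 10^-5
    lora_rank: int = 128          # LoRA rank r = 128


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one frozen d_out x d_in weight matrix.

    LoRA learns B (d_out x rank) and A (rank x d_in), so it adds
    rank * (d_out + d_in) parameters while the d_out * d_in weights stay frozen.
    """
    return rank * (d_out + d_in)


def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate storage for the model weights alone, in decimal gigabytes.

    Ignores quantization metadata (e.g. per-block scales), activations,
    optimizer state, and the LoRA adapters themselves.
    """
    return n_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    cfg = WrapperTrainingConfig()

    # Hypothetical 4096 x 4096 projection matrix (illustrative size only).
    d = 4096
    added = lora_trainable_params(d, d, cfg.lora_rank)
    frozen = d * d
    print(added, frozen, f"{added / frozen:.2%}")  # prints: 1048576 16777216 6.25%

    # Hypothetical 8B-parameter LLM: 4-bit vs 16-bit weight storage.
    n = 8e9
    print(f"4-bit: {weight_memory_gb(n, 4):.1f} GB, 16-bit: {weight_memory_gb(n, 16):.1f} GB")
    # prints: 4-bit: 4.0 GB, 16-bit: 16.0 GB
```

Under these assumptions, each adapted 4096 × 4096 matrix trains only about 6% as many parameters as it freezes, and 4-bit storage cuts the weight footprint by 4× relative to 16-bit. Together these suggest why training fits on a single 40 GB GPU, though the exact budget depends on details not given here.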