Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension
Authors: Amaia Cardiel, Eloi Zablocki, Elias Ramzi, Oriane Siméoni, Matthieu Cord
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate LLM-wrapper on multiple datasets using different VLMs and LLMs, demonstrating significant performance improvements and highlighting the versatility of our method. |
| Researcher Affiliation | Collaboration | Amaia Cardiel1,2, Eloi Zablocki1, Elias Ramzi1, Oriane Siméoni1, Matthieu Cord1,3; 1 Valeo.ai, 2 APTIKAL, LIG, Université Grenoble Alpes, 3 Sorbonne Université |
| Pseudocode | No | The paper describes the method using natural language and figures, but does not contain a dedicated pseudocode block or algorithm section. |
| Open Source Code | Yes | The code and the checkpoints are available at https://github.com/valeoai/LLM_wrapper. |
| Open Datasets | Yes | We experiment with LLM-wrapper on three classic REC datasets, RefCOCO, RefCOCO+ (Kazemzadeh et al., 2014), RefCOCOg (Mao et al., 2016), and on Talk2Car (Deruyttere et al., 2019). Additionally, we evaluate LLM-wrapper on the recent and challenging HC-RefLoCo (Wei et al., 2024) benchmark. |
| Dataset Splits | Yes | Dataset statistics are given in Table 2 (split sizes; words per query). RefCOCO (unc): 120,624 / 10,834 / 10,752; 3.5. RefCOCO+ (unc): 120,191 / 10,758 / 10,615; 3.5. RefCOCOg (umd): 80,512 / 4,896 / 9,602; 8.3. Talk2Car: 8,348 / 1,163 / 2,447; 11.0. HC-RefLoCo: 13,360 / 31,378; 84.6. |
| Hardware Specification | Yes | This approach makes the training efficient in terms of compute and very simple to implement in practice. ... trainable on a single 40GB-A100 GPU in less than 7 hours. |
| Software Dependencies | No | The paper mentions methods and tools like LoRA, Flash Attention, 4-bit quantization, Adam, and Hugging Face's supervised fine-tuning pipeline, but does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | We train LLM-wrapper with Adam (Kingma, 2014), with a batch-size of four, until convergence. ... Unless stated otherwise, we use a learning rate of $10^{-5}$ and a rank of r = 128 for LoRA. |
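
The experiment setup quoted in the last row can be gathered into a small configuration object. The sketch below is illustrative only: the class name `WrapperTrainConfig` and its field names are hypothetical and do not come from the paper or its repository. Only the values (Adam, batch size of four, LoRA rank 128) are taken from the table, and the learning rate is assumed to be 1e-5. Settings the summary does not state, such as LoRA alpha, dropout, or the stopping criterion beyond "until convergence", are deliberately left out.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class WrapperTrainConfig:
    """Hypothetical container for the hyperparameters quoted in the table."""

    optimizer: str = "adam"       # Adam (Kingma, 2014), as quoted
    batch_size: int = 4           # "a batch-size of four"
    learning_rate: float = 1e-5   # assumed reading of the quoted default
    lora_rank: int = 128          # LoRA rank r = 128


# Default instance mirrors the "unless stated otherwise" settings.
config = WrapperTrainConfig()
print(asdict(config))
```

Freezing the dataclass keeps the recorded defaults from being changed by accident; a run that deviates from them would create a new instance with explicit overrides, for example `WrapperTrainConfig(lora_rank=64)`.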