Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
MAI: A Multi-turn Aggregation-Iteration Model for Composed Image Retrieval
Authors: Yanzhe Chen, Zhiwen Yang, Jinglin Xu, Yuxin Peng
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that the proposed MAI model achieves substantial improvements over state-of-the-art methods. |
| Researcher Affiliation | Academia | ¹Wangxuan Institute of Computer Technology, Peking University; ²School of Intelligence Science and Technology, University of Science and Technology Beijing |
| Pseudocode | No | The paper describes the Multi-turn Iterative Optimization (MIO) mechanism using mathematical formulations (Eq. 1, 2, 3) and descriptive text, but it does not present it in a structured pseudocode or algorithm block. |
| Open Source Code | Yes | The dataset and source code are available at https://github.com/PKU-ICST-MIPL/MAI_ICLR2025. |
| Open Datasets | Yes | The dataset and source code are available at https://github.com/PKU-ICST-MIPL/MAI_ICLR2025. |
| Dataset Splits | No | The paper does not explicitly provide specific training/test/validation dataset splits (e.g., percentages or exact counts) for the experiments. It mentions using "20% of the data from each dataset for manual scoring" for quality assessment, but not for model training or evaluation. |
| Hardware Specification | Yes | All model training and inference are conducted on 8 V100 GPUs. |
| Software Dependencies | Yes | We adopt BLIP-2 (Li et al., 2023) with the Flan-t5-xxl language model (Chung et al., 2024) for image captioning and Xwin-13B-V0.2 (Ni et al., 2024) as the LLM. Optimization is performed using AdamW (Loshchilov & Hutter, 2019). |
| Experiment Setup | Yes | Optimization is performed using AdamW (Loshchilov & Hutter, 2019) with a batch size of 16, an initial learning rate of 1e-5, and cosine annealing. Training runs for 50 epochs, while inference uses a batch size of 2048. All model training and inference are conducted on 8 V100 GPUs. The number of learned tokens is fixed at 32, and 32 tokens are retained each turn through the MIO. |
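The Experiment Setup row reports enough hyperparameters to write them down as a config, but it leaves open how the cosine schedule is stepped and what its floor is. The sketch below collects the reported values and computes a cosine-annealed learning rate. It is not the authors' code. The names `TrainConfig` and `cosine_annealed_lr` are hypothetical. The `min_lr = 0` floor, stepping once per epoch, and the absence of warmup are assumptions that the quoted text does not confirm. The text also does not say whether the batch size of 16 is per GPU or global.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Hypothetical container; values below are those quoted in the Experiment Setup row.
    optimizer: str = "AdamW"
    train_batch_size: int = 16        # per-GPU vs. global is not stated
    inference_batch_size: int = 2048
    initial_lr: float = 1e-5
    epochs: int = 50
    num_learned_tokens: int = 32
    tokens_retained_per_turn: int = 32
    num_gpus: int = 8                 # V100
    # Assumption: not reported; the schedule is assumed to anneal to zero.
    min_lr: float = 0.0


def cosine_annealed_lr(cfg: TrainConfig, epoch: int) -> float:
    """Learning rate at the start of `epoch` (0-indexed) under cosine annealing.

    Assumption: the schedule is stepped once per epoch over cfg.epochs with no
    warmup; the quoted text does not say whether it is stepped per epoch or per
    iteration.
    """
    if not 0 <= epoch <= cfg.epochs:
        raise ValueError(f"epoch must be in [0, {cfg.epochs}], got {epoch}")
    progress = epoch / cfg.epochs
    # Standard cosine annealing: lr = min + 0.5 * (max - min) * (1 + cos(pi * progress))
    return cfg.min_lr + 0.5 * (cfg.initial_lr - cfg.min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    cfg = TrainConfig()
    for e in (0, 25, 49):
        print(e, f"{cosine_annealed_lr(cfg, e):.3e}")
```

Under these assumptions the rate starts at 1e-5, reaches half that value at epoch 25, and approaches zero by epoch 50. A per-iteration schedule would follow the same curve with a finer step.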