Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Adapting Multi-modal Large Language Model to Concept Drift From Pre-training Onwards
Authors: Xiaoyu Yang, Jie Lu, En Yu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate our method enhances the efficiency and accuracy of image-text alignment in the pre-training of VL models, particularly in the concept drift scenario. Moreover, various downstream tasks exhibit significant improvements in our model's ability to adapt to the long-tailed open world. Furthermore, we create a set of multi-modal datasets called OpenMMlo, specifically tailored for the long-tailed open-world setting, to validate our findings. |
| Researcher Affiliation | Academia | Xiaoyu Yang, Jie Lu, En Yu. Australian Artificial Intelligence Institute (AAII), Faculty of Engineering and Information Technology, University of Technology Sydney, Australia. EMAIL; EMAIL |
| Pseudocode | No | The paper describes its methodology in text and mathematical formulations (Section 2 Methodology, Section 2.1 MULTI-MODAL CONCEPT DRIFT THEORY, Section 2.2 T-DISTRIBUTED ADAPTER FOR CONCEPT DRIFT, Section 2.3 T-DISTRIBUTED VISION LANGUAGE MODEL FOR THE CONCEPT DRIFT) and includes a workflow diagram (Figure 2), but does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | To foster the development of the multi-modal community, we have made both OpenMMlo datasets and our code publicly available at: https://github.com/XiaoyuYoung/ConceptDriftMLLMs. |
| Open Datasets | Yes | Furthermore, we create a set of multi-modal datasets called OpenMMlo, specifically tailored for the long-tailed open-world setting, to validate our findings. To foster the development of the multi-modal community, we have made both OpenMMlo datasets and our code publicly available at: https://github.com/XiaoyuYoung/ConceptDriftMLLMs. We extend the open-source datasets, namely ImageNet-LT Liu et al. (2019), iNaturalist2018 Van Horn et al. (2018) and Places-LT Liu et al. (2019). |
| Dataset Splits | Yes | The categories are split into three groups: many-shot (with more than 100 training samples), medium-shot (with 20-100 training samples), and few-shot (with fewer than 20 training samples). The Top-1 accuracies are computed for each group to evaluate the performance of mitigating the bias introduced by the long-tail distribution, respectively. ImageNet-LT has 1,000 classes and contains 115.8k samples, with a maximum of 1,280 samples and a minimum of 5 samples for a category. Besides, it consists of 18k images for OOD detection. |
| Hardware Specification | Yes | The pre-training of our vision language model consists of 800,000 steps, executed on 2 × 2 NVIDIA A100 GPUs. |
| Software Dependencies | No | For our language-guided image tokenizer, we leverage the strengths of both BERT Devlin et al. (2019b) and ViT as our text encoder, text decoder and visual encoder, respectively. We utilize the AdamW optimizer... The paper mentions specific software components like BERT, ViT, and the AdamW optimizer, but does not provide specific version numbers for any of them or for supporting libraries like PyTorch or TensorFlow. |
| Experiment Setup | Yes | Table 7: The training hyperparameters of our vision language model. Pre-training: Training Steps 400,000, Warmup Steps 1,000, Optimizer AdamW, Learning Rate 1e-4, Learning Rate Decay Cosine, Adam β (0.9, 0.98), Weight Decay 0.05, Batch Size 50. Fine-tuning: Training Steps 18,000, Warmup Steps 0, Optimizer AdamW, Learning Rate 2e-5, Learning Rate Decay Cosine, Adam β (0.9, 0.98), Weight Decay 0.05, Batch Size 400. |
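For readers reimplementing the reported setup, the warmup-plus-cosine learning-rate schedule in Table 7 can be sketched as below. This is a minimal sketch, not the authors' code: the function name `lr_at_step` and the assumption that the cosine decay runs to a floor of zero over the remaining steps are ours, since the paper's quoted excerpts do not specify a minimum learning rate.

```python
import math

def lr_at_step(step: int, total_steps: int, warmup_steps: int, peak_lr: float) -> float:
    """Linear warmup to peak_lr, then cosine decay toward zero.

    Parameter values mirror the reported pre-training schedule
    (peak_lr=1e-4, warmup_steps=1000, total_steps=400000); the
    zero decay floor is an assumption, not stated in the paper.
    """
    if step < warmup_steps:
        # Linear warmup from 0 to peak_lr over warmup_steps.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay over the remaining steps: progress goes 0 -> 1.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```

For the fine-tuning stage (warmup steps 0, peak learning rate 2e-5, 18,000 steps), the same function applies with those values substituted.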