Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Chain-of-region: Visual Language Models Need Details for Diagram Analysis

Authors: Xue Li, Yiyou Sun, Wei Cheng, Yinglun Zhu, Haifeng Chen

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conducted extensive experiments on the Massive Multi-discipline Multimodal (MMMU, Yue et al. (2024a)) dataset, which includes a diverse array of multimodal questions sourced from college exams, quizzes, and textbooks. We validate our approach through a series of experiments that demonstrate enhanced performance in diagram analysis tasks, setting a new standard for integrating visual and language processing in a multimodal context.
Researcher Affiliation	Collaboration	(1) University of Wisconsin-Madison, (2) University of California, Berkeley, (3) NEC Laboratories America, (4) University of California, Riverside
Pseudocode	Yes	_, X_bi = cv2.threshold(X_im, 0, 1, cv2.THRESH_OTSU); _, X_fg = cv2.connectedComponents(X_bi); _, X_bg = cv2.connectedComponents(1 - X_bi); X_region = X_fg + X_bg + (X_bg > 0) * offset
Open Source Code	No	The paper discusses the use of the OpenCV library and refers to third-party tools like PaddlePaddle. It mentions "our own implementations for detecting rectangular shapes" but does not provide any statement or link indicating the release of the code for their described methodology.
Open Datasets	Yes	We conducted extensive experiments on the Massive Multi-discipline Multimodal (MMMU, Yue et al. (2024a)) dataset, which includes a diverse array of multimodal questions sourced from college exams, quizzes, and textbooks.
Dataset Splits	No	The paper mentions using a tailored subset of the MMMU dataset comprising 5,210 images and constructing a custom dataset of 100+ samples for segmentation evaluation. However, it does not provide specific training, validation, or test splits for either dataset.
Hardware Specification	No	The paper does not provide specific hardware details such as GPU/CPU models or other computer specifications used for running the experiments. It only mentions that the CoR framework itself "requires only CPU processing".
Software Dependencies	Yes	Specifically, we utilize gpt-4-turbo, chatgpt-4o-latest, and gpt-4o-mini-2024-07-18 respectively.
Experiment Setup	Yes	The primary hyperparameters employed in our Chain-of-Region method include the pre-defined recognition call limits in split step and the cluster number during the unstructured merge step, detailed in Sections 3.1.2 and 3.1.3. In the current implementation, we have set the recognition call limit to 10 and the cluster number to 5.