Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Towards Reliable and Holistic Visual In-Context Learning Prompt Selection

Authors: Wenxiao Wu, Jing-Hao Xue, Chengming Xu, Chen Liu, Xinwei Sun, Changxin Gao, Nong Sang, Yanwei Fu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	4 Experiments. We give the setups, and evaluate our RH-Partial2Global on several visual tasks. 4.1 Main results. The quantitative results for the foreground segmentation, object detection and image colorization tasks are presented in Table 2. 4.2 Ablation study. In order to fully validate the effectiveness of RH-Partial2Global, we conduct a series of ablation studies on it.
Researcher Affiliation	Collaboration	1 Huazhong University of Science and Technology 2 Shanghai Innovation Institute 3 University College London 4 Tencent Youtu Lab 5 The Hong Kong University of Science and Technology 6 Fudan University
Pseudocode	Yes	Algorithm 1 Jackknife Conformal Prediction-guided Candidate Selection
Open Source Code	Yes	The source code is available in https://github.com/Wu-Wenxiao/RH-Partial2Global.
Open Datasets	Yes	For the segmentation task, we utilize the Pascal-5i [26] dataset... The Pascal VOC 2012 dataset [28] is employed for the single object detection task... For the colorization task, we sample a test set from the validation set of ILSVRC2012 [29]...
Dataset Splits	Yes	For the segmentation task, we utilize the Pascal-5i [26] dataset, which comprises four different image splits. Performance is reported using the mean Intersection over Union (mIoU) for each split, along with the average mIoU across all four splits.
Hardware Specification	No	The paper mentions models, optimizers, learning rates, batch size, but does not provide specific hardware details (e.g., GPU model, CPU model) used for computation.
Software Dependencies	No	The paper mentions using CLIP [30] and DINOv2 [31] models, and the AdamW optimizer, but does not provide specific version numbers for these or other software dependencies like Python, PyTorch, or CUDA.
Experiment Setup	Yes	Specifically, we train meta-rankers with both lengths of 5 and 10 for foreground segmentation and single object detection, while meta-rankers of length 3 and 5 are utilized for the colorization task. Additionally, we employ DINOv2 [31] as the feature extractor and optimize using the AdamW optimizer with a learning rate of 5 × 10⁻⁵ and a batch size of 64. For our conformal prediction-based selection strategy, we consistently set α = 0.85, corresponding to an 85% confidence level across all tasks, and adopt the negative KL Divergence as our conformity function.
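
The Experiment Setup row names the negative KL divergence as the conformity function but does not say which distributions are compared or how α is turned into a threshold. The sketch below only illustrates what a negative-KL conformity score looks like for discrete distributions. The function names, the toy distributions, and the threshold of -0.1 are hypothetical and are not taken from the paper or its repository.

```python
import math


def negative_kl(p, q, eps=1e-12):
    """Return -KL(p || q) for two discrete distributions of equal length.

    The score is 0 when p and q are identical and becomes more negative
    as they diverge, so a higher score means closer agreement.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    # Terms with p_i == 0 contribute nothing; eps guards against q_i == 0.
    kl = sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)
    return -kl


def keep_conforming(scores, threshold):
    """Hypothetical filter: indices whose conformity score reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Toy data (assumed, for illustration only).
reference = [0.5, 0.3, 0.2]
candidates = [
    [0.5, 0.3, 0.2],  # identical to the reference
    [0.4, 0.4, 0.2],  # slightly different
    [0.1, 0.1, 0.8],  # very different
]
scores = [negative_kl(reference, c) for c in candidates]
kept = keep_conforming(scores, threshold=-0.1)  # threshold is an assumption
```

With these toy inputs the first two candidates pass the filter and the third does not. In the paper the threshold would come from the conformal procedure at α = 0.85 rather than from a hand-picked constant.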