Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Exploratory Inference Learning for Scribble Supervised Semantic Segmentation

Authors: Chuanwei Zhou, Zhen Cui, Chunyan Xu, Cao Han, Jian Yang

AAAI 2023 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Comprehensive evaluations on the benchmark datasets (PASCAL VOC 2012 and PASCAL Context) demonstrate the superiority of our proposed EIL when compared with other state-of-the-art methods for the scribble-supervised semantic segmentation problem.
Researcher Affiliation	Academia	Chuanwei Zhou, Zhen Cui, Chunyan Xu, Cao Han, Jian Yang. PCA Lab, Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Jiangsu Key Lab of Image and Video Understanding for Social Security, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China. EMAIL
Pseudocode	No	The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code	Yes	Our code will be available at our site (footnote 1: https://vgg-ai.cn/resources/).
Open Datasets	Yes	The PASCAL VOC 2012 dataset contains 20 foreground classes in total as well as a background class. [...] The experiments are also conducted on the PASCAL Context dataset (Mottaghi et al. 2014), which has 59 semantic classes as well as a background class. [...] All the scribbles for the model training are from (Lin et al. 2016).
Dataset Splits	Yes	The original segmentation training subset is composed of 1464 images which is then extended by (Hariharan et al. 2011) to a full set that includes a total of 10582 images. We utilize the full set to train the main framework, and the ablation studies are conducted on both the subset and the full set. The validation dataset is composed of 1449 fully annotated image samples. The experiments are also conducted on the PASCAL Context dataset (Mottaghi et al. 2014), which has 59 semantic classes as well as a background class. The PASCAL Context dataset is composed of 4998 training images along with 5105 validation images.
Hardware Specification	Yes	All the experiments are conducted with the PyTorch framework (Paszke et al. 2019) in an RTX Titan GPU.
Software Dependencies	No	The paper mentions "PyTorch framework (Paszke et al. 2019)" but does not specify a version number for PyTorch or any other software libraries.
Experiment Setup	Yes	The SGD optimizer with momentum and weight decay being 0.9 and 5e-4 is adopted to train the segmenter Φ, and its learning rate is initially set 1e-4 and then slowly decayed with a poly schedule. We utilize two SGD optimizers whose momentum and weight decay equal 0.9 and 0.02 to train the exploratory operators, and the learning rates are initialized 1e-3 and decayed with a cos schedule. Random augmentations including scaling ([0.5, 2.0]), flipping (p = 0.5), rotation ([-10, 10]) and cropping (512 × 512) are adopted in the training stages. [...] where γ is a discount ratio, and it is set to 0.9 in this work. [...] where δ is a hyper-parameter to enlarge its separation from the class centers, and we set δ = 10.0 in this work.
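
The two learning-rate schedules named in the Experiment Setup row can be sketched in plain Python. This is an illustrative sketch, not the authors' code: the base learning rates (1e-4 for the segmenter, 1e-3 for the exploratory operators) come from the quoted setup, while the poly power (0.9), the total step count, and the function names are assumptions, since the quoted text does not give them.

```python
import math

# Values quoted in the Experiment Setup row.
SEGMENTER_BASE_LR = 1e-4   # segmenter, decayed with a poly schedule
OPERATOR_BASE_LR = 1e-3    # exploratory operators, decayed with a cos schedule

# Assumptions for illustration only (not stated in the quoted setup).
POLY_POWER = 0.9
TOTAL_STEPS = 1000


def poly_lr(base_lr, step, total_steps, power=POLY_POWER):
    """Polynomial decay: base_lr * (1 - step / total_steps) ** power."""
    step = min(step, total_steps)  # clamp so the base never goes negative
    return base_lr * (1.0 - step / total_steps) ** power


def cosine_lr(base_lr, step, total_steps, min_lr=0.0):
    """Cosine decay from base_lr at step 0 down to min_lr at total_steps."""
    step = min(step, total_steps)
    cos_term = 1.0 + math.cos(math.pi * step / total_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * cos_term


if __name__ == "__main__":
    # Print both schedules at a few checkpoints.
    for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
        seg = poly_lr(SEGMENTER_BASE_LR, step, TOTAL_STEPS)
        op = cosine_lr(OPERATOR_BASE_LR, step, TOTAL_STEPS)
        print(f"step {step}: segmenter lr={seg:.3e}, operator lr={op:.3e}")
```

Both functions start at the quoted initial learning rate and decay to zero by the final step; a different power or step budget only changes the shape of the curve between those endpoints.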