Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Learning to Localize Objects with Noisy Labeled Instances

Authors: Xiaopeng Zhang, Yang Yang, Jiashi Feng (pp. 9219-9226)

AAAI 2019 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The proposed SD-LocNet achieves 70.9% CorLoc and 51.3% mAP on PASCAL VOC 2007, surpassing the state-of-the-arts by a large margin. ... We perform experiments on two PASCAL VOC benchmarks: PASCAL VOC 2007 (Everingham et al. 2010) and VOC 2012 (Everingham et al. 2015), which are widely used for WSOL evaluation. ... We first conduct comparative experiments with different configurations to reveal how each module affects the localization performance. The ablation experiments are performed on PASCAL VOC 2007 with model M, and the results are shown in Table 1.
Researcher Affiliation	Academia	Xiaopeng Zhang (1), Yang Yang (2), Jiashi Feng (1); (1) National University of Singapore; (2) Center for Future Media and School of Computer Science and Engineering, University of Electronic Science and Technology of China; EMAIL, EMAIL
Pseudocode	Yes	Algorithm 1 SD-LocNet for WSOL Input: Training set xi ∈ X with image-level labels yi ∈ Y, training epoch T, latent variable update starting epoch T0, and number of clusters K;
Open Source Code	No	The paper does not provide concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described in this paper.
Open Datasets	Yes	We perform experiments on two PASCAL VOC benchmarks: PASCAL VOC 2007 (Everingham et al. 2010) and VOC 2012 (Everingham et al. 2015), which are widely used for WSOL evaluation. PASCAL VOC 2007 contains 9,963 images spanning 20 object classes, with 5,011 images used for trainval and the rest 4,952 for test. PASCAL VOC 2012 contains 11,540 images for trainval and 10,991 for test. We use trainval split for training and test split for test. ... To further validate the effectiveness of SD-LocNet, we evaluate it on a much larger dataset MS COCO 2014 (Lin et al. 2014) with over 135k images spanning 80 categories, of which around 80k images are used for train and around 40k for val.
Dataset Splits	Yes	PASCAL VOC 2007 contains 9,963 images spanning 20 object classes, with 5,011 images used for trainval and the rest 4,952 for test. PASCAL VOC 2012 contains 11,540 images for trainval and 10,991 for test. We use trainval split for training and test split for test. ... MS COCO 2014 (Lin et al. 2014) with over 135k images spanning 80 categories, of which around 80k images are used for train and around 40k for val. We choose the train split for training and the val split for test.
Hardware Specification	No	The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies	No	The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup	Yes	During network training, each SGD is constructed from N = 1 images with mini-batch size Rb = 256, where Rp = 100 proposals are fixed and the others are randomly sampled. ... The parameters λ control the relative contributions of each component. By default, we set λ = 1e 4, and thus all loss terms are roughly equally weighted.
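
The mini-batch construction quoted in the Experiment Setup row (Rb = 256 proposals per image, of which Rp = 100 are fixed and the rest randomly sampled) could be sketched roughly as below. This is only an illustration and not the authors' code: the function name `sample_rois`, the idea that the fixed proposals arrive as a list of indices, and the use of uniform sampling without replacement for the remainder are all assumptions made here.

```python
import random


def sample_rois(num_proposals, fixed_indices, batch_size=256, seed=None):
    """Pick batch_size proposal indices for one image (hypothetical helper).

    The fixed indices are always kept; the remaining slots are filled by
    uniform sampling without replacement from the other proposals
    (an assumption, the quoted text only says "randomly sampled").
    """
    rng = random.Random(seed)
    fixed = list(fixed_indices)
    if len(fixed) > batch_size:
        raise ValueError("more fixed proposals than mini-batch slots")

    fixed_set = set(fixed)
    remaining = [i for i in range(num_proposals) if i not in fixed_set]

    # Fill the free slots, or take everything if the image has few proposals.
    n_random = min(batch_size - len(fixed), len(remaining))
    return fixed + rng.sample(remaining, n_random)


# Example: 2000 proposals for one image, the first 100 treated as fixed.
batch = sample_rois(num_proposals=2000, fixed_indices=range(100), seed=0)
```

With N = 1 image per step, `batch` would hold the 256 proposal indices used for that step; how the 100 fixed proposals are chosen is not stated in the quoted text.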