Contour-based Interactive Segmentation
Authors: Polina Popenova, Danil Galeev, Anna Vorontsova, Anton Konushin
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed method on the standard segmentation benchmarks, our novel User Contours dataset, and its subset User Contours-G containing difficult segmentation cases. Through experiments, we demonstrate that a single contour provides the same accuracy as multiple clicks, thus reducing the required amount of user interactions. |
| Researcher Affiliation | Industry | Polina Popenova, Danil Galeev, Anna Vorontsova and Anton Konushin, Samsung Research, {p.popenova, d.galeev, a.vorontsova, a.konushin}@samsung.com |
| Pseudocode | No | The paper describes the 'Contour Generation' algorithm in numbered steps, but does not present it as formal pseudocode or an algorithm block. |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the methodology described. |
| Open Datasets | Yes | Specifically, we use Semantic Boundaries Dataset, or SBD [Hariharan et al., 2011], and the combination of LVIS [Gupta et al., 2019] and COCO [Lin et al., 2014] for training. |
| Dataset Splits | No | The paper mentions using a 'test+validation split of Open Images [Kuznetsova et al., 2020] (about 100k samples)' in an ablation study but does not provide specific details on how this or other datasets are split into train/validation/test sets for reproducibility. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions using 'Adam' for optimization but does not provide specific version numbers for any software dependencies (e.g., libraries, frameworks). |
| Experiment Setup | Yes | Input images are resized to 320px × 480px. During training, we randomly crop and rescale images, use horizontal flip, and apply random jittering of brightness, contrast, and RGB values. The models are trained for 140 epochs using Adam [Kingma and Ba, 2014] with β1 = 0.9, β2 = 0.999 and ε = 10⁻⁸. The learning rate is initialized with 5 × 10⁻⁴ and reduced by a factor of 10 at epochs 119 and 133. In a study of training data, we fine-tune our models for 10 epochs. We use stochastic weight averaging [Izmailov et al., 2018], aggregating the weights at every second epoch starting from the fourth epoch. During fine-tuning, we set the learning rate to 1 × 10⁻⁵ for the backbone and 1 × 10⁻⁴ for the rest of the network, and reduce it by a factor of 10 at epochs 8 and 9. |
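
The quoted setup fixes enough hyperparameters that the training schedule can be sketched. Below is a minimal PyTorch sketch, not the authors' code (none is released): `ContourSegModel` and `train_one_epoch` are hypothetical stubs, and the reading that stochastic weight averaging applies during the 10-epoch fine-tuning stage is an assumption based on the sentence order of the excerpt; only the hyperparameters themselves are taken from the paper.

```python
import torch
import torch.nn as nn
from torch.optim.swa_utils import AveragedModel


class ContourSegModel(nn.Module):
    """Hypothetical stand-in for the paper's network (architecture not quoted here)."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 16, 3, padding=1)  # placeholder backbone
        self.head = nn.Conv2d(16, 1, 1)                 # placeholder prediction head

    def forward(self, x):
        return self.head(self.backbone(x))


def train_one_epoch(model, optimizer):
    """Placeholder for one pass over the training data (loop body not specified)."""


model = ContourSegModel()

# Main training: 140 epochs, Adam(β1=0.9, β2=0.999, ε=1e-8), lr 5e-4,
# reduced by a factor of 10 at epochs 119 and 133.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4,
                             betas=(0.9, 0.999), eps=1e-8)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[119, 133], gamma=0.1)
for epoch in range(140):
    train_one_epoch(model, optimizer)
    scheduler.step()

# Fine-tuning: 10 epochs, backbone lr 1e-5, rest of the network 1e-4,
# both reduced by a factor of 10 at epochs 8 and 9.
backbone_ids = {id(p) for p in model.backbone.parameters()}
head_params = [p for p in model.parameters() if id(p) not in backbone_ids]
ft_optimizer = torch.optim.Adam(
    [{"params": list(model.backbone.parameters()), "lr": 1e-5},
     {"params": head_params, "lr": 1e-4}],
    betas=(0.9, 0.999), eps=1e-8)
ft_scheduler = torch.optim.lr_scheduler.MultiStepLR(
    ft_optimizer, milestones=[8, 9], gamma=0.1)

# SWA: aggregate weights at every second epoch starting from the fourth
# (1-indexed epochs 4, 6, 8, 10).
swa_model = AveragedModel(model)
for epoch in range(1, 11):
    train_one_epoch(model, ft_optimizer)
    ft_scheduler.step()
    if epoch >= 4 and epoch % 2 == 0:
        swa_model.update_parameters(model)

# Before evaluation, batch-norm statistics of `swa_model` would normally be
# recomputed with torch.optim.swa_utils.update_bn(train_loader, swa_model).
```

`MultiStepLR` reproduces the "reduce by a factor of 10 at fixed epochs" schedule directly, and the two parameter groups in the fine-tuning optimizer mirror the separate backbone and head learning rates reported in the paper.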