Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Towards Free Data Selection with General-Purpose Models

Authors: Yichen Xie, Mingyu Ding, Masayoshi Tomizuka, Wei Zhan

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments verify the effectiveness of FreeSel on various computer vision tasks.
Researcher Affiliation	Academia	Yichen Xie, Mingyu Ding, Masayoshi Tomizuka, Wei Zhan UC Berkeley EMAIL
Pseudocode	Yes	Algorithm 1: Semantic Pattern Extraction
Open Source Code	Yes	Our code is available at https://github.com/yichen928/FreeSel.
Open Datasets	Yes	We carry out experiments on PASCAL VOC [14]. In line with prior work [1, 57], we combine the training and validation sets of PASCAL VOC 2007 and 2012 as the training data pool with 16,551 images.
Dataset Splits	No	In line with prior work [1, 57], we combine the training and validation sets of PASCAL VOC 2007 and 2012 as the training data pool with 16,551 images. The paper combines training and validation sets into a single training pool, but does not explicitly describe separate validation splits for model training or reproduction.
Hardware Specification	Yes	The time is estimated on a single NVIDIA TITAN RTX GPU.
Software Dependencies	No	The model is implemented based on mmdetection. We follow [57, 1] to train the model for 300 epochs with batch size 32 using SGD optimizer (momentum 0.9). No specific version numbers for mmdetection, PyTorch, or other libraries are provided.
Experiment Setup	Yes	The model is trained for 300 epochs with batch size 32 using SGD optimizer (momentum 0.9). The initial learning rate is 0.001, which decays to 0.0001 after 240 epochs.
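
The Experiment Setup entry above can be read as a simple step schedule. The following is a minimal sketch of that reading, not the authors' training code: the names `CONFIG` and `learning_rate` are hypothetical, and it assumes 0-indexed epochs with the drop applied once 240 epochs have completed (the quoted text does not say whether the decay is a single step or gradual).

```python
# Hyperparameters quoted in the Experiment Setup row; the dict layout is illustrative.
CONFIG = {
    "epochs": 300,
    "batch_size": 32,
    "optimizer": "SGD",
    "momentum": 0.9,
    "base_lr": 1e-3,
    "decayed_lr": 1e-4,
    "decay_epoch": 240,  # assumption: single step after 240 completed epochs
}


def learning_rate(epoch, config=CONFIG):
    """Return the learning rate for a 0-indexed epoch under the assumed step schedule."""
    if epoch < config["decay_epoch"]:
        return config["base_lr"]
    return config["decayed_lr"]


# One learning-rate value per training epoch.
schedule = [learning_rate(epoch) for epoch in range(CONFIG["epochs"])]
```

Under this reading, the first 240 epochs run at 0.001 and the remaining 60 at 0.0001. Anyone reproducing the result should check the released code for the exact scheduler.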