Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Towards Free Data Selection with General-Purpose Models

Authors: Yichen Xie, Mingyu Ding, Masayoshi Tomizuka, Wei Zhan

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments verify the effectiveness of FreeSel on various computer vision tasks.
Researcher Affiliation | Academia | Yichen Xie, Mingyu Ding, Masayoshi Tomizuka, Wei Zhan UC Berkeley EMAIL
Pseudocode | Yes | Algorithm 1: Semantic Pattern Extraction
Open Source Code | Yes | Our code is available at https://github.com/yichen928/FreeSel.
Open Datasets | Yes | We carry out experiments on PASCAL VOC [14]. In line with prior work [1, 57], we combine the training and validation sets of PASCAL VOC 2007 and 2012 as the training data pool with 16,551 images.
Dataset Splits | No | In line with prior work [1, 57], we combine the training and validation sets of PASCAL VOC 2007 and 2012 as the training data pool with 16,551 images. The paper combines the training and validation sets into a single training pool but does not explicitly describe separate validation splits for model training or reproduction.
Hardware Specification | Yes | The time is estimated on a single NVIDIA TITAN RTX GPU.
Software Dependencies | No | The model is implemented based on mmdetection. We follow [57, 1] to train the model for 300 epochs with batch size 32 using SGD optimizer (momentum 0.9). No specific version numbers for mmdetection, PyTorch, or other libraries are provided.
Experiment Setup | Yes | The model is trained for 300 epochs with batch size 32 using SGD optimizer (momentum 0.9). The initial learning rate is 0.001, which decays to 0.0001 after 240 epochs.
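
The hyperparameters quoted in the Experiment Setup row can be written down as a small schedule. The following is a minimal sketch in plain Python, not the authors' mmdetection configuration: the names `TrainSetup` and `learning_rate` are hypothetical, and it assumes that "decays to 0.0001 after 240 epochs" means a single step drop applied from epoch index 240 onward (zero-based).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainSetup:
    """Values quoted in the Experiment Setup row (names are illustrative)."""
    epochs: int = 300
    batch_size: int = 32
    momentum: float = 0.9      # SGD momentum
    base_lr: float = 0.001     # initial learning rate
    decayed_lr: float = 0.0001 # learning rate after the decay
    # Assumption: one step decay, first applied at zero-based epoch 240.
    decay_epoch: int = 240


def learning_rate(epoch: int, setup: TrainSetup = TrainSetup()) -> float:
    """Return the learning rate used during the given zero-based epoch."""
    if not 0 <= epoch < setup.epochs:
        raise ValueError(f"epoch must be in [0, {setup.epochs}), got {epoch}")
    return setup.base_lr if epoch < setup.decay_epoch else setup.decayed_lr


# Full per-epoch schedule for the 300-epoch run.
schedule = [learning_rate(e) for e in range(TrainSetup().epochs)]
```

Under this reading, the first 240 epochs run at 0.001 and the remaining 60 at 0.0001. If the decay is instead meant to be gradual, only `learning_rate` would need to change.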