Doubly-Robust Self-Training

Authors: Banghua Zhu, Mingyu Ding, Philip Jacobson, Ming Wu, Wei Zhan, Michael I. Jordan, Jiantao Jiao

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through empirical evaluations on both the ImageNet dataset for image classification and the nuScenes autonomous driving dataset for 3D object detection, we demonstrate the superiority of the doubly robust loss over the standard self-training baseline.
Researcher Affiliation | Academia | All seven authors are listed with the Department of EECS, UC Berkeley: Banghua Zhu (banghua@berkeley.edu), Mingyu Ding (myding@berkeley.edu), Philip Jacobson (philip_jacobson@berkeley.edu), Ming Wu (mingwu@berkeley.edu), Wei Zhan (wzhan@berkeley.edu), Michael I. Jordan (jordan@berkeley.edu), and Jiantao Jiao (jiantao@berkeley.edu).
Pseudocode | No | The paper provides mathematical formulations of its loss functions but does not include structured pseudocode or algorithm blocks (a hedged sketch of the doubly-robust loss is given after this table).
Open Source Code | Yes | The code is available at https://github.com/dingmyu/Doubly-Robust-Self-Training.
Open Datasets | Yes | We evaluate our doubly robust self-training method on the ImageNet100 dataset, which contains a random subset of 100 classes from ImageNet-1k (Russakovsky et al., 2015)... We conduct experiments on both the image classification task with the ImageNet dataset... and the 3D object detection task with the autonomous driving dataset nuScenes (Caesar et al., 2020).
Dataset Splits | Yes | We first conduct experiments on ImageNet100 by training the model for 20 epochs using different fractions of labeled data from 1% to 100%. ... We report results for training with 1/24, 1/16, and 1/4 of the total labels in Table 2.
Hardware Specification | Yes | We train all the models with a batch size of 1024 on 8 Tesla V100 GPUs... The teacher pre-training and student training are both conducted for 10 epochs on 3 NVIDIA RTX A6000 GPUs.
Software Dependencies | No | The paper mentions the AdamW optimizer and a triangular learning rate schedule but does not specify software packages with version numbers for reproducibility (e.g., PyTorch 1.9, TensorFlow 2.x).
Experiment Setup | Yes | We train all the models with a batch size of 1024... All models are trained for 20 epochs... The weight decay is set to 0.05 and the maximal gradient norm is clipped to 1.0. The stochastic depth drop rates are set to 0.1 for all models. (A hedged sketch of how these settings might be wired together is shown after the table.)
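
Since the paper gives its method as loss equations rather than pseudocode, the following is a minimal PyTorch-style sketch of a doubly-robust self-training objective of the general form the paper describes: the pseudo-label loss over all samples, with a correction on the labeled subset that subtracts the pseudo-label loss and adds back the true-label loss. This is a sketch under that reading, not the authors' implementation; the function and argument names are illustrative and do not come from the released code.

```python
import torch
import torch.nn.functional as F

def doubly_robust_loss(logits_unlabeled, pseudo_unlabeled,
                       logits_labeled, pseudo_labeled, targets_labeled):
    """Sketch of a doubly-robust self-training objective (illustrative names).

    Combines (i) the pseudo-label loss over all samples with (ii) a correction
    on the labeled subset, where the pseudo-label loss is removed and the
    true-label loss is added back. See the paper's equations and the released
    code for the exact form and weighting.
    """
    n_unlabeled = logits_unlabeled.shape[0]
    n_labeled = logits_labeled.shape[0]
    n_total = n_unlabeled + n_labeled

    # Pseudo-label loss over all samples (unlabeled + labeled), averaged over n_total.
    loss_pseudo_all = (
        F.cross_entropy(logits_unlabeled, pseudo_unlabeled, reduction="sum")
        + F.cross_entropy(logits_labeled, pseudo_labeled, reduction="sum")
    ) / n_total

    # Correction on the labeled subset: drop its pseudo-label loss,
    # add the loss against the true labels.
    loss_pseudo_labeled = F.cross_entropy(logits_labeled, pseudo_labeled)
    loss_true_labeled = F.cross_entropy(logits_labeled, targets_labeled)

    return loss_pseudo_all - loss_pseudo_labeled + loss_true_labeled
```

Intuitively, when the pseudo-labels on the labeled subset match the true labels, the two correction terms cancel and the objective reduces to training on all data; when the pseudo-labels are poor, the correction pulls the objective back toward the labeled-only loss, which is the robustness property the paper evaluates.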
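
For the training configuration quoted in the table (AdamW, weight decay 0.05, gradient-norm clipping at 1.0, a triangular learning rate schedule, batch size 1024, 20 epochs), the sketch below shows one way these settings could be assembled in PyTorch. The peak learning rate, model, and step counts are placeholders, since they are not given above; OneCycleLR with linear annealing is used only as an approximation of a triangular schedule.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import OneCycleLR

# Placeholder model and step counts; weight decay, epochs, and the clipping
# threshold follow the values reported in the table above.
model = torch.nn.Linear(2048, 100)     # stand-in for the actual backbone
epochs, steps_per_epoch = 20, 126      # illustrative: ~130k images / 1024 batch size
peak_lr = 1e-3                         # assumed; not reported in the table

optimizer = AdamW(model.parameters(), lr=peak_lr, weight_decay=0.05)
# Warmup to the peak then linear decay, approximating a triangular schedule.
scheduler = OneCycleLR(optimizer, max_lr=peak_lr, epochs=epochs,
                       steps_per_epoch=steps_per_epoch, anneal_strategy="linear")

for _ in range(epochs * steps_per_epoch):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 2048)).mean()   # dummy forward pass
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
```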