Doubly-Robust Self-Training
Authors: Banghua Zhu, Mingyu Ding, Philip Jacobson, Ming Wu, Wei Zhan, Michael I. Jordan, Jiantao Jiao
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through empirical evaluations on both the ImageNet dataset for image classification and the nuScenes autonomous driving dataset for 3D object detection, we demonstrate the superiority of the doubly robust loss over the standard self-training baseline. |
| Researcher Affiliation | Academia | Banghua Zhu, Department of EECS, UC Berkeley, banghua@berkeley.edu; Mingyu Ding, Department of EECS, UC Berkeley, myding@berkeley.edu; Philip Jacobson, Department of EECS, UC Berkeley, philip_jacobson@berkeley.edu; Ming Wu, Department of EECS, UC Berkeley, mingwu@berkeley.edu; Wei Zhan, Department of EECS, UC Berkeley, wzhan@berkeley.edu; Michael I. Jordan, Department of EECS, UC Berkeley, jordan@berkeley.edu; Jiantao Jiao, Department of EECS, UC Berkeley, jiantao@berkeley.edu |
| Pseudocode | No | The paper provides mathematical formulations of the loss functions but does not include structured pseudocode or algorithm blocks. (A hedged sketch of the doubly robust loss is given below the table.) |
| Open Source Code | Yes | The code is available in https://github.com/dingmyu/Doubly-Robust-Self-Training. |
| Open Datasets | Yes | We evaluate our doubly robust self-training method on the ImageNet100 dataset, which contains a random subset of 100 classes from ImageNet-1k (Russakovsky et al., 2015)... We conduct experiments on both image classification task with ImageNet dataset... and 3D object detection task with autonomous driving dataset nuScenes (Caesar et al., 2020). |
| Dataset Splits | Yes | We first conduct experiments on ImageNet100 by training the model for 20 epochs using different fractions of labeled data from 1% to 100%. ... We report results for training with 1/24, 1/16, and 1/4 of the total labels in Table 2. (A sketch of constructing such labeled/unlabeled splits is given below the table.) |
| Hardware Specification | Yes | We train all the models with a batch size of 1024 on 8 Tesla V100 GPUs... The teacher pre-training and student training are both conducted for 10 epochs on 3 NVIDIA RTX A6000 GPUs. |
| Software Dependencies | No | The paper mentions using the "AdamW" optimizer and a "triangular learning rate schedule" but does not name software packages with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x) needed for reproducibility. (A sketch of such an optimizer and schedule, using the Experiment Setup hyperparameters, is given below the table.) |
| Experiment Setup | Yes | We train all the models with a batch size of 1024... All models are trained for 20 epochs... The weight decay is set to 0.05 and the maximal gradient norm is clipped to 1.0. The stochastic depth drop rates are set to 0.1 for all models. |
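
Since the Pseudocode row notes that the objective is given only as equations, the following is a minimal PyTorch-style sketch of a doubly robust self-training loss. It assumes the objective combines a pseudo-label term over all samples with a correction on the labeled subset (the standard doubly robust form); the function and argument names are illustrative and not taken from the released code.

```python
import torch.nn.functional as F

def doubly_robust_loss(logits_all, pseudo_all, logits_lab, pseudo_lab, true_lab):
    """Sketch of a doubly robust self-training objective.

    `logits_all` / `pseudo_all` cover every sample in the batch (labeled and
    unlabeled, all carrying teacher pseudo-labels); `logits_lab`, `pseudo_lab`,
    and `true_lab` cover only the labeled subset.
    """
    # Mean pseudo-label loss over all samples.
    loss_pseudo_all = F.cross_entropy(logits_all, pseudo_all)
    # Correction on the labeled subset: subtract the pseudo-label loss
    # and add back the loss on the true labels.
    loss_pseudo_lab = F.cross_entropy(logits_lab, pseudo_lab)
    loss_true_lab = F.cross_entropy(logits_lab, true_lab)
    # If the pseudo-labels equal the true labels, the last two terms cancel and
    # the loss uses all data; otherwise the first two terms cancel in
    # expectation, falling back to supervised training on the labeled subset.
    return loss_pseudo_all - loss_pseudo_lab + loss_true_lab
```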
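
For the labeled fractions reported in the Dataset Splits row (1% to 100%, and 1/24, 1/16, 1/4 of the labels), a split could be constructed as in the sketch below. The dataset path and helper name are illustrative assumptions, not taken from the released code.

```python
import random
from torch.utils.data import Subset

def split_labeled_unlabeled(dataset, labeled_fraction, seed=0):
    """Randomly partition a dataset into a labeled subset of the given
    fraction and an unlabeled remainder to be pseudo-labeled by the teacher."""
    indices = list(range(len(dataset)))
    random.Random(seed).shuffle(indices)
    n_labeled = int(labeled_fraction * len(indices))
    return Subset(dataset, indices[:n_labeled]), Subset(dataset, indices[n_labeled:])

# Example (path is illustrative): keep 1/16 of the ImageNet-100 labels and
# treat the remaining images as unlabeled for self-training.
# from torchvision.datasets import ImageFolder
# train_set = ImageFolder("imagenet100/train")
# labeled_set, unlabeled_set = split_labeled_unlabeled(train_set, labeled_fraction=1 / 16)
```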
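
Combining the Software Dependencies and Experiment Setup rows (AdamW, a triangular learning rate schedule, weight decay 0.05, gradient-norm clipping at 1.0, 20 epochs), a corresponding PyTorch setup might look like the sketch below. The base learning rate and warmup fraction are assumptions, and the triangular schedule is approximated with `OneCycleLR` using linear annealing; batch size 1024 and the stochastic depth drop rate belong to the data loader and model definition and are not shown.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import OneCycleLR

def build_optimizer_and_scheduler(model, steps_per_epoch, epochs=20,
                                  base_lr=1e-3, weight_decay=0.05):
    """Optimizer and schedule matching the reported settings; base_lr and the
    warmup fraction (pct_start) are assumptions, not values from the paper."""
    optimizer = AdamW(model.parameters(), lr=base_lr, weight_decay=weight_decay)
    # A linear rise to base_lr followed by a linear decay approximates a
    # triangular learning rate schedule.
    scheduler = OneCycleLR(optimizer, max_lr=base_lr,
                           total_steps=epochs * steps_per_epoch,
                           pct_start=0.1, anneal_strategy="linear")
    return optimizer, scheduler

def training_step(model, optimizer, scheduler, loss):
    """One update with the gradient-norm clipping reported in the table."""
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
```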