Distributionally Robust Ensemble of Lottery Tickets Towards Calibrated Sparse Network Training

Authors: Hitesh Sapkota, Dingrong Wang, Zhiqiang Tao, Qi Yu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results on several benchmarks show that our proposed lottery ticket ensemble leads to a clear calibration improvement without sacrificing accuracy or burdening inference costs.
Researcher Affiliation | Academia | Hitesh Sapkota, Dingrong Wang, Zhiqiang Tao, Qi Yu (Rochester Institute of Technology); {hxs1943, dw7445, zhiqiang.tao, qi.yu}@rit.edu
Pseudocode | No | The paper does not include clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | For the source code of this paper, please click here.
Open Datasets | Yes | For the general classification setting, we consider three real-world datasets: Cifar10, Cifar100 [12], and Tiny ImageNet [14]. For the out-of-distribution setting, we consider the corrupted versions of the Cifar10 and Cifar100 datasets, named Cifar10-C and Cifar100-C [10]. For open-set detection, we use the SVHN dataset [21] as the open-set dataset...
Dataset Splits | Yes | Cifar10 consists of 10 classes, each with 5,000 training samples and 1,000 testing (evaluation) samples... Cifar100 consists of ... 500 training samples and 100 testing samples... Tiny ImageNet consists of 200 classes with 120,000 samples, where each class has 500 training images, 50 validation images, and 50 test images. (See the data-loading sketch after the table.)
Hardware Specification | Yes | All experiments are conducted on an NVIDIA RTX A6000 GPU with 48 GB memory and a 300 W power budget. For the GPU, CUDA Version 11.6, Driver Version 510.108.03, and NVIDIA-SMI 510.108.03 are used. The CPU is an Intel(R) Xeon(R) Gold 6326 @ 2.90 GHz on a 64-bit x86_64 system.
Software Dependencies | Yes | CUDA Version 11.6, Driver Version 510.108.03, and NVIDIA-SMI 510.108.03 are used.
Experiment Setup | Yes | In all experiments, we use a family of ResNet architectures with two density levels: 9% and 15%. ... All experiments are conducted with 200 total epochs, an initial learning rate of 0.1, and a cosine scheduler to decay the learning rate over time. ... For this, we choose λ = 10 for the second sparse sub-network and λ = 500 for the third sparse sub-network. ... For this, we choose λ = 50 for the second sparse sub-network and λ = 500 for the third one. In the case of Tiny ImageNet, ... we choose λ = 100 for the second sparse sub-network and λ = 1,000,000 for the third sparse sub-network. (See the training-schedule sketch after the table.)
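
The Dataset Splits row describes the standard Cifar10, Cifar100, and Tiny ImageNet splits. The sketch below shows one typical way to obtain those splits; it assumes PyTorch/torchvision, a ./data root, and a bare ToTensor transform, none of which are specified by the paper, so it is illustrative rather than the authors' actual data pipeline.

    from torchvision import datasets, transforms

    to_tensor = transforms.ToTensor()  # placeholder transform; the paper likely adds augmentation

    # CIFAR-10: 10 classes, 5,000 training / 1,000 test images per class
    cifar10_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=to_tensor)
    cifar10_test = datasets.CIFAR10(root="./data", train=False, download=True, transform=to_tensor)

    # CIFAR-100: 100 classes, 500 training / 100 test images per class
    cifar100_train = datasets.CIFAR100(root="./data", train=True, download=True, transform=to_tensor)
    cifar100_test = datasets.CIFAR100(root="./data", train=False, download=True, transform=to_tensor)

    # Tiny ImageNet (200 classes, 500/50/50 images per class) is not bundled with
    # torchvision; the extracted folders are commonly read with ImageFolder.
    tiny_train = datasets.ImageFolder("./tiny-imagenet-200/train", transform=to_tensor)

    assert len(cifar10_train) == 50_000 and len(cifar10_test) == 10_000
    assert len(cifar100_train) == 50_000 and len(cifar100_test) == 10_000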
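
The Experiment Setup row quotes a concrete schedule: 200 epochs, an initial learning rate of 0.1 with cosine decay, and per-setting λ values for the second and third sparse sub-networks. Below is a minimal sketch of that schedule, assuming PyTorch, an SGD optimizer with momentum, and a stock torchvision ResNet standing in for the paper's sparse sub-networks; the sparsification itself, the ensembling, and the exact role of λ in the objective are defined by the paper's released code, not reproduced here.

    import torch
    from torchvision.models import resnet18

    # Placeholder backbone: the paper uses a family of ResNet architectures at
    # 9% / 15% density; the pruning/sparsification step is omitted in this sketch.
    model = resnet18(num_classes=10)

    # Quoted schedule: initial learning rate 0.1, cosine decay over 200 epochs.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # momentum is an assumption
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

    # Lambda values quoted for the second and third sparse sub-networks. The first
    # two settings are unnamed in the excerpt (keys here are illustrative); only the
    # Tiny ImageNet values are explicitly attributed.
    LAMBDAS = {
        "setting_a": {"subnet2": 10, "subnet3": 500},
        "setting_b": {"subnet2": 50, "subnet3": 500},
        "tiny_imagenet": {"subnet2": 100, "subnet3": 1_000_000},
    }

    for epoch in range(200):
        # ... one training pass over the data would go here ...
        scheduler.step()  # decay the learning rate once per epoch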