Variational Information Pursuit for Interpretable Predictions

Authors: Aditya Chattopadhyay, Kwan Ho Ryan Chan, Benjamin David Haeffele, Donald Geman, Rene Vidal

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, V-IP is 10-100x faster than IP on different Vision and NLP tasks with competitive performance." "In this section, through extensive experiments, we evaluate the effectiveness of the proposed method."
Researcher Affiliation | Academia | Aditya Chattopadhyay, Kwan Ho Ryan Chan, Benjamin D. Haeffele, Donald Geman (Johns Hopkins University, USA, {achatto1,kchan49,bhaeffele,geman}@jhu.edu); René Vidal (University of Pennsylvania, USA, vidalr@seas.upenn.edu)
Pseudocode | No | No structured pseudocode or explicitly labeled algorithm blocks were found.
Open Source Code | Yes | Code is available at https://github.com/ryanchankh/VariationalInformationPursuit.
Open Datasets | Yes | CUB-200 (Wah et al., 2011) ... Huffington News (Misra, 2018) ... MNIST (LeCun et al., 1998), KMNIST (Clanuwat et al., 2018), Fashion-MNIST (Xiao et al., 2017b), CIFAR-{10,100} (Krizhevsky et al., 2009) ... SymCAT-200 (Peng et al., 2018) ... MuZhi (Wei et al., 2018) ... Dxy (Xu et al., 2019).
Dataset Splits | No | No explicit validation-split percentages or counts were provided. The paper mentions '60,000 training images and 10,000 testing images' for the MNIST-style datasets and '50,000 training images and 10,000 testing images' for CIFAR, but does not specify a validation split carved from the training data.
Hardware Specification | Yes | All of our experiments are implemented in Python using PyTorch (Paszke et al., 2019) version 1.12. Moreover, all training is done on one computing node with a 64-core 2.10GHz Intel(R) Xeon(R) Gold 6130 CPU, 8 NVIDIA GeForce RTX 2080 GPUs (each with 10GB memory), and 377GB of RAM.
Software Dependencies | Yes | All of our experiments are implemented in Python using PyTorch (Paszke et al., 2019) version 1.12.
Experiment Setup | Yes | General Optimization Scheme. The following optimization scheme is used in all experiments for both Initial Random Sampling and Subsequent Biased Sampling, unless stated otherwise. We minimize the Deep V-IP objective using Adam (Kingma & Ba, 2014) as our optimizer, with learning rate lr=1e-4, betas=(0.9, 0.999), weight_decay=0 and amsgrad=True (Reddi et al., 2019). We also use a cosine annealing learning-rate scheduler (Loshchilov & Hutter, 2016) with T_max=50. We train our networks fθ and gη for 500 epochs using batch size 128. In both sampling stages, we linearly anneal the temperature parameter τ in our straight-through softmax estimator from 1.0 to 0.2 over the 500 epochs. (Hedged code sketches of this scheme follow the table.)
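
To make the quoted optimization scheme concrete, the following is a minimal PyTorch sketch assuming standard torch.optim APIs. The networks, dummy data, and CrossEntropyLoss below are placeholders standing in for the authors' querier fθ, classifier gη, datasets, and Deep V-IP objective; only the optimizer, scheduler, epoch/batch settings, and temperature schedule are taken from the paper's description.

    import torch
    from torch import nn

    # Placeholder networks standing in for the querier f_theta and classifier g_eta;
    # the dimensions are illustrative only, not taken from the paper.
    f_theta = nn.Linear(32, 16)
    g_eta = nn.Linear(16, 10)
    params = list(f_theta.parameters()) + list(g_eta.parameters())

    # Adam with lr=1e-4, betas=(0.9, 0.999), weight_decay=0, amsgrad=True (Reddi et al., 2019)
    optimizer = torch.optim.Adam(params, lr=1e-4, betas=(0.9, 0.999),
                                 weight_decay=0, amsgrad=True)
    # Cosine annealing learning-rate schedule with T_max=50 (Loshchilov & Hutter, 2016)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)

    # Dummy batch in place of real query/label data (batch size 128 as reported).
    dummy_x = torch.randn(128, 32)
    dummy_y = torch.randint(0, 10, (128,))
    criterion = nn.CrossEntropyLoss()  # stand-in for the Deep V-IP objective

    num_epochs = 500
    tau_start, tau_end = 1.0, 0.2
    for epoch in range(num_epochs):
        # Linearly anneal the straight-through softmax temperature tau from 1.0 to 0.2.
        tau = tau_start + (tau_end - tau_start) * epoch / (num_epochs - 1)
        optimizer.zero_grad()
        logits = g_eta(f_theta(dummy_x))   # tau would enter the real query-selection step
        loss = criterion(logits, dummy_y)
        loss.backward()
        optimizer.step()
        scheduler.step()                   # stepped once per epoch (assumption)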
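The setup also mentions a straight-through softmax estimator with an annealed temperature τ. The snippet below shows one common way such an estimator is written (hard one-hot selection in the forward pass, soft gradients in the backward pass); it is an assumption about the general technique, not the authors' exact implementation, and the function name and tensor shapes are illustrative.

    import torch
    import torch.nn.functional as F

    def straight_through_softmax(logits: torch.Tensor, tau: float) -> torch.Tensor:
        # Temperature-scaled softmax used for the backward pass.
        soft = F.softmax(logits / tau, dim=-1)
        # Hard one-hot selection used for the forward pass.
        hard = F.one_hot(soft.argmax(dim=-1), num_classes=logits.shape[-1]).to(soft.dtype)
        # Value equals `hard`; gradients flow through `soft` only.
        return hard + soft - soft.detach()

    # Example: selecting one of 5 candidate queries for each of 4 samples.
    logits = torch.randn(4, 5, requires_grad=True)
    selection = straight_through_softmax(logits, tau=0.2)
    selection.sum().backward()   # gradients reach `logits` via the soft path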