Adversarial Distillation of Bayesian Neural Network Posteriors
Authors: Kuan-Chieh Wang, Paul Vicol, James Lucas, Li Gu, Roger Grosse, Richard Zemel
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose a framework, Adversarial Posterior Distillation, to distill the SGLD samples using a Generative Adversarial Network (GAN). At test-time, samples are generated by the GAN. We show that this distillation framework incurs no loss in performance on recent BNN applications including anomaly detection, active learning, and defense against adversarial attacks. By construction, our framework distills not only the Bayesian predictive distribution, but the posterior itself. This allows one to compute quantities such as the approximate model variance, which is useful in downstream tasks. To our knowledge, these are the first results applying MCMC-based BNNs to the aforementioned applications. (Hedged sketches of this SGLD-to-GAN pipeline appear after the table.) |
| Researcher Affiliation | Academia | (1) University of Toronto, Toronto, Ontario, Canada; (2) Vector Institute, Toronto, Ontario, Canada. |
| Pseudocode | Yes | Algorithm 1 Offline APD; Algorithm 2 Online APD (a hedged sketch of the offline variant appears after the table) |
| Open Source Code | Yes | Implementation details can be found at https://github.com/wangkua1/apd_public |
| Open Datasets | Yes | We used MNIST for our classification and anomaly detection experiments. |
| Dataset Splits | Yes | We trained on 50,000 examples, and reserved 10,000 from the standard training set as a fixed validation set. |
| Hardware Specification | No | The paper does not specify any hardware components such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using the 'foolbox library (Rauber et al., 2017)' but does not provide a specific version number for this or any other software dependency. |
| Experiment Setup | Yes | When training with SGD, we tuned the learning rate and weight decay on the validation set: we found the best values to be 0.05 and 0.001, respectively. (...) For SGLD, we did not use dropout, and the number of burn-in iterations and sampling interval were 500 and 20, respectively. The batch size for training was fixed at 100 for all methods. (...) We experimented with two fc NN architectures: fc NN1, with architecture 784-100-10 (79,510 parameters), and fc NN2, with architecture 784-400-400-10 (478,410 parameters). For APD, we used a 3-layer fc NN with 100 hidden units per layer for both our generator and discriminator, for all tasks. (The SGLD sketch after the table reuses these quoted hyperparameters.) |
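The quoted setup pins down the SGLD schedule (500 burn-in iterations, a sampling interval of 20, batch size 100) and the fc NN1 architecture (784-100-10, 79,510 parameters). Below is a minimal PyTorch sketch of such an SGLD sampling loop, not the authors' implementation; the learning rate, the Gaussian prior precision, and the MNIST flattening are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hedged SGLD sketch. Burn-in (500), sampling interval (20), and batch
# size (100) follow the quoted setup; the learning rate and prior
# precision are illustrative assumptions, not values from the paper.

def sgld_samples(model, loader, n_train, lr=1e-4, prior_prec=1.0,
                 burn_in=500, thin=20, n_samples=100):
    """Collect flattened weight vectors from an SGLD chain."""
    samples, step = [], 0
    while len(samples) < n_samples:
        for x, y in loader:
            x = x.view(x.size(0), -1)  # flatten MNIST digits for the fc net
            model.zero_grad()
            # Minibatch estimate of the full-data log-likelihood.
            log_lik = -F.cross_entropy(model(x), y, reduction="mean") * n_train
            # Gaussian prior: log p(theta) = -0.5 * prior_prec * ||theta||^2 + const.
            log_prior = -0.5 * prior_prec * sum((p ** 2).sum()
                                                for p in model.parameters())
            (log_lik + log_prior).backward()
            with torch.no_grad():
                for p in model.parameters():
                    # SGLD: ascend the log posterior, then inject Gaussian
                    # noise whose variance equals the step size.
                    p.add_(0.5 * lr * p.grad + lr ** 0.5 * torch.randn_like(p))
            step += 1
            # Keep one sample every `thin` steps once burn-in has passed.
            if step > burn_in and (step - burn_in) % thin == 0:
                samples.append(torch.cat([p.detach().flatten()
                                          for p in model.parameters()]))
            if len(samples) == n_samples:
                break
    return torch.stack(samples)  # shape: (n_samples, n_params)
```

With fc NN1 (784-100-10), each row of the returned tensor has the 79,510 entries quoted above.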
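The Pseudocode row names Algorithm 1 (Offline APD), which per the abstract quote first collects SGLD weight samples and then fits a GAN to them. The sketch below shows that offline pattern under stated assumptions: the 3-layer, 100-unit fc generator and discriminator follow the setup quote, but the non-saturating GAN loss, Adam optimizer, and latent dimension are guesses; the authors' actual objective is in the linked repository.

```python
import torch
import torch.nn as nn

def mlp(d_in, d_out):
    # 3-layer fully connected network with 100 hidden units per layer,
    # matching the generator/discriminator description in the setup row.
    return nn.Sequential(
        nn.Linear(d_in, 100), nn.ReLU(),
        nn.Linear(100, 100), nn.ReLU(),
        nn.Linear(100, d_out),
    )

def distill_offline(theta_samples, z_dim=100, steps=10_000, batch=100, lr=2e-4):
    """Fit a GAN to a bank of flattened SGLD weight samples (Offline APD sketch).

    `theta_samples` is the (n_samples, n_params) tensor produced by the
    SGLD loop above. The non-saturating GAN loss used here is an
    assumption, not the paper's stated objective.
    """
    n, d = theta_samples.shape
    G, D = mlp(z_dim, d), mlp(d, 1)
    opt_g = torch.optim.Adam(G.parameters(), lr=lr)
    opt_d = torch.optim.Adam(D.parameters(), lr=lr)
    bce = nn.BCEWithLogitsLoss()

    for _ in range(steps):
        real = theta_samples[torch.randint(n, (batch,))]
        fake = G(torch.randn(batch, z_dim))

        # Discriminator: real SGLD samples vs. generated parameter vectors.
        opt_d.zero_grad()
        loss_d = (bce(D(real), torch.ones(batch, 1)) +
                  bce(D(fake.detach()), torch.zeros(batch, 1)))
        loss_d.backward()
        opt_d.step()

        # Generator: produce parameter vectors the discriminator accepts.
        opt_g.zero_grad()
        loss_g = bce(D(fake), torch.ones(batch, 1))
        loss_g.backward()
        opt_g.step()
    return G
```

At test time, posterior samples are drawn as G(z) with z ~ N(0, I) and reshaped back into network weights, matching the abstract quote's "at test-time, samples are generated by the GAN".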
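Finally, the abstract quote notes that distilling the posterior itself allows computing "quantities such as the approximate model variance". A hedged Monte Carlo sketch of that computation follows; `net_fn` is a hypothetical helper (not from the paper) that reshapes a flat parameter vector into the classifier and returns softmax probabilities.

```python
import torch

def predictive_stats(G, x, net_fn, n_draws=100, z_dim=100):
    """Monte Carlo predictive mean and per-class variance from the distilled posterior.

    `net_fn(theta, x)` is an assumed helper: it loads the flat parameter
    vector `theta` into the classifier and returns softmax probabilities.
    """
    probs = torch.stack([net_fn(G(torch.randn(1, z_dim))[0], x)
                         for _ in range(n_draws)])
    # Mean over weight draws is the predictive distribution; the variance
    # over draws is the approximate model variance from the abstract quote.
    return probs.mean(0), probs.var(0)
```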