Adversarially Robust Representations with Smooth Encoders

Authors: Taylan Cemgil, Sumedh Ghaisas, Krishnamurthy (Dj) Dvijotham, Pushmeet Kohli

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate our approach on standard datasets and experimentally show that significant improvements in the downstream adversarial accuracy can be achieved by learning robust representations completely in an unsupervised manner... We show empirically using simulation studies on MNIST, color MNIST and CelebA datasets, that models trained using our method learn representations that provide a higher degree of adversarial robustness even without supervised adversarial training.
Researcher Affiliation | Industry | DeepMind, London {taylancemgil,sumedhg,dvij,pushmeet}@google.com
Pseudocode | Yes |
    Initialize η(0), θ(0)
    for τ = 1, 2, ... do
        x_a = GetData(),  x_b = Select(x_a; L, ϵ)                                        (see Section 3.1)
        µ_a, Σ_a = f(x_a; η),  µ_b, Σ_b = f(x_b; η)                                      (Compute Representation)
        WD_{2,γ}(η) = EntropyRegularizedWassersteinDivergence(µ_a, Σ_a, µ_b, Σ_b, γ)     (see Apdx. B.2)
        u ~ N(0, I)                                                                      (Reparametrization Trick)
        E1(η, θ) = 1/(2v) ||x_a − g(µ_a + Σ_a^{1/2} u; θ)||²                             (Data Fidelity)
        E2(η) = (1/2) (||µ_a||² + ||µ_b||² + Tr{Σ_a + Σ_b})                              (Prior Fidelity)
        E(η, θ) = E1(η, θ) + E2(η) + WD_{2,γ}(η)
        η(τ), θ(τ) = OptimizationStep(E, η(τ−1), θ(τ−1))
    end
    (A runnable sketch of this loop is given after the table.)
Open Source Code | No | The paper does not include an explicit statement about releasing code or a link to a code repository for their method.
Open Datasets | Yes | We run simulations on color MNIST, MNIST and CelebA datasets.
Dataset Splits | No | The paper refers to a 'test set' but does not provide specific percentages, sample counts, or a detailed methodology for splitting the datasets into training, validation, and test sets.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or cloud instance types used for the experiments.
Software Dependencies | No | The paper mentions general architectures and optimizers ('standard MLP and Conv Net architectures', 'Adam optimizer') but does not list specific software dependencies with version numbers.
Experiment Setup | Yes | Table 2: Experiment Hyperparameters —
    Training (VAE or SE): Representation dimension dim(Z): 32, 64, 128, 256; Observation noise variance v: 0.25, 0.5, 1., 2.; Architecture: MLP, ConvNet
    Training (SE only): Coupling strength γ: 0.01, 0.1, 1, 5, 10, 50
    Selection: PGD radius ϵ: 0.01, 0.1, 0.2, 0.3; PGD iteration budget L: 1, 5, 10, 20, 50
    ...Each network (both the encoder and decoder) are randomly initialized and trained for 300K iterations.
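
The pseudocode row above can be turned into a concrete training step fairly directly. The following is a minimal PyTorch sketch of that loop, assuming diagonal covariances, an encoder that returns (mean, log-variance), and a separately implemented neighbour-selection step; it uses the closed-form squared 2-Wasserstein distance between diagonal Gaussians in place of the paper's entropy-regularized divergence (its Apdx. B.2), so it illustrates the structure of the objective rather than the authors' implementation.

import torch

def w2_diag_gaussians(mu_a, var_a, mu_b, var_b):
    # Squared 2-Wasserstein distance between diagonal Gaussians (unregularized
    # closed form; the paper's entropy-regularized divergence would replace this term).
    return ((mu_a - mu_b) ** 2).sum(-1) + ((var_a.sqrt() - var_b.sqrt()) ** 2).sum(-1)

def smooth_encoder_step(encoder, decoder, opt, x_a, x_b, v=0.5):
    # One optimization step on E = E1 + E2 + WD for a batch of clean inputs x_a
    # and their selected neighbours x_b (assumed to come from a PGD-style search).
    mu_a, logvar_a = encoder(x_a)              # representation of the clean input
    mu_b, logvar_b = encoder(x_b)              # representation of the neighbour
    var_a, var_b = logvar_a.exp(), logvar_b.exp()

    u = torch.randn_like(mu_a)                 # reparametrization trick
    z_a = mu_a + var_a.sqrt() * u
    recon = decoder(z_a)

    e1 = ((x_a - recon) ** 2).flatten(1).sum(-1) / (2.0 * v)        # data fidelity
    e2 = 0.5 * ((mu_a ** 2).sum(-1) + (mu_b ** 2).sum(-1)
                + var_a.sum(-1) + var_b.sum(-1))                     # prior fidelity
    wd = w2_diag_gaussians(mu_a, var_a, mu_b, var_b)                 # coupling penalty

    loss = (e1 + e2 + wd).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return float(loss)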
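
The Experiment Setup row describes a grid sweep rather than a single configuration. As a small illustration, the sketch below enumerates such a grid; the dictionary keys and helper name are hypothetical and not taken from the paper, and per the quoted setup each resulting configuration would be trained for 300K iterations.

from itertools import product

# Hypothetical encoding of the Table 2 sweep; key names are illustrative only.
GRID = {
    "dim_z": [32, 64, 128, 256],            # representation dimension (VAE or SE)
    "obs_noise_v": [0.25, 0.5, 1.0, 2.0],   # observation noise variance (VAE or SE)
    "architecture": ["mlp", "convnet"],     # encoder/decoder architecture (VAE or SE)
    "gamma": [0.01, 0.1, 1, 5, 10, 50],     # coupling strength (SE only)
    "pgd_radius": [0.01, 0.1, 0.2, 0.3],    # selection-phase PGD radius
    "pgd_iters": [1, 5, 10, 20, 50],        # selection-phase PGD iteration budget
}

def configurations(grid):
    # Yield one settings dict per point of the full Cartesian grid.
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))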