Drop-Bottleneck: Learning Discrete Compressed Representation for Noise-Robust Exploration

Authors: Jaekyeom Kim, Minjung Kim, Dongyeon Woo, Gunhee Kim

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose an exploration method based on Drop-Bottleneck for reinforcement learning tasks. In a multitude of noisy and reward sparse maze navigation tasks in VizDoom (Kempka et al., 2016) and DMLab (Beattie et al., 2016), our exploration method achieves state-of-the-art performance. As a new IB framework, we demonstrate that Drop-Bottleneck outperforms the Variational Information Bottleneck (VIB) (Alemi et al., 2017) in multiple aspects including adversarial robustness and dimensionality reduction. ... We carry out three types of experiments to evaluate Drop-Bottleneck (DB) in multiple aspects.
Researcher Affiliation | Academia | Jaekyeom Kim, Minjung Kim, Dongyeon Woo & Gunhee Kim, Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea; jaekyeom@snu.ac.kr, minjung.kim@vl.snu.ac.kr, {woody0325,gunhee}@snu.ac.kr
Pseudocode | No | (The paper does not contain any structured pseudocode or algorithm blocks.)
Open Source Code | No | (The paper does not contain an explicit statement that the authors' own source code for the described methodology is released, nor a direct link to a repository containing it. The link "http://vision.snu.ac.kr/projects/db" is a project webpage, not a code repository. The paper mentions using "the official source code of Savinov et al. (2019)" for baselines, not their own.)
Open Datasets | Yes | In a multitude of noisy and reward sparse maze navigation tasks in VizDoom (Kempka et al., 2016) and DMLab (Beattie et al., 2016), our exploration method achieves state-of-the-art performance. ... Additionally, we empirically compare with VIB to show Drop-Bottleneck's superior robustness to adversarial examples and ability to reduce feature dimensionality for inference with the ImageNet dataset (Russakovsky et al., 2015). ... We employ the Occluded CIFAR dataset using the experimental settings from Achille & Soatto (2018). The Occluded CIFAR dataset is created by occluding CIFAR-10 (Krizhevsky, 2009) images with MNIST (LeCun et al., 2010) images as shown in Figure 6a, and each image has two labels, one from CIFAR and one from MNIST. [A construction sketch for the occlusion follows the table.]
Dataset Splits | No | (The paper mentions using the "ImageNet validation set" but does not provide explicit training, validation, and test dataset splits with percentages or sample counts for all datasets used.)
Hardware Specification | No | (The paper does not provide specific details on the hardware used for running experiments, such as GPU models, CPU specifications, or cloud computing instances.)
Software Dependencies | No | (The paper mentions various software components and algorithms like PPO, Deep Infomax, and the Adam optimizer, but does not specify their version numbers.)
Experiment Setup | Yes | For the feature extractor f_φ, we use the same CNN as the policy network of PPO, from Mnih et al. (2015). The only modification is to use d = 128, i.e., 128-dimensional features instead of 512, to make features lightweight enough to be stored in the episodic memory. The Deep Infomax discriminator T_ψ consists of three FC hidden layers with 64, 32, and 16 units, respectively, followed by a final FC layer for prediction. We initialize the drop probability p with p_i = σ(p̃_i) and p̃_i ~ Uniform(a, b), where a = −2, b = 1. We collect samples and update p, T_ψ, f_φ with Equation (14) every 10.8K and 21.6K time steps in VizDoom and DMLab, respectively, with a batch size of 512. ... Table 6 and Table 7 report the hyperparameters of the methods for the VizDoom and DMLab experiments, respectively. [A minimal architecture and initialization sketch follows the table.]
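For the Occluded CIFAR construction described in the Open Datasets row (CIFAR-10 images occluded by MNIST digits, with each sample keeping both labels), a minimal sketch is given below. It assumes standard torchvision loaders and a simple top-left paste of the digit wherever the digit has ink; the exact occluder placement, scaling, and random pairing used by Achille & Soatto (2018) are not specified in the quoted text, so those choices are illustrative only.

```python
import numpy as np
from torchvision import datasets

def make_occluded_cifar(root="./data", train=True, seed=0):
    """Occlude each CIFAR-10 image with a randomly drawn MNIST digit,
    keeping both labels. Occluder placement/size is an illustrative assumption."""
    rng = np.random.default_rng(seed)
    cifar = datasets.CIFAR10(root, train=train, download=True)
    mnist = datasets.MNIST(root, train=train, download=True)

    images, cifar_labels, mnist_labels = [], [], []
    for img, c_label in zip(cifar.data, cifar.targets):   # img: (32, 32, 3) uint8
        j = int(rng.integers(len(mnist)))
        digit = mnist.data[j].numpy()                      # (28, 28) uint8
        mask = digit > 0                                   # occlude only where the digit has ink
        occluded = img.copy()
        occluded[:28, :28][mask] = digit[mask][:, None]    # paste into the top-left corner
        images.append(occluded)
        cifar_labels.append(int(c_label))
        mnist_labels.append(int(mnist.targets[j]))
    return np.stack(images), np.array(cifar_labels), np.array(mnist_labels)
```

A call such as `images, y_cifar, y_mnist = make_occluded_cifar(train=True)` yields occluded images together with the two label arrays, mirroring the dual-label structure described above.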
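For the Experiment Setup row, a minimal PyTorch sketch of the quoted components follows: the three-hidden-layer Deep Infomax discriminator T_ψ and the drop-probability initialization p_i = σ(p̃_i) with p̃_i ~ Uniform(a, b). The discriminator's input size (a pair of 128-dimensional features), the ReLU activations, and the scalar output are assumptions not stated in the quoted text, and the Uniform range uses the reconstructed a = −2, b = 1.

```python
import torch
import torch.nn as nn

D = 128  # feature dimensionality d = 128 from the quoted setup

class InfomaxDiscriminator(nn.Module):
    """T_psi: three FC hidden layers (64, 32, 16) plus a final FC prediction layer.
    ReLU activations and a scalar score are assumptions for illustration."""
    def __init__(self, in_dim=2 * D):  # assumed: scores a pair of d-dim features
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 16), nn.ReLU(),
            nn.Linear(16, 1),
        )

    def forward(self, z1, z2):
        return self.net(torch.cat([z1, z2], dim=-1))

# Drop-probability initialization: p_i = sigmoid(p~_i), p~_i ~ Uniform(a, b),
# with a = -2, b = 1 (the sign of a is reconstructed from the garbled source text).
a, b = -2.0, 1.0
p_tilde = nn.Parameter(torch.empty(D).uniform_(a, b))  # trainable pre-sigmoid parameters
p = torch.sigmoid(p_tilde)                              # per-dimension drop probabilities
```

Per the quoted text, p̃, T_ψ, and f_φ would then be updated jointly with Equation (14) every 10.8K (VizDoom) or 21.6K (DMLab) time steps with a batch size of 512.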