Curiosity-Bottleneck: Exploration By Distilling Task-Specific Novelty

Authors: Youngjin Kim, Wontae Nam, Hyunwoo Kim, Ji-Hoon Kim, Gunhee Kim

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | With extensive experiments on static image classification, grid-world, and three hard-exploration Atari games, we show that Curiosity-Bottleneck learns an effective exploration strategy by robustly measuring the state novelty in distractive environments where state-of-the-art exploration methods often degenerate.
Researcher Affiliation | Collaboration | Youngjin Kim (1, 2), Wontae Nam (3), Hyunwoo Kim (2), Ji-Hoon Kim (4), Gunhee Kim (2). Affiliations: 1 NALBI Inc.; 2 Seoul National University, South Korea; 3 Machine Learning Lab, KC Co. Ltd., South Korea; 4 Clova AI Research, NAVER Corp., South Korea.
Pseudocode | Yes | Algorithm 1: Curiosity-Bottleneck with PPO (a minimal sketch of the reward computation appears below the table).
Open Source Code | Yes | More details can be found in the supplementary file and the code, which is available at http://vision.snu.ac.kr/projects/cb.
Open Datasets | Yes | Static image classification tasks on MNIST (LeCun & Cortes, 2010) and Fashion-MNIST (Xiao et al., 2017).
Dataset Splits | No | The paper does not provide percentages or sample counts for training, validation, and test splits, nor does it cite predefined standard splits that include this information.
Hardware Specification | No | The paper mentions using the NAVER Smart Machine Learning (NSML) platform but does not specify the underlying hardware, such as GPU or CPU models.
Software Dependencies | No | The paper mentions using Proximal Policy Optimization (PPO) and refers to other authors' code for baselines, but it does not specify versions for software dependencies such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | We assume a Gaussian distribution for both the compressor output distribution pθ(z|x) = N(µθ(x), σθ(x)) and the variational prior q(z) = N(0, I). The compressor network consists of a standard three-layer convolutional neural network followed by an MLP that outputs both the mean µθ(x) ∈ ℝ^K of z and the diagonal elements of the covariance matrix σθ(x) ∈ ℝ^K. [...] Experiments run for up to 327M rollouts (40K updates of parameters with 64 parallel environments).
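
The Experiment Setup row describes the compressor only in prose. Below is a minimal PyTorch sketch of that architecture, assuming illustrative channel counts, kernel sizes, hidden width, and latent dimension K; the paper's exact hyperparameters live in its supplementary file and are not reproduced here. The class name `Compressor` and the use of `nn.LazyLinear` are our own choices, not the authors'.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Compressor(nn.Module):
    """Sketch of the compressor: a three-layer CNN followed by an MLP that
    outputs the mean and diagonal std of the Gaussian p_theta(z|x)."""

    def __init__(self, in_channels: int = 1, latent_dim: int = 128):
        super().__init__()
        # Three-layer convolutional encoder (shapes are assumptions).
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        # MLP head producing the mean mu_theta(x) in R^K and the diagonal
        # elements of sigma_theta(x) in R^K (via a log-std for positivity).
        self.fc = nn.LazyLinear(256)
        self.mu = nn.Linear(256, latent_dim)
        self.log_sigma = nn.Linear(256, latent_dim)

    def forward(self, x: torch.Tensor):
        h = F.relu(self.fc(self.conv(x)))
        mu = self.mu(h)
        sigma = self.log_sigma(h).exp()  # exponentiate to keep std positive
        return mu, sigma
```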
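Algorithm 1 (the Pseudocode row) scores a state's novelty as the KL divergence between the compressor output pθ(z|x) and the prior q(z) = N(0, I). Because both are diagonal Gaussians, this KL has a closed form; the sketch below computes it as a per-state intrinsic reward, omitting any normalization and scaling the full algorithm applies before mixing with the PPO return. The function name `intrinsic_reward` is hypothetical.

```python
import torch

def intrinsic_reward(mu: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over the K latent dims.

    Closed form per dimension: 0.5 * (mu^2 + sigma^2 - 2*log(sigma) - 1).
    A large KL means the state compresses poorly under the prior, i.e. it
    is novel with respect to what the compressor has learned.
    """
    return 0.5 * (mu.pow(2) + sigma.pow(2) - 2.0 * sigma.log() - 1.0).sum(dim=-1)

# Usage with the compressor sketch above (variable names are illustrative):
#   mu, sigma = compressor(observations)
#   r_int = intrinsic_reward(mu, sigma)  # added to the PPO reward signal
```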