Curiosity-Bottleneck: Exploration By Distilling Task-Specific Novelty
Authors: Youngjin Kim, Wontae Nam, Hyunwoo Kim, Ji-Hoon Kim, Gunhee Kim
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | With extensive experiments on static image classification, grid-world, and three hard-exploration Atari games, we show that Curiosity-Bottleneck learns an effective exploration strategy by robustly measuring the state novelty in distractive environments where state-of-the-art exploration methods often degenerate. |
| Researcher Affiliation | Collaboration | Youngjin Kim (NALBI Inc.; Seoul National University, South Korea), Wontae Nam (Machine Learning Lab, KC Co. Ltd., South Korea), Hyunwoo Kim (Seoul National University, South Korea), Ji-Hoon Kim (Clova AI Research, NAVER Corp., South Korea), Gunhee Kim (Seoul National University, South Korea) |
| Pseudocode | Yes | Algorithm 1 Curiosity-Bottleneck with PPO |
| Open Source Code | Yes | More details can be found in the supplementary file and the code which is available at http://vision.snu.ac.kr/projects/cb. |
| Open Datasets | Yes | static image classification tasks on MNIST (LeCun & Cortes, 2010) and Fashion MNIST (Xiao et al., 2017) |
| Dataset Splits | No | The paper does not provide specific percentages or sample counts for training, validation, and test dataset splits, nor does it refer to predefined standard splits with citations that include this information. |
| Hardware Specification | No | The paper mentions using the NAVER Smart Machine Learning (NSML) platform but does not specify the underlying hardware details such as specific GPU or CPU models. |
| Software Dependencies | No | The paper mentions using Proximal Policy Optimization (PPO) and refers to code from other authors for baselines, but it does not specify software versions for libraries or frameworks like Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | We assume a Gaussian distribution for both the compressor output distribution pθ(z\|x) = N(µθ(x), σθ(x)) and the variational prior q(z) = N(0, I). The compressor network consists of a standard three-layer convolutional neural network followed by an MLP that outputs both the mean µθ(x) ∈ ℝ^K of z and the diagonal elements of the covariance matrix σθ(x) ∈ ℝ^K. [...] Experiments run for up to 327M rollouts (40K updates of parameters with 64 parallel environments). [A code sketch of this compressor follows the table.] |
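
For concreteness, here is a minimal PyTorch sketch of the compressor described in the Experiment Setup row, together with the per-state KL term against the standard-normal prior that Curiosity-Bottleneck uses as its novelty signal. This is an illustrative reconstruction under stated assumptions, not the authors' released implementation: the kernel sizes, channel counts, hidden width, input shape, latent size K, and all class/function names are assumptions, not values taken from the paper or its code.

```python
import torch
import torch.nn as nn

class GaussianCompressor(nn.Module):
    """Compressor p_theta(z|x) = N(mu_theta(x), sigma_theta(x)) with a
    standard-normal variational prior q(z) = N(0, I), following the quoted
    setup. Layer widths and kernel sizes are illustrative assumptions."""

    def __init__(self, in_channels: int = 1, k: int = 128):
        super().__init__()
        # "Standard three-layer convolutional neural network".
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        # MLP head emitting both the K-dim mean and the K diagonal
        # elements of the covariance matrix.
        self.head = nn.Sequential(
            nn.LazyLinear(256), nn.ReLU(),
            nn.Linear(256, 2 * k),
        )

    def forward(self, x: torch.Tensor):
        h = self.head(self.conv(x))
        mu, log_sigma = h.chunk(2, dim=-1)
        return mu, log_sigma.exp()  # exponentiate to keep sigma positive


def kl_to_standard_normal(mu: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    """Per-sample KL[N(mu, diag(sigma^2)) || N(0, I)]. In Curiosity-Bottleneck
    this compression cost serves as the task-specific novelty score."""
    return 0.5 * (mu.pow(2) + sigma.pow(2) - 2.0 * sigma.log() - 1.0).sum(dim=-1)
```

A quick smoke test of the sketch (batch and observation shapes are arbitrary assumptions):

```python
compressor = GaussianCompressor(in_channels=1, k=128)
x = torch.randn(64, 1, 28, 28)              # dummy batch of observations
mu, sigma = compressor(x)
novelty = kl_to_standard_normal(mu, sigma)  # one novelty score per state
assert novelty.shape == (64,)
```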