Curiosity-Bottleneck: Exploration By Distilling Task-Specific Novelty

Authors: Youngjin Kim, Wontae Nam, Hyunwoo Kim, Ji-Hoon Kim, Gunhee Kim

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | With extensive experiments on static image classification, grid-world, and three hard-exploration Atari games, we show that Curiosity-Bottleneck learns an effective exploration strategy by robustly measuring the state novelty in distractive environments where state-of-the-art exploration methods often degenerate.
Researcher Affiliation | Collaboration | Youngjin Kim (1, 2), Wontae Nam (3), Hyunwoo Kim (2), Ji-Hoon Kim (4), Gunhee Kim (2). Affiliations: 1 NALBI Inc.; 2 Seoul National University, South Korea; 3 Machine Learning Lab, KC Co. Ltd., South Korea; 4 Clova AI Research, NAVER Corp., South Korea.
Pseudocode | Yes | Algorithm 1: Curiosity-Bottleneck with PPO (a minimal sketch of the reward computation appears below the table).
Open Source Code | Yes | More details can be found in the supplementary file and the code, which is available at http://vision.snu.ac.kr/projects/cb.
Open Datasets | Yes | Static image classification tasks on MNIST (LeCun & Cortes, 2010) and Fashion-MNIST (Xiao et al., 2017).
Dataset Splits | No | The paper does not provide percentages or sample counts for training, validation, and test splits, nor does it cite predefined standard splits that include this information.
Hardware Specification | No | The paper mentions using the NAVER Smart Machine Learning (NSML) platform but does not specify the underlying hardware, such as GPU or CPU models.
Software Dependencies | No | The paper mentions using Proximal Policy Optimization (PPO) and refers to other authors' code for baselines, but it does not specify versions for software dependencies such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | We assume a Gaussian distribution for both the compressor output distribution pθ(z|x) = N(µθ(x), σθ(x)) and the variational prior q(z) = N(0, I). The compressor network consists of a standard three-layer convolutional neural network followed by an MLP that outputs both the mean µθ(x) ∈ ℝ^K of z and the diagonal elements of the covariance matrix σθ(x) ∈ ℝ^K. [...] Experiments run for up to 327M rollouts (40K updates of parameters with 64 parallel environments).
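
The Experiment Setup row describes the compressor only in prose. Below is a minimal PyTorch sketch of that architecture, assuming illustrative channel counts, kernel sizes, hidden width, and latent dimension K; the paper's exact hyperparameters live in its supplementary file and are not reproduced here. The class name `Compressor` and the use of `nn.LazyLinear` are our own choices, not the authors'.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Compressor(nn.Module):
    """Sketch of the compressor: a three-layer CNN followed by an MLP that
    outputs the mean and diagonal std of the Gaussian p_theta(z|x)."""

    def __init__(self, in_channels: int = 1, latent_dim: int = 128):
        super().__init__()
        # Three-layer convolutional encoder (shapes are assumptions).
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        # MLP head producing the mean mu_theta(x) in R^K and the diagonal
        # elements of sigma_theta(x) in R^K (via a log-std for positivity).
        self.fc = nn.LazyLinear(256)
        self.mu = nn.Linear(256, latent_dim)
        self.log_sigma = nn.Linear(256, latent_dim)

    def forward(self, x: torch.Tensor):
        h = F.relu(self.fc(self.conv(x)))
        mu = self.mu(h)
        sigma = self.log_sigma(h).exp()  # exponentiate to keep std positive
        return mu, sigma
```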
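Algorithm 1 (the Pseudocode row) scores a state's novelty as the KL divergence between the compressor output pθ(z|x) and the prior q(z) = N(0, I). Because both are diagonal Gaussians, this KL has a closed form; the sketch below computes it as a per-state intrinsic reward, omitting any normalization and scaling the full algorithm applies before mixing with the PPO return. The function name `intrinsic_reward` is hypothetical.

```python
import torch

def intrinsic_reward(mu: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over the K latent dims.

    Closed form per dimension: 0.5 * (mu^2 + sigma^2 - 2*log(sigma) - 1).
    A large KL means the state compresses poorly under the prior, i.e. it
    is novel with respect to what the compressor has learned.
    """
    return 0.5 * (mu.pow(2) + sigma.pow(2) - 2.0 * sigma.log() - 1.0).sum(dim=-1)

# Usage with the compressor sketch above (variable names are illustrative):
#   mu, sigma = compressor(observations)
#   r_int = intrinsic_reward(mu, sigma)  # added to the PPO reward signal
```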