Episodic Curiosity through Reachability

Authors: Nikolay Savinov, Anton Raichuk, Damien Vincent, Raphaël Marinier, Marc Pollefeys, Timothy Lillicrap, Sylvain Gelly

ICLR 2019

Reproducibility Assessment

Each variable below lists the assessed result followed by the supporting LLM response.
Research Type: Experimental
LLM Response: We test our approach in visually rich 3D environments in VizDoom, DMLab and MuJoCo. In navigational tasks from VizDoom and DMLab, our agent outperforms the state-of-the-art curiosity method ICM. In MuJoCo, an ant equipped with our curiosity module learns locomotion out of the first-person-view curiosity only. The code is available at https://github.com/google-research/episodic-curiosity.

Researcher Affiliation: Collaboration
LLM Response: Google Brain, DeepMind, ETH Zürich

Pseudocode: No
LLM Response: The paper describes algorithmic steps in prose and uses diagrams but does not include structured pseudocode or algorithm blocks.
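
Since the paper gives the procedure only in prose, a minimal sketch of the reachability-based bonus may help. This is our reading of the method; the function names, the percentile, and all constants are assumptions to be checked against the released code:

```python
import numpy as np

# Placeholder constants; the paper's actual alpha, beta, and novelty
# threshold are given in its text and supplementary material.
ALPHA = 1.0   # bonus scale ("alpha" in the paper)
BETA = 0.5    # bonus shift ("beta" in the paper)
NOVELTY_THRESHOLD = 0.0

def episodic_curiosity_bonus(memory, embedding, comparator):
    """Reachability-based curiosity bonus, following the paper's prose.

    memory:     embeddings of observations stored during this episode
    embedding:  embedding of the current observation
    comparator: trained network mapping two embeddings to a score in
                [0, 1], the probability that one observation is reachable
                from the other within k environment steps
    """
    if not memory:
        memory.append(embedding)
        return 0.0
    # Compare the current observation against every memory entry.
    scores = np.array([comparator(m, embedding) for m in memory])
    # Aggregate into one similarity score with a robust statistic
    # (the paper uses a high percentile rather than the maximum).
    similarity = np.percentile(scores, 90)
    bonus = ALPHA * (BETA - similarity)
    # Only observations that look novel, i.e. hard to reach from
    # anything already in memory, are added to the episodic memory.
    if bonus > NOVELTY_THRESHOLD:
        memory.append(embedding)
    return bonus
```

The paper also bounds the episodic memory's size and evicts entries when it is full; that bookkeeping is omitted here for brevity.
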
Open Source Code: Yes
LLM Response: The code is available at https://github.com/google-research/episodic-curiosity.

Open Datasets: Yes
LLM Response: We test our method in multiple environments from VizDoom (Kempka et al., 2016), DMLab (Beattie et al., 2016) and MuJoCo (Todorov et al., 2012; Schulman et al., 2015).

Dataset Splits: Yes
LLM Response: As DMLab environments are procedurally generated, we perform tuning on the validation set, disjoint with the training and test sets.
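
For procedurally generated environments, disjoint splits are typically enforced at the level of generation seeds. The sketch below illustrates the idea only; the seed ranges are hypothetical, not the authors' protocol:

```python
# Hypothetical seed ranges for procedurally generated levels; the paper
# does not state the actual split sizes.
TRAIN_SEEDS = set(range(0, 1000))
VALID_SEEDS = set(range(1000, 1200))  # used only for hyperparameter tuning
TEST_SEEDS = set(range(1200, 1400))

# Disjointness check: no seed may appear in more than one split.
assert not TRAIN_SEEDS & VALID_SEEDS
assert not TRAIN_SEEDS & TEST_SEEDS
assert not VALID_SEEDS & TEST_SEEDS
```
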
Hardware Specification: No
LLM Response: The paper does not specify any particular CPU, GPU, or TPU models used for running the experiments. It only describes the training setup in general terms, e.g. 'PPO (same as in the main text of the paper)' for MuJoCo.

Software Dependencies: No
LLM Response: The paper mentions using the 'PPO algorithm from the open-source implementation' (footnote 2: https://github.com/openai/baselines) and 'gym-mujoco' (footnote 6: https://gym.openai.com/envs/Ant-v2/) but does not provide specific version numbers for software dependencies like Python, TensorFlow/PyTorch, or other libraries.
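
A reproduction therefore has to pin versions itself. A minimal, assumed setup for the MuJoCo experiment is sketched below; the install commands and version pins are our guesses, not the paper's:

```python
# Assumed installation (illustrative pins, since the paper gives none):
#   pip install gym==0.10.5 mujoco-py==1.50.1.68
#   pip install git+https://github.com/openai/baselines
import gym

# "Ant-v2" is the gym-mujoco task cited for the locomotion experiment.
env = gym.make("Ant-v2")
observation = env.reset()
```
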
Experiment Setup: Yes
LLM Response: The hyperparameters of the PPO algorithm are given in the supplementary material. We use only two sets of hyperparameters: one for all VizDoom environments and the other one for all DMLab environments.
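
One way to mirror that two-set structure in a reproduction is to key the PPO configuration on the environment family. The values below are generic placeholders, not the paper's settings, which must be taken from its supplementary material:

```python
# Placeholder values only; substitute the paper's supplementary settings.
PPO_HPARAMS = {
    "vizdoom": {"learning_rate": 2.5e-4, "entropy_coef": 0.01, "gamma": 0.99},
    "dmlab": {"learning_rate": 2.5e-4, "entropy_coef": 0.0033, "gamma": 0.99},
}

def hparams_for(env_name: str) -> dict:
    """Return the single hyperparameter set for an environment's family."""
    family = "vizdoom" if "vizdoom" in env_name.lower() else "dmlab"
    return PPO_HPARAMS[family]
```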