Anti-Exploration by Random Network Distillation

Authors: Alexander Nikulin, Vladislav Kurenkov, Denis Tarasov, Sergey Kolesnikov

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We evaluate our method on the D4RL (Fu et al., 2020) benchmark (Section 6), and show that SAC-RND achieves performance comparable to ensemble-based methods while outperforming ensemble-free approaches. |
| Researcher Affiliation | Industry | Tinkoff, Moscow, Russia. Correspondence to: Alexander Nikulin <a.p.nikulin@tinkoff.ai>. |
| Pseudocode | Yes | Algorithm 1: Soft Actor-Critic with Random Network Distillation (SAC-RND). (A hedged sketch of the core RND bonus appears after this table.) |
| Open Source Code | Yes | For the exact implementation of conditioning variants for predictor and prior networks, refer to our code at https://github.com/tinkoff-ai/sac-rnd. |
| Open Datasets | Yes | We evaluate our method on all available datasets for the HalfCheetah, Walker2d and Hopper tasks in the Gym domain of the D4RL benchmark. |
| Dataset Splits | No | The paper describes its training and evaluation protocol (e.g., "train for 3M gradient steps, evaluating on 10 episodes") but does not specify training, validation, or test splits (percentages or counts) for the D4RL benchmark datasets. |
| Hardware Specification | Yes | All experiments were performed on V100 and A100 GPUs. |
| Software Dependencies | No | The paper mentions the JAX framework (Bradbury et al., 2018) and cites PyTorch (Paszke et al., 2019) and Adam (Kingma & Ba, 2014), but it does not give version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | Table 4 (SAC-RND general hyperparameters): optimizer: Adam (Kingma & Ba, 2014); batch size: 1024 (256 on antmaze-*); learning rate, all networks: 1e-3 (3e-4 on antmaze-*); tau (τ): 5e-3; hidden dim, all networks: 256; num layers, all networks: 4; RND embedding dim, all tasks: 32; target entropy: -action_dim; gamma (γ): 0.99 (0.999 on antmaze-*); nonlinearity: ReLU. (These values are restated as a config sketch below.) |
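The Pseudocode and Open Source Code rows above refer to Algorithm 1 and to the predictor/prior conditioning variants. For orientation, here is a minimal JAX sketch of the RND machinery those rows describe: a frozen, randomly initialized prior network and a trained predictor, both taking state-action pairs, with the prediction error serving as the anti-exploration bonus. The plain concatenation-based MLP, the initialization scheme, and all names below are illustrative assumptions; the paper's actual conditioning variants are implemented in the linked repository and may differ from this simple concatenation.

```python
# Illustrative sketch of an RND anti-exploration bonus on (state, action)
# pairs. Architecture, init, and names are assumptions, not the paper's
# exact design; see https://github.com/tinkoff-ai/sac-rnd for the real one.
import jax
import jax.numpy as jnp

def init_mlp(key, sizes):
    """He-style random init for a plain MLP; returns a list of (W, b) pairs."""
    params = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        w = jax.random.normal(sub, (d_in, d_out)) * jnp.sqrt(2.0 / d_in)
        params.append((w, jnp.zeros(d_out)))
    return params

def mlp(params, x):
    """Forward pass with ReLU on all but the last layer."""
    for w, b in params[:-1]:
        x = jax.nn.relu(x @ w + b)
    w, b = params[-1]
    return x @ w + b

def rnd_bonus(predictor_params, prior_params, state, action):
    """Anti-exploration bonus: the predictor's error against the frozen prior."""
    sa = jnp.concatenate([state, action], axis=-1)
    pred = mlp(predictor_params, sa)
    target = jax.lax.stop_gradient(mlp(prior_params, sa))
    return jnp.mean((pred - target) ** 2, axis=-1)

def predictor_loss(predictor_params, prior_params, states, actions):
    """The predictor is trained on dataset transitions only, so
    out-of-distribution actions keep a high bonus at evaluation time."""
    return jnp.mean(rnd_bonus(predictor_params, prior_params, states, actions))

# Example wiring (dimensions are illustrative; embedding dim 32 matches Table 4):
key = jax.random.PRNGKey(0)
k_pred, k_prior = jax.random.split(key)
state_dim, action_dim, embed_dim = 17, 6, 32
sizes = [state_dim + action_dim, 256, 256, embed_dim]
predictor = init_mlp(k_pred, sizes)
prior = init_mlp(k_prior, sizes)  # frozen: never updated after init
grads = jax.grad(predictor_loss)(predictor, prior,
                                 jnp.zeros((4, state_dim)),
                                 jnp.zeros((4, action_dim)))
```

Per the paper's anti-exploration framing, this bonus enters the SAC objectives as a penalty, discouraging actions whose predictor error is high, i.e., actions unlike those in the offline dataset.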
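And the Table 4 values from the Experiment Setup row, restated as a plain Python config dict; the key names are our own, the values are the paper's:

```python
# Table 4, SAC-RND general hyperparameters, as a plain dict.
# Key names are illustrative; values come from the paper.
SAC_RND_HPARAMS = {
    "optimizer": "Adam",              # Kingma & Ba, 2014
    "batch_size": 1024,               # 256 on antmaze-* tasks
    "learning_rate": 1e-3,            # 3e-4 on antmaze-*; shared by all networks
    "tau": 5e-3,                      # soft target-network update coefficient
    "hidden_dim": 256,                # all networks
    "num_layers": 4,                  # all networks
    "rnd_embedding_dim": 32,          # all tasks
    "target_entropy": "-action_dim",  # set per environment at runtime
    "gamma": 0.99,                    # 0.999 on antmaze-*
    "nonlinearity": "ReLU",
}
```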