Anti-Exploration by Random Network Distillation

Authors: Alexander Nikulin, Vladislav Kurenkov, Denis Tarasov, Sergey Kolesnikov

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We evaluate our method on the D4RL (Fu et al., 2020) benchmark (Section 6), and show that SAC-RND achieves performance comparable to ensemble-based methods while outperforming ensemble-free approaches. |
| Researcher Affiliation | Industry | Tinkoff, Moscow, Russia. Correspondence to: Alexander Nikulin <a.p.nikulin@tinkoff.ai>. |
| Pseudocode | Yes | Algorithm 1: Soft Actor-Critic with Random Network Distillation (SAC-RND). (A hedged sketch of the core RND bonus appears after this table.) |
| Open Source Code | Yes | For the exact implementation of conditioning variants for predictor and prior networks, refer to our code at https://github.com/tinkoff-ai/sac-rnd. |
| Open Datasets | Yes | We evaluate our method on all available datasets for the HalfCheetah, Walker2d and Hopper tasks in the Gym domain of the D4RL benchmark. |
| Dataset Splits | No | The paper describes its training and evaluation protocol (e.g., "train for 3M gradient steps, evaluating on 10 episodes") but does not specify training, validation, or test splits (percentages or counts) for the D4RL benchmark datasets. |
| Hardware Specification | Yes | All experiments were performed on V100 and A100 GPUs. |
| Software Dependencies | No | The paper mentions the JAX framework (Bradbury et al., 2018) and cites PyTorch (Paszke et al., 2019) and Adam (Kingma & Ba, 2014), but it does not give version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | Table 4 (SAC-RND general hyperparameters): optimizer: Adam (Kingma & Ba, 2014); batch size: 1024 (256 on antmaze-*); learning rate, all networks: 1e-3 (3e-4 on antmaze-*); tau (τ): 5e-3; hidden dim, all networks: 256; num layers, all networks: 4; RND embedding dim, all tasks: 32; target entropy: -action_dim; gamma (γ): 0.99 (0.999 on antmaze-*); nonlinearity: ReLU. (These values are restated as a config sketch below.) |
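The Pseudocode and Open Source Code rows above refer to Algorithm 1 and to the predictor/prior conditioning variants. For orientation, here is a minimal JAX sketch of the RND machinery those rows describe: a frozen, randomly initialized prior network and a trained predictor, both taking state-action pairs, with the prediction error serving as the anti-exploration bonus. The plain concatenation-based MLP, the initialization scheme, and all names below are illustrative assumptions; the paper's actual conditioning variants are implemented in the linked repository and may differ from this simple concatenation.

```python
# Illustrative sketch of an RND anti-exploration bonus on (state, action)
# pairs. Architecture, init, and names are assumptions, not the paper's
# exact design; see https://github.com/tinkoff-ai/sac-rnd for the real one.
import jax
import jax.numpy as jnp

def init_mlp(key, sizes):
    """He-style random init for a plain MLP; returns a list of (W, b) pairs."""
    params = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        w = jax.random.normal(sub, (d_in, d_out)) * jnp.sqrt(2.0 / d_in)
        params.append((w, jnp.zeros(d_out)))
    return params

def mlp(params, x):
    """Forward pass with ReLU on all but the last layer."""
    for w, b in params[:-1]:
        x = jax.nn.relu(x @ w + b)
    w, b = params[-1]
    return x @ w + b

def rnd_bonus(predictor_params, prior_params, state, action):
    """Anti-exploration bonus: the predictor's error against the frozen prior."""
    sa = jnp.concatenate([state, action], axis=-1)
    pred = mlp(predictor_params, sa)
    target = jax.lax.stop_gradient(mlp(prior_params, sa))
    return jnp.mean((pred - target) ** 2, axis=-1)

def predictor_loss(predictor_params, prior_params, states, actions):
    """The predictor is trained on dataset transitions only, so
    out-of-distribution actions keep a high bonus at evaluation time."""
    return jnp.mean(rnd_bonus(predictor_params, prior_params, states, actions))

# Example wiring (dimensions are illustrative; embedding dim 32 matches Table 4):
key = jax.random.PRNGKey(0)
k_pred, k_prior = jax.random.split(key)
state_dim, action_dim, embed_dim = 17, 6, 32
sizes = [state_dim + action_dim, 256, 256, embed_dim]
predictor = init_mlp(k_pred, sizes)
prior = init_mlp(k_prior, sizes)  # frozen: never updated after init
grads = jax.grad(predictor_loss)(predictor, prior,
                                 jnp.zeros((4, state_dim)),
                                 jnp.zeros((4, action_dim)))
```

Per the paper's anti-exploration framing, this bonus enters the SAC objectives as a penalty, discouraging actions whose predictor error is high, i.e., actions unlike those in the offline dataset.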
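And the Table 4 values from the Experiment Setup row, restated as a plain Python config dict; the key names are our own, the values are the paper's:

```python
# Table 4, SAC-RND general hyperparameters, as a plain dict.
# Key names are illustrative; values come from the paper.
SAC_RND_HPARAMS = {
    "optimizer": "Adam",              # Kingma & Ba, 2014
    "batch_size": 1024,               # 256 on antmaze-* tasks
    "learning_rate": 1e-3,            # 3e-4 on antmaze-*; shared by all networks
    "tau": 5e-3,                      # soft target-network update coefficient
    "hidden_dim": 256,                # all networks
    "num_layers": 4,                  # all networks
    "rnd_embedding_dim": 32,          # all tasks
    "target_entropy": "-action_dim",  # set per environment at runtime
    "gamma": 0.99,                    # 0.999 on antmaze-*
    "nonlinearity": "ReLU",
}
```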