Anti-Exploration by Random Network Distillation
Authors: Alexander Nikulin, Vladislav Kurenkov, Denis Tarasov, Sergey Kolesnikov
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on the D4RL (Fu et al., 2020) benchmark (Section 6), and show that SAC-RND achieves performance comparable to ensemble-based methods while outperforming ensemble-free approaches. |
| Researcher Affiliation | Industry | Tinkoff, Moscow, Russia. Correspondence to: Alexander Nikulin <a.p.nikulin@tinkoff.ai>. |
| Pseudocode | Yes | Algorithm 1 Soft Actor-Critic with Random Network Distillation (SAC-RND) |
| Open Source Code | Yes | For the exact implementation of conditioning variants for predictor and prior networks, refer to our code at https://github.com/tinkoff-ai/sac-rnd. |
| Open Datasets | Yes | We evaluate our method on all available datasets for the Half Cheetah, Walker2d and Hopper tasks in the Gym domain of the D4RL benchmark. |
| Dataset Splits | No | The paper reports training for 3M gradient steps and evaluating on 10 episodes ('train for 3M gradient steps, evaluating on 10 episodes'), but it does not provide explicit training/validation/test dataset splits (e.g., percentages or counts) for the D4RL benchmark datasets. |
| Hardware Specification | Yes | All experiments were performed on V100 and A100 GPUs. |
| Software Dependencies | No | The paper mentions using the 'Jax (Bradbury et al., 2018) framework' and refers to 'PyTorch (Paszke et al., 2019)' and 'Adam (Kingma & Ba, 2014)'. However, it does not specify exact version numbers for these or any other ancillary software dependencies. |
| Experiment Setup | Yes | Table 4 (SAC-RND general hyperparameters): optimizer: Adam (Kingma & Ba, 2014); batch size: 1024 (256 on antmaze-*); learning rate (all networks): 1e-3 (3e-4 on antmaze-*); tau (τ): 5e-3; hidden dim (all networks): 256; num layers (all networks): 4; RND embedding dim (all tasks): 32; target entropy: -action_dim; gamma (γ): 0.99 (0.999 on antmaze-*); nonlinearity: ReLU |
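
To accompany the Algorithm 1 pseudocode noted in the table, the following is a minimal sketch of the core RND component: a trained predictor network is regressed onto a fixed, randomly initialized prior network over state-action pairs, and the prediction error serves as the anti-exploration penalty for out-of-distribution actions. The helper names (`init_mlp`, `mlp_apply`, `rnd_penalty`), the plain-MLP parameterization, and the example dimensions are illustrative assumptions, not the authors' implementation (their code uses specific conditioning variants for the predictor and prior; see the linked repository).

```python
# Minimal sketch of an RND anti-exploration penalty in JAX.
# All helper names and the plain-MLP parameterization are illustrative.
import jax
import jax.numpy as jnp


def init_mlp(key, sizes):
    """Initialize a plain MLP as a list of (W, b) pairs."""
    params = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        w = jax.random.normal(sub, (d_in, d_out)) * jnp.sqrt(2.0 / d_in)
        params.append((w, jnp.zeros(d_out)))
    return params


def mlp_apply(params, x):
    """Forward pass with ReLU hidden layers and a linear output layer."""
    for w, b in params[:-1]:
        x = jax.nn.relu(x @ w + b)
    w, b = params[-1]
    return x @ w + b


def rnd_penalty(predictor_params, prior_params, state, action):
    """Anti-exploration penalty: squared prediction error between the
    trained predictor and the frozen, randomly initialized prior."""
    sa = jnp.concatenate([state, action], axis=-1)
    pred = mlp_apply(predictor_params, sa)
    prior = jax.lax.stop_gradient(mlp_apply(prior_params, sa))
    return jnp.sum((pred - prior) ** 2, axis=-1)


# Example shapes loosely following Table 4 (hidden dim 256, 4 layers,
# RND embedding dim 32); state/action dims are task-dependent and
# chosen here only for illustration.
key = jax.random.PRNGKey(0)
k_prior, k_pred = jax.random.split(key)
state_dim, action_dim = 17, 6  # e.g. HalfCheetah; illustrative
sizes = [state_dim + action_dim, 256, 256, 256, 256, 32]
prior_params = init_mlp(k_prior, sizes)      # frozen after initialization
predictor_params = init_mlp(k_pred, sizes)   # trained on dataset transitions
```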
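
The general hyperparameters quoted from Table 4 are transcribed below into a config dict for convenience; the values come from the table row above, while the dict layout and key names are purely illustrative.

```python
# SAC-RND general hyperparameters as reported in Table 4 of the paper.
# Key names and structure are illustrative, not the authors' config format.
SAC_RND_HPARAMS = {
    "optimizer": "Adam",
    "batch_size": 1024,            # 256 on antmaze-* tasks
    "learning_rate": 1e-3,         # 3e-4 on antmaze-* tasks; all networks
    "tau": 5e-3,                   # target network soft-update rate
    "hidden_dim": 256,             # all networks
    "num_layers": 4,               # all networks
    "rnd_embedding_dim": 32,       # all tasks
    "target_entropy": "-action_dim",
    "gamma": 0.99,                 # 0.999 on antmaze-* tasks
    "nonlinearity": "relu",
}
```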