Neural Episodic Control with State Abstraction
Authors: Zhuo Li, Derui Zhu, Yujing Hu, Xiaofei Xie, Lei Ma, Yan Zheng, Yan Song, Yingfeng Chen, Jianjun Zhao
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach to the MuJoCo and Atari tasks in OpenAI Gym domains. The experimental results indicate that NECSA achieves higher sample efficiency than the state-of-the-art episodic control-based approaches. |
| Researcher Affiliation | Collaboration | Zhuo Li1, Derui Zhu2, Yujing Hu3, Xiaofei Xie4, Lei Ma5,6, Yan Zheng7, Yan Song3, Yingfeng Chen3, Jianjun Zhao1 — 1Kyushu University, 2Technical University of Munich, 3NetEase Fuxi AI Lab, 4Singapore Management University, 5University of Alberta, 6The University of Tokyo, 7Tianjin University |
| Pseudocode | Yes | Algorithm 1 NECSA. |
| Open Source Code | Yes | Our data and code are available at the project website1. 1https://sites.google.com/view/drl-necsa |
| Open Datasets | Yes | We conduct the experiments on nine MuJoCo tasks and six Atari games in OpenAI Gym (Brockman et al., 2016) domains. |
| Dataset Splits | No | The paper mentions training steps and evaluation results but does not explicitly describe training/validation/test dataset splits with specific percentages or sample counts for reproduction. |
| Hardware Specification | Yes | All the experiments were run on powerful servers with CPU (Intel(R) Core(TM) i9-10940X CPU @ 3.30GHz) and GPU (NVIDIA Corporation GA102GL [RTX A6000]) with 128GB RAM. |
| Software Dependencies | No | The paper states: 'We also adopt the implementation of DDPG, TD3, DQN, and Rainbow in (Weng et al., 2021).' and 'The implementation of NECSA is based on TD3.' Although the Tianshou library (Weng et al., 2021) is cited, no specific version numbers are given for Tianshou or for other software components such as PyTorch or CUDA. |
| Experiment Setup | Yes | Our hyperparameter settings are listed in Table 1 and Table 2. ... For MuJoCo tasks, the neural network consists of two hidden layers. The size of each layer is 256. The activation unit is ReLU. Both the Actor and Critic networks share the same structure. For Atari tasks, we use a three-layer convolutional neural network head and a fully connected layer to output the Q-values of each action. |
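The MuJoCo network described in the setup quote (two hidden layers of 256 units with ReLU activations, with Actor and Critic sharing the same structure) can be sketched as a plain forward pass. This is an illustrative reconstruction from the quoted description only, not the authors' released code; the input/output dimensions below are placeholders chosen for the example.

```python
import numpy as np

def relu(x):
    """ReLU activation used in the paper's MuJoCo networks."""
    return np.maximum(0.0, x)

def mlp_forward(obs, params):
    """Forward pass of a two-hidden-layer (256-unit) ReLU MLP, the shape
    the paper describes for both the Actor and Critic on MuJoCo tasks."""
    W1, b1, W2, b2, W3, b3 = params
    h1 = relu(obs @ W1 + b1)   # hidden layer 1, size 256
    h2 = relu(h1 @ W2 + b2)    # hidden layer 2, size 256
    return h2 @ W3 + b3        # Actor: action output; Critic: Q-value

def init_params(in_dim, out_dim, hidden=256, seed=0):
    """Randomly initialize weights for the sketch above (placeholder init)."""
    rng = np.random.default_rng(seed)
    dims = [in_dim, hidden, hidden, out_dim]
    params = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        params += [rng.normal(0.0, 0.1, (d_in, d_out)), np.zeros(d_out)]
    return params

# Example: an 11-dimensional observation mapped to a 3-dimensional action
# (dimensions are hypothetical, roughly matching a small MuJoCo task).
params = init_params(11, 3)
action = mlp_forward(np.ones(11), params)
print(action.shape)  # (3,)
```

The Atari variant quoted above would swap this MLP body for a three-layer convolutional head followed by a fully connected layer emitting per-action Q-values; the overall forward-pass structure is the same.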