Solving Continuous Control with Episodic Memory
Authors: Igor Kuznetsov, Andrey Filchenkov
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our algorithm on OpenAI Gym domains and show greater sample-efficiency compared with the state-of-the-art model-free off-policy algorithms. We evaluate our model on a set of OpenAI Gym environments [Dhariwal et al., 2017] (Figure 1) and show that it achieves greater sample efficiency compared with the state-of-the-art off-policy algorithms (TD3, SAC). Figure 4: Evaluation results on OpenAI Gym Benchmark. Table 1: Average return over 10 trials of 200000 time steps. ± corresponds to a standard deviation over 10 trials. |
| Researcher Affiliation | Academia | Igor Kuznetsov, Andrey Filchenkov; ITMO University; igorkuznetsov14@gmail.com, afilchenkov@itmo.ru |
| Pseudocode | Yes | Algorithm 1 EMAC |
| Open Source Code | Yes | We open sourced our algorithm to achieve reproducibility. All the code and learning curves can be accessed at: http://github.com/schatty/EMAC. |
| Open Datasets | Yes | We evaluate our algorithm on a set of OpenAI Gym domains [Dhariwal et al., 2017] |
| Dataset Splits | No | Evaluation is performed every 1000 steps with the reported value as an average from 10 evaluation episodes from different seeds without any exploration. |
| Hardware Specification | Yes | All our experiments are performed on a single NVIDIA 1080 Ti card. |
| Software Dependencies | No | Network parameters are updated with the Adam optimizer [Kingma and Ba, 2014] with a learning rate of 0.001. |
| Experiment Setup | Yes | Network parameters are updated with the Adam optimizer [Kingma and Ba, 2014] with a learning rate of 0.001. All models consist of two hidden layers, size 256, for an actor and a critic and a rectified linear unit (ReLU) as a nonlinearity. For the first 1000 time steps we do not exploit an actor for action selection and choose the actions randomly for the exploration purpose. |
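
The warm-up and evaluation schedule quoted in the table can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the zero-based step index, and the uniform sampling of warm-up actions are assumptions. Only the constants (1000 warm-up steps, evaluation every 1000 steps, 10 evaluation episodes) come from the excerpts above.

```python
import random

START_STEPS = 1000    # from the table: random actions for the first 1000 time steps
EVAL_EVERY = 1000     # from the table: evaluation is performed every 1000 steps
EVAL_EPISODES = 10    # from the table: reported value averages 10 evaluation episodes


def select_action(step, state, actor, action_dim, max_action, rng):
    """Return a random action during warm-up, the actor's action afterwards.

    `actor` is a hypothetical callable mapping a state to a list of floats.
    Uniform sampling in [-max_action, max_action] is an assumption; the
    source only says actions are chosen randomly.
    """
    if step < START_STEPS:
        return [rng.uniform(-max_action, max_action) for _ in range(action_dim)]
    return actor(state)


def should_evaluate(step):
    """True on the training steps where an evaluation pass is due."""
    return step > 0 and step % EVAL_EVERY == 0


def average_return(episode_returns):
    """Mean return over the evaluation episodes."""
    return sum(episode_returns) / len(episode_returns)
```

In a training loop, `select_action` would be called once per environment step, and `should_evaluate` would gate a pass of `EVAL_EPISODES` exploration-free episodes whose returns are fed to `average_return`.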