Attentive Experience Replay
Authors: Peiquan Sun, Wengang Zhou, Houqiang Li
AAAI 2020, pp. 5900–5907 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We couple AER with different off-policy algorithms and demonstrate that AER makes consistent improvements on the suite of OpenAI Gym tasks. |
| Researcher Affiliation | Academia | Peiquan Sun, Wengang Zhou, Houqiang Li CAS Key Laboratory of Technology in GIPAS, EEIS Department, University of Science and Technology of China spq@mail.ustc.edu.cn, {zhwg, lihq}@ustc.edu.cn |
| Pseudocode | Yes | Algorithm 1: Attentive experience replay (a hedged sketch of this sampling scheme follows the table). |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, nor does it mention a repository link or state that code is available in supplementary materials. |
| Open Datasets | Yes | We compare AER with two ER methods, the uniform sampling and the prioritized experience replay (PER) (Schaul et al. 2016), on the suite of OpenAI Gym tasks (Figure 1) (Brockman et al. 2016). |
| Dataset Splits | No | The paper describes evaluation procedures ('evaluated every 5000 steps, by running the policy deterministically and cumulative rewards are averaged over 50 evaluation episodes'), but does not specify static training/validation/test splits in the conventional supervised-learning sense, which is typical for reinforcement learning environments. |
| Hardware Specification | Yes | All experiments are performed on a server with 40 Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.4GHz processors and 8 GeForce GTX-1080 Ti 12 GB GPUs. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer' but does not provide specific version numbers for software dependencies such as libraries or frameworks. |
| Experiment Setup | Yes | For deep actor-critic algorithms (SAC, TD3 and DDPG), both policy network and value network are represented using MLPs with two hidden layers (256, 256) and optimized using Adam (Kingma and Ba 2014) with a learning rate of 3 × 10^-4. The replay buffer size is 10^6 and the sampling mini-batch size is 256 (a hedged configuration sketch follows the table). |
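The pseudocode row points to the paper's Algorithm 1, which prefers replaying transitions whose states resemble the agent's current state: a larger candidate batch is drawn uniformly from the buffer and only the most similar transitions are kept for the update. The snippet below is a minimal sketch of that idea, not the authors' code (none is released); the function name `aer_sample`, the `oversample` factor, and the use of cosine similarity are illustrative assumptions.

```python
import numpy as np

def aer_sample(buffer_states, current_state, batch_size=256, oversample=2, rng=None):
    """Oversample uniformly, then keep the transitions whose states are most
    similar to the agent's current state (similarity measure assumed: cosine)."""
    rng = np.random.default_rng() if rng is None else rng
    # 1) Draw a larger candidate batch uniformly from the replay buffer.
    candidate_idx = rng.integers(0, len(buffer_states), size=oversample * batch_size)
    candidates = buffer_states[candidate_idx]                      # shape (K, state_dim)
    # 2) Score each candidate state against the current state.
    eps = 1e-8
    sims = candidates @ current_state / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(current_state) + eps
    )
    # 3) Keep the top `batch_size` most similar transitions for the update.
    top = np.argsort(sims)[-batch_size:]
    return candidate_idx[top]
```

In a training loop, the returned indices would simply replace the uniform mini-batch indices used by SAC, TD3 or DDPG; the rest of the off-policy update is unchanged, which matches the paper's claim that AER can be coupled with different off-policy algorithms.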
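For the experiment-setup row, the sketch below wires the quoted hyperparameters (two 256-unit hidden layers, Adam with learning rate 3 × 10^-4, replay buffer of 10^6, mini-batch of 256) into a network and optimizer definition. PyTorch and the example state/action dimensions are assumptions; the paper names neither its framework nor releases code.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the paper's experiment setup.
HIDDEN = (256, 256)
LEARNING_RATE = 3e-4
REPLAY_BUFFER_SIZE = 10**6
BATCH_SIZE = 256

def mlp(in_dim, out_dim, hidden=HIDDEN):
    """Two-hidden-layer MLP used for both the policy and value networks."""
    layers, last = [], in_dim
    for h in hidden:
        layers += [nn.Linear(last, h), nn.ReLU()]
        last = h
    layers.append(nn.Linear(last, out_dim))
    return nn.Sequential(*layers)

# Illustrative dimensions; the actual sizes depend on the gym task.
state_dim, action_dim = 17, 6
policy_net = mlp(state_dim, action_dim)
value_net = mlp(state_dim + action_dim, 1)   # Q(s, a) for the actor-critic methods
policy_optimizer = torch.optim.Adam(policy_net.parameters(), lr=LEARNING_RATE)
value_optimizer = torch.optim.Adam(value_net.parameters(), lr=LEARNING_RATE)
```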