Learning to Reweight Imaginary Transitions for Model-Based Reinforcement Learning
Authors: Wenzhen Huang, Qiyue Yin, Junge Zhang, Kaiqi Huang
Pages: 7848-7856
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results demonstrate that our method outperforms state-of-the-art model-based and model-free RL algorithms on multiple tasks. |
| Researcher Affiliation | Academia | 1 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China 2 CRISE, Institute of Automation, Chinese Academy of Sciences, Beijing, China 3 CAS Center for Excellence in Brain Science and Intelligence Technology, Beijing, China |
| Pseudocode | Yes | Algorithm 1 Reweighted Probabilistic-Ensemble Soft Actor-Critic (ReW-PE-SAC) |
| Open Source Code | No | The paper mentions a PyTorch implementation for the SAC baseline, but it contains no explicit statement or link indicating that the authors' own code for the proposed method is open-source or otherwise available. |
| Open Datasets | Yes | We evaluate our algorithm on six complex continuous control tasks from the model-based RL benchmark (Wang et al. 2019), which is modified from the OpenAI Gym benchmark suite (Brockman et al. 2016). |
| Dataset Splits | No | The paper describes using a replay buffer for training but does not specify explicit training, validation, or test dataset splits with percentages or sample counts for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions a PyTorch implementation for a baseline SAC, but it does not provide specific version numbers for general software dependencies (e.g., Python, PyTorch) required to replicate the experiment. |
| Experiment Setup | Yes | The network architecture and training hyperparameters are given in the appendix. |