Low-Rank Modular Reinforcement Learning via Muscle Synergy
Authors: Heng Dong, Tonghan Wang, Jiayuan Liu, Chongjie Zhang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively evaluate our method on a variety of robot morphologies, and the results show its superior efficiency and generalizability, especially on robots with a large DoF like Humanoid++ and UNIMALs. |
| Researcher Affiliation | Academia | Heng Dong, IIIS, Tsinghua University (drdhxi@gmail.com); Tonghan Wang, Harvard University (twang1@g.harvard.edu); Jiayuan Liu, IIIS, Tsinghua University (georgejiayuan@gmail.com); Chongjie Zhang, IIIS, Tsinghua University (chongjie@tsinghua.edu.cn) |
| Pseudocode | No | The paper describes algorithms such as Affinity Propagation but does not present them in a structured 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | And our code is available at GitHub: https://github.com/drdh/Synergy-RL |
| Open Datasets | Yes | For multi-task and zero-shot evaluation, we adopt the widely-used modular MTRL benchmarks [Huang et al., 2020, Kurin et al., 2020, Hong et al., 2021], which are created based on Gym MuJoCo locomotion tasks by Huang et al. [2020]. |
| Dataset Splits | Yes | Humanoid++ has eight variants of humanoids, where six of them are used as training tasks (see Figure 3) and the other two are used as testing tasks (Figure 4). |
| Hardware Specification | No | The paper states in the ethics checklist that hardware specifications are included in the appendices, but these details are not present in the provided main text. |
| Software Dependencies | No | The paper mentions using TD3 [Fujimoto et al., 2018] and building on the AMORPHEUS codebase, but it does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | We use TD3 [Fujimoto et al., 2018] as the underlying reinforcement learning algorithm for training the policy over all baselines, ablations and our method for fairness. We test all methods with 4 random seeds and show the mean performance as well as 95% confidence intervals. |
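
The Pseudocode row above notes that the paper names Affinity Propagation for grouping actuators into synergies without presenting pseudocode. As a rough illustration only, the sketch below clusters hypothetical per-joint embeddings with scikit-learn's `AffinityPropagation`; the embedding array, its dimensions, and the `preference` value are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Hypothetical per-joint embeddings (one row per actuator); in the paper these
# would come from learned limb/morphology representations, not random data.
rng = np.random.default_rng(0)
joint_embeddings = rng.normal(size=(17, 32))  # e.g., 17 joints, 32-dim features

# Affinity Propagation chooses the number of clusters (synergies) automatically;
# the `preference` value controls how many exemplars emerge and is a guess here.
clustering = AffinityPropagation(preference=-50.0, random_state=0)
labels = clustering.fit_predict(joint_embeddings)

print("synergy assignment per joint:", labels)
print("number of synergies found:", len(clustering.cluster_centers_indices_))
```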
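
The Experiment Setup row reports means with 95% confidence intervals over 4 random seeds. The snippet below is a minimal sketch of that kind of aggregation; the per-seed returns are placeholders, and the t-based interval is an assumption, since the paper does not state how the interval was computed.

```python
import numpy as np
from scipy import stats

# Placeholder values: final returns from 4 independent training seeds.
seed_returns = np.array([5321.0, 4987.0, 5410.0, 5102.0])

mean = seed_returns.mean()
sem = stats.sem(seed_returns)  # standard error of the mean
# 95% CI using a t-distribution with n-1 degrees of freedom (an assumption;
# some codebases use a normal approximation or a bootstrap instead).
ci_low, ci_high = stats.t.interval(0.95, df=len(seed_returns) - 1,
                                   loc=mean, scale=sem)

print(f"mean return: {mean:.1f}, 95% CI: [{ci_low:.1f}, {ci_high:.1f}]")
```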