Low-Rank Modular Reinforcement Learning via Muscle Synergy

Authors: Heng Dong, Tonghan Wang, Jiayuan Liu, Chongjie Zhang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively evaluate our method on a variety of robot morphologies, and the results show its superior efficiency and generalizability, especially on robots with a large DoF like Humanoid++ and UNIMALs.
Researcher Affiliation | Academia | Heng Dong (IIIS, Tsinghua University, drdhxi@gmail.com); Tonghan Wang (Harvard University, twang1@g.harvard.edu); Jiayuan Liu (IIIS, Tsinghua University, georgejiayuan@gmail.com); Chongjie Zhang (IIIS, Tsinghua University, chongjie@tsinghua.edu.cn)
Pseudocode | No | The paper describes algorithms such as Affinity Propagation but does not present them in a structured "Pseudocode" or "Algorithm" block.
Open Source Code | Yes | Our code is available at GitHub: https://github.com/drdh/Synergy-RL
Open Datasets | Yes | For multi-task and zero-shot evaluation, we adopt the widely used modular MTRL benchmarks [Huang et al., 2020, Kurin et al., 2020, Hong et al., 2021], which were created by Huang et al. [2020] based on Gym MuJoCo locomotion tasks.
Dataset Splits | Yes | Humanoid++ has eight humanoid variants, six of which are used as training tasks (see Figure 3) while the other two are used as testing tasks (Figure 4).
Hardware Specification | No | The paper states in the ethics checklist that hardware specifications are included in the appendices, but these details are not present in the provided main text.
Software Dependencies | No | The paper mentions using TD3 [Fujimoto et al., 2018] and an implementation in the AMORPHEUS codebase, but it does not specify version numbers for any software dependencies.
Experiment Setup | Yes | We use TD3 [Fujimoto et al., 2018] as the underlying reinforcement learning algorithm for training the policy in all baselines, ablations and our method, for fairness. We test all methods with 4 random seeds and show the mean performance as well as 95% confidence intervals.
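To make the reported statistic concrete, the following is a minimal sketch of how a mean and a 95% confidence interval over 4 random seeds could be computed. The quoted text does not say which interval estimator the paper uses; this sketch assumes a Student-t interval, whose two-sided critical value for 4 seeds (3 degrees of freedom) is about 3.182. The function name `mean_ci95` and the per-seed returns are hypothetical placeholders, not code or results from the paper.

```python
import math
import statistics

# Assumption: a Student-t interval. This is the 97.5th-percentile critical value
# of Student's t with 3 degrees of freedom, i.e. for n = 4 seeds.
T_CRIT_DF3 = 3.182446


def mean_ci95(values):
    """Return (mean, lower, upper) of a t-based 95% CI for exactly 4 samples."""
    if len(values) != 4:
        raise ValueError("this sketch hard-codes the t critical value for n = 4")
    m = statistics.mean(values)
    # Standard error of the mean: sample standard deviation divided by sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(len(values))
    half_width = T_CRIT_DF3 * sem
    return m, m - half_width, m + half_width


# Hypothetical final episode returns from 4 seeds (placeholders, not paper results).
returns = [4100.0, 4300.0, 3900.0, 4500.0]
mean, lo, hi = mean_ci95(returns)
print(f"mean={mean:.1f}, 95% CI=[{lo:.1f}, {hi:.1f}]")
```

With only 4 seeds, the t critical value (about 3.18) is noticeably larger than the normal-approximation value of 1.96, so a t-based interval is wider; other common choices in RL papers, such as bootstrap intervals, would give different bounds.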