Bi-linear Value Networks for Multi-goal Reinforcement Learning

Authors: Zhang-Wei Hong, Ge Yang, Pulkit Agrawal

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evidence is provided on the simulated Fetch robot task suite and on dexterous manipulation with a Shadow Hand.
Researcher Affiliation | Collaboration | Zhang-Wei Hong, Ge Yang & Pulkit Agrawal; Improbable AI Lab; NSF AI Institute for AI and Fundamental Interactions (IAIFI); MIT-IBM Watson AI Lab; Massachusetts Institute of Technology.
Pseudocode | No | No pseudocode or algorithm blocks are explicitly presented or labeled in the paper.
Open Source Code | No | We provide detailed instructions for reproducing the results in this paper in the Appendix. Please refer to Section A.4.
Open Datasets | Yes | All experiments in Section 5.1 happen on the standard object and dexterous manipulation tasks from the gym robotics suite (Plappert et al., 2018); see the environment-loading sketch after the table.
Dataset Splits | No | The paper describes training and testing splits but does not explicitly mention a validation set or its proportions.
Hardware Specification | No | We thank members of Improbable AI Lab for the helpful discussion and feedback. We are grateful to MIT Supercloud and the Lincoln Laboratory Supercomputing Center for providing HPC resources.
Software Dependencies | No | The paper mentions various algorithms and network architectures (e.g., DDPG, HER, SAC, TD3, MLP) but does not provide version numbers for software dependencies or libraries.
Experiment Setup | Yes | Fetch: We adapted the hyperparameters in (Plappert et al., 2018) for training on a single desktop. (...) Num workers: 2 for Reach, 8 for Push, 16 for Pick & Place, and 20 for Slide. Batch size: 1024. Warm-up rollouts: we collected 100 initial rollouts to prefill the replay buffer. Training frequency: we train the agent every 2 environment steps. A hedged config sketch follows the table.
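The Open Datasets row points to the goal-conditioned Fetch tasks from the gym robotics suite (Plappert et al., 2018). Below is a minimal sketch of how such an environment is typically loaded; the environment ID and the dict-observation layout follow the standard suite and are assumptions, not details quoted from this report.

import gym

# Hedged sketch: load one of the goal-conditioned Fetch environments from the
# gym robotics suite. Requires the MuJoCo-backed robotics extras to be installed.
env = gym.make("FetchPickAndPlace-v1")  # also: FetchReach-v1, FetchPush-v1, FetchSlide-v1

obs = env.reset()
# Goal-conditioned observations arrive as a dict with three entries.
print(obs["observation"].shape)    # proprioceptive and object state
print(obs["achieved_goal"].shape)  # goal currently achieved
print(obs["desired_goal"].shape)   # goal the agent is asked to reach

action = env.action_space.sample()
obs, reward, done, info = env.step(action)  # sparse reward: 0 on success, -1 otherwise
env.close()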
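The Experiment Setup row lists a handful of Fetch training hyperparameters. The sketch below simply collects those quoted values into a plain config dict; the key names and the small lookup helper are hypothetical, and only the numeric values come from the report.

# Hedged sketch of the Fetch hyperparameters quoted in the Experiment Setup row.
FETCH_CONFIG = {
    "num_workers": {             # parallel rollout workers per task
        "FetchReach": 2,
        "FetchPush": 8,
        "FetchPickAndPlace": 16,
        "FetchSlide": 20,
    },
    "batch_size": 1024,          # minibatch size per gradient update
    "warmup_rollouts": 100,      # rollouts collected to prefill the replay buffer
    "train_every_env_steps": 2,  # one training update per 2 environment steps
}

def workers_for(task: str) -> int:
    """Return the number of rollout workers for a given Fetch task (hypothetical helper)."""
    return FETCH_CONFIG["num_workers"][task]

if __name__ == "__main__":
    print(workers_for("FetchPickAndPlace"))  # -> 16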