Bi-linear Value Networks for Multi-goal Reinforcement Learning
Authors: Zhang-Wei Hong, Ge Yang, Pulkit Agrawal
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evidence is provided on the simulated Fetch robot task suite and on dexterous manipulation with a Shadow hand. |
| Researcher Affiliation | Collaboration | Zhang-Wei Hong, Ge Yang* & Pulkit Agrawal: Improbable AI Lab; NSF AI Institute for AI and Fundamental Interactions (IAIFI); MIT-IBM Watson AI Lab; Massachusetts Institute of Technology |
| Pseudocode | No | No pseudocode or algorithm blocks are explicitly presented or labeled in the paper. |
| Open Source Code | No | No code repository is linked; the paper only states: "We provide detailed instructions for reproducing the results in this paper in the Appendix. Please refer to Section A.4." |
| Open Datasets | Yes | All experiments in Section 5.1 happen on the standard object and dexterous manipulation tasks from the gym robotics suite (Plappert et al., 2018). |
| Dataset Splits | No | The paper describes training and testing splits, but does not explicitly mention or detail a validation set or its proportions. |
| Hardware Specification | No | No specific hardware (e.g., CPU or GPU models) is reported; the acknowledgments only note: "We are grateful to MIT Supercloud and the Lincoln Laboratory Supercomputing Center for providing HPC resources." |
| Software Dependencies | No | The paper mentions various algorithms and networks (e.g., DDPG, HER, SAC, TD3, MLP) but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | Fetch: "We adapted the hyperparameters in (Plappert et al., 2018) for training on a single desktop. (...) Num workers: 2 for Reach, 8 for Push, 16 for Pick & Place, and 20 for Slide. Batch size: 1024. Warm-up rollouts: We collected 100 initial rollouts for pre-filling the replay buffer. Training frequency: We train the agent once per 2 environment steps." |
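
For concreteness, the setup quoted in the last row maps onto the training loop sketched below. This is a minimal sketch assuming a generic off-policy agent (DDPG+HER-style) and a list-backed replay buffer; the identifiers `agent.rollout`, `agent.update`, and `train` are hypothetical, since the paper does not release code.

```python
# Hypothetical sketch of the training cadence in the Experiment Setup row.
# All identifiers are illustrative; none come from the paper's (unreleased) code.
import random

NUM_WORKERS = {"Reach": 2, "Push": 8, "PickAndPlace": 16, "Slide": 20}
BATCH_SIZE = 1024        # transitions sampled per gradient step
WARMUP_ROLLOUTS = 100    # rollouts collected to pre-fill the replay buffer
TRAIN_EVERY = 2          # one gradient update per 2 environment steps

def train(env, agent, buffer, total_env_steps):
    """Pre-fill the replay buffer, then interleave collection and updates."""
    for _ in range(WARMUP_ROLLOUTS):
        buffer.extend(agent.rollout(env))          # warm-up phase
    steps = 0
    while steps < total_env_steps:
        episode = agent.rollout(env)               # list of transitions
        buffer.extend(episode)
        steps += len(episode)
        # Train once per TRAIN_EVERY environment steps on batches of 1024.
        for _ in range(len(episode) // TRAIN_EVERY):
            agent.update(random.sample(buffer, BATCH_SIZE))
```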