Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Bi-linear Value Networks for Multi-goal Reinforcement Learning
Authors: Zhang-Wei Hong, Ge Yang, Pulkit Agrawal
ICLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evidence is provided on the simulated Fetch robot task-suite, and dexterous manipulation with a Shadow hand. |
| Researcher Affiliation | Collaboration | Zhang-Wei Hong, Ge Yang* & Pulkit Agrawal Improbable AI Lab NSF AI Institute for AI and Fundamental Interactions (IAIFI), MIT-IBM Watson AI Lab Massachusetts Institute of Technology |
| Pseudocode | No | No pseudocode or algorithm blocks are explicitly presented or labeled in the paper. |
| Open Source Code | No | We provide detailed instructions for reproducing the results in this paper in the Appendix. Please refer to Section A.4. |
| Open Datasets | Yes | All experiments in Section 5.1 happen on the standard object and dexterous manipulation tasks from the gym robotics suite (Plappert et al., 2018). |
| Dataset Splits | No | The paper describes training and testing splits, but does not explicitly mention or detail a validation set or its proportions. |
| Hardware Specification | No | We thank members of Improbable AI Lab for the helpful discussion and feedback. We are grateful to MIT Supercloud and the Lincoln Laboratory Supercomputing Center for providing HPC resources. |
| Software Dependencies | No | The paper mentions various algorithms and networks (e.g., DDPG, HER, SAC, TD3, MLP) but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | Fetch We adapted the hyperparameters in (Plappert et al., 2018) for training in a single desktop. (...) num workers: 2 for Reach, 8 for Push, 16 for Pick & Place, and 20 for Slide. Batch size: 1024 Warm up rollouts: We collected 100 initial rollouts for prefilling the replay buffer. Training frequency: We train the agent per 2 environment steps. |
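
The Fetch settings quoted in the Experiment Setup row can be collected into a small configuration sketch. This is only an illustration: the dictionary keys and the `should_train` helper are hypothetical names, not taken from the paper or its code, and reading "per 2 environment steps" as one update every second step is an assumption. Only the values quoted in the row are included; the elided part of the setup is left out.

```python
# Hypothetical config sketch: key names are illustrative, values are
# the ones quoted in the Experiment Setup row above.
FETCH_NUM_WORKERS = {
    "Reach": 2,
    "Push": 8,
    "Pick & Place": 16,
    "Slide": 20,
}

FETCH_SETUP = {
    "batch_size": 1024,
    "warmup_rollouts": 100,      # initial rollouts used to prefill the replay buffer
    "env_steps_per_update": 2,   # assumed meaning of "train per 2 environment steps"
}


def should_train(env_step: int, every: int = FETCH_SETUP["env_steps_per_update"]) -> bool:
    """Return True on every `every`-th environment step (1-indexed steps assumed)."""
    return env_step > 0 and env_step % every == 0


# Number of training updates triggered over the first 10 environment steps.
updates_in_10_steps = sum(should_train(step) for step in range(1, 11))
```

Under these assumptions, a training loop would call `should_train(step)` after each environment step and sample a batch of `FETCH_SETUP["batch_size"]` transitions whenever it returns `True`.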