Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
SAPG: Split and Aggregate Policy Gradients
Authors: Jayesh Singla, Ananye Agarwal, Deepak Pathak
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experimental Setup; 6. Results and Analysis; Figure 5. Performance curves of SAPG with respect to PPO, PBT and PQL baselines. |
| Researcher Affiliation | Academia | Carnegie Mellon University. |
| Pseudocode | Yes | Algorithm 1 SAPG |
| Open Source Code | Yes | Webpage at https://sapg-rl.github.io. |
| Open Datasets | Yes | We conduct experiments on 5 manipulation tasks (3 hard and 2 easy) and compare them against SOTA methods for the large-scale parallelized setting. We use a GPU-accelerated simulator, Isaac Gym (Makoviychuk et al., 2021); For testing, we choose a suite of manipulation environments that are challenging and require large-scale data to learn effective policies (Petrenko et al., 2023). |
| Dataset Splits | No | The paper does not explicitly provide details about training, validation, and test dataset splits with percentages or sample counts. In reinforcement learning, data is typically generated through interaction, and while they specify test environments, they don’t mention a validation split for the collected experience. |
| Hardware Specification | No | The paper mentions running experiments on a “single GPU” and using “GPU-accelerated simulators” but does not provide specific hardware models (e.g., specific GPU model, CPU model, or memory size). |
| Software Dependencies | No | The paper mentions various software components and frameworks like PPO, Isaac Gym, PhysX, MuJoCo-3.0, and ELU activation, but it does not provide specific version numbers for these or other ancillary software dependencies used in their implementation. |
| Experiment Setup | Yes | 5. Experimental Setup; B. Training hyperparameters; Table 2. Training hyperparameters for Allegro Kuka tasks; Table 3. Training hyperparameters for Shadow Hand; Table 4. Training hyperparameters for Shadow Hand |