SAPG: Split and Aggregate Policy Gradients

Authors: Jayesh Singla, Ananye Agarwal, Deepak Pathak

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5. Experimental Setup; 6. Results and Analysis; Figure 5. Performance curves of SAPG with respect to PPO, PBT and PQL baselines.
Researcher Affiliation | Academia | Carnegie Mellon University.
Pseudocode | Yes | Algorithm 1 SAPG (an illustrative sketch follows this table).
Open Source Code | Yes | Webpage at https://sapg-rl.github.io.
Open Datasets | Yes | We conduct experiments on 5 manipulation tasks (3 hard and 2 easy) and compare them against SOTA methods for the large-scale parallelized setting. We use a GPU-accelerated simulator, Isaac Gym (Makoviychuk et al., 2021); For testing, we choose a suite of manipulation environments that are challenging and require large-scale data to learn effective policies (Petrenko et al., 2023).
Dataset Splits | No | The paper does not explicitly provide training, validation, and test dataset splits with percentages or sample counts. In reinforcement learning, data is typically generated through environment interaction; the authors specify test environments but do not mention a validation split for the collected experience.
Hardware Specification | No | The paper mentions running experiments on a “single GPU” and using “GPU-accelerated simulators” but does not provide specific hardware models (e.g., GPU model, CPU model, or memory size).
Software Dependencies | No | The paper mentions software components and frameworks such as PPO, Isaac Gym, PhysX, MuJoCo 3.0, and the ELU activation, but it does not provide specific version numbers for these or other ancillary software dependencies used in the implementation.
Experiment Setup | Yes | 5. Experimental Setup; B. Training hyperparameters; Table 2. Training hyperparameters for Allegro Kuka tasks; Table 3. Training hyperparameters for Shadow Hand; Table 4. Training hyperparameters for Shadow Hand
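
The table notes that the paper provides pseudocode (Algorithm 1 SAPG) but does not reproduce it here. For orientation only, the sketch below shows one plausible way a split-and-aggregate update could be structured: the parallel environments are divided into a leader chunk and several follower chunks, and follower data enters the leader's PPO-style update through an importance-weighted surrogate. All names (split_env_chunks, sapg_style_loss, offpolicy_coef) are hypothetical, and the aggregation scheme is an assumption about the general technique rather than a transcription of Algorithm 1; consult the paper for the actual method.

```python
# Minimal sketch, assuming a leader/follower split of parallel environments and
# importance-weighted aggregation of follower data. Hypothetical names; not the
# paper's Algorithm 1.
import torch


def split_env_chunks(num_envs: int, num_chunks: int):
    """Split parallel environment indices into one leader chunk and follower chunks."""
    chunks = torch.chunk(torch.arange(num_envs), num_chunks)
    return chunks[0], chunks[1:]


def sapg_style_loss(logp_new, logp_behavior, advantages, is_leader,
                    clip_eps=0.2, offpolicy_coef=0.5):
    """PPO-style clipped surrogate on the leader chunk, plus an importance-weighted
    surrogate on follower chunks (an assumed form of the aggregation step)."""
    # Importance ratio of the current leader policy to the policy that collected the sample.
    ratio = torch.exp(logp_new - logp_behavior)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    surrogate = -torch.min(ratio * advantages, clipped * advantages)

    on_policy = surrogate[is_leader].mean()
    off_policy = (surrogate[~is_leader].mean()
                  if bool((~is_leader).any()) else torch.tensor(0.0))
    return on_policy + offpolicy_coef * off_policy


if __name__ == "__main__":
    # Toy usage with random data standing in for rollout statistics.
    leader_envs, follower_chunks = split_env_chunks(num_envs=1024, num_chunks=8)
    n = 64
    is_leader = torch.zeros(n, dtype=torch.bool)
    is_leader[: n // 8] = True  # samples drawn from the leader chunk
    loss = sapg_style_loss(torch.randn(n), torch.randn(n), torch.randn(n), is_leader)
    print(loss.item())
```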