Accelerating Robotic Reinforcement Learning via Parameterized Action Primitives
Authors: Murtaza Dalal, Deepak Pathak, Russ R. Salakhutdinov
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a thorough empirical study across challenging tasks in three distinct domains with image input and a sparse terminal reward. We find that our simple change to the action interface substantially improves both the learning efficiency and task performance irrespective of the underlying RL algorithm, significantly outperforming prior methods which learn skills from offline expert data. |
| Researcher Affiliation | Academia | Murtaza Dalal, Deepak Pathak, Ruslan Salakhutdinov; Carnegie Mellon University; {mdalal,dpathak,rsalakhu}@cs.cmu.edu |
| Pseudocode | Yes | Procedure 1: Parameterized Action Primitive. Input: primitive-dependent argument vector args, primitive index k, robot state s |
| Open Source Code | Yes | Code and videos at https://mihdalal.github.io/raps/ |
| Open Datasets | Yes | We evaluate RAPS on three simulated domains: Metaworld [17], Kitchen [52] and Robosuite [54], containing 16 tasks with varying levels of difficulty, realism and task diversity (see the bottom half of Fig. 1). We train SPIRL and PARROT from images using the kitchen demonstration datasets in D4RL [16]... |
| Dataset Splits | No | The paper does not explicitly mention training/validation/test dataset splits, specific percentages, or provide details for a dedicated validation set. |
| Hardware Specification | Yes | To ensure consistency, we evaluate all methods on a single RTX 2080 GPU with 10 CPUs and 50GB of memory. |
| Software Dependencies | No | The paper mentions using 'Dreamer' as the underlying RL algorithm but does not specify version numbers for any software components or libraries required for reproducibility. |
| Experiment Setup | No | The paper describes general experimental conditions like 'sparse reward' and 'image observations' but does not provide specific hyperparameters such as learning rates, batch sizes, optimizer settings, or other detailed training configurations necessary for reproducibility. |
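
The Pseudocode row quotes only the inputs of Procedure 1 (an argument vector `args`, a primitive index `k`, and the robot state `s`). As a rough illustration of what such an action interface might look like, here is a minimal Python sketch. It is not the authors' implementation: the `Primitive` class, the `run_primitive` function, the fixed per-primitive horizon, the toy point-robot dynamics in `step`, and the `lift` and `slide_x` primitives are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

State = List[float]  # toy robot state: [x, y, z] end-effector position (assumed)


@dataclass
class Primitive:
    """Hypothetical container for one parameterized primitive."""
    name: str
    num_args: int   # length of the argument vector this primitive expects
    horizon: int    # number of low-level steps it runs (assumed fixed)
    # Hypothetical low-level controller: (args, state) -> position delta.
    controller: Callable[[Sequence[float], State], List[float]]


def step(state: State, action: Sequence[float]) -> State:
    """Toy dynamics (assumption): the action is added to the position."""
    return [s + a for s, a in zip(state, action)]


def run_primitive(primitives: Sequence[Primitive], k: int,
                  args: Sequence[float], state: State) -> State:
    """Run primitive k with argument vector args, starting from state."""
    prim = primitives[k]
    if len(args) != prim.num_args:
        raise ValueError(f"{prim.name} expects {prim.num_args} argument(s)")
    for _ in range(prim.horizon):
        action = prim.controller(args, state)
        state = step(state, action)
    return state


HORIZON = 5  # illustrative value, not taken from the paper

# Two illustrative primitives: each spreads its single argument (a total
# displacement) evenly over the horizon along one axis.
PRIMITIVES = [
    Primitive("lift", 1, HORIZON, lambda args, s: [0.0, 0.0, args[0] / HORIZON]),
    Primitive("slide_x", 1, HORIZON, lambda args, s: [args[0] / HORIZON, 0.0, 0.0]),
]

if __name__ == "__main__":
    final_state = run_primitive(PRIMITIVES, 0, [0.5], [0.0, 0.0, 0.0])
    print(final_state)
```

In this sketch the learning agent would choose `k` and `args` once per high-level step, and the loop inside `run_primitive` would handle the low-level control; how the paper's controllers and horizons are actually defined is not stated in the excerpts above.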