Generalized Hidden Parameter MDPs: Transferable Model-Based RL in a Handful of Trials
Authors: Christian Perez, Felipe Petroski Such, Theofanis Karaletsos (pp. 5403-5411)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally demonstrate state-of-the-art performance and sample-efficiency on a new challenging MuJoCo task using reward and dynamics latent spaces, while beating a previous state-of-the-art baseline with >10× less data. |
| Researcher Affiliation | Industry | Christian F. Perez, Felipe Petroski Such, Theofanis Karaletsos Uber AI Labs San Francisco, CA 94105 {cfp, felipe.such, theofanis}@uber.com |
| Pseudocode | Yes | Algorithm 1 Learning and control with MPC and Latent Variable Models |
| Open Source Code | No | The paper does not provide an explicit statement about the release of source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We evaluate both the joint and structured LV model with a total of 8 latent dimensions using experiments in the MuJoCo Ant environment, a challenging benchmark for model-based RL (Todorov, Erez, and Tassa 2012). |
| Dataset Splits | No | The paper describes training and test sets but does not explicitly provide details on a validation set or specific split percentages for training, validation, and test data. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions MuJoCo and neural-network models but does not provide version numbers for MuJoCo or for any other software dependency. |
| Experiment Setup | Yes | The architecture for all experiments is an ensemble of 5 neural networks with 3 hidden layers of 256 units for the dynamics model, and 1 hidden layer of 32 units for the reward model. |
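The reported experiment setup can be illustrated with a small, dependency-free sketch of the model shapes. Only the layer sizes, the ensemble size of 5, and the total of 8 latent dimensions come from the table above. Everything else is an assumption made for illustration: the state and action sizes (not the Ant dimensions), a single latent vector concatenated to every input (closer to the joint model than the structured one), ReLU activations, a delta-style dynamics output, and a single (non-ensembled) reward network. The helper names are hypothetical.

```python
import math
import random

# Layer sizes follow the reported setup; STATE_DIM/ACTION_DIM are hypothetical.
STATE_DIM, ACTION_DIM = 4, 2   # illustrative only, not the Ant dimensions
LATENT_DIM = 8                 # total latent dimensions reported for Ant
ENSEMBLE_SIZE = 5              # ensemble of 5 dynamics networks
DYN_HIDDEN = [256, 256, 256]   # 3 hidden layers of 256 units
REW_HIDDEN = [32]              # 1 hidden layer of 32 units


def init_mlp(sizes, rng):
    """Return a list of (weights, biases) pairs for a fully connected net."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        std = 1.0 / math.sqrt(n_in)
        W = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((W, [0.0] * n_out))
    return layers


def mlp_forward(layers, x):
    """Apply ReLU on hidden layers and keep the output layer linear (assumed)."""
    for i, (W, b) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def build_models(seed=0):
    """Build 5 dynamics MLPs and one reward MLP, all conditioned on (s, a, z)."""
    rng = random.Random(seed)
    in_dim = STATE_DIM + ACTION_DIM + LATENT_DIM
    dynamics = [init_mlp([in_dim, *DYN_HIDDEN, STATE_DIM], rng)
                for _ in range(ENSEMBLE_SIZE)]
    reward = init_mlp([in_dim, *REW_HIDDEN, 1], rng)
    return dynamics, reward


def predict_next_state(dynamics, member, state, action, z):
    """Assumed delta parameterization: s' = s + f_member(s, a, z)."""
    delta = mlp_forward(dynamics[member], state + action + z)
    return [s + d for s, d in zip(state, delta)]
```

Conditioning every network on the same latent vector `z` is the simplest reading. A structured variant would instead split the 8 latent dimensions between the dynamics and reward models. This summary does not state how the paper divides them.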
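Algorithm 1 (learning and control with MPC and latent variable models) can be illustrated by a minimal planning step, assuming the latent variable `z` has already been inferred. The sketch uses random-shooting MPC as a simpler stand-in for whatever trajectory optimizer the paper uses (PETS-style methods often use CEM). It swaps the learned networks for a hypothetical 1-D toy system in which `z[0]` scales how actions move the state and `z[1]` sets the goal. This shows how one latent vector can shape both dynamics and reward. All function names are hypothetical, and the latent inference step is omitted.

```python
import random


def toy_dynamics(state, action, z):
    """Hypothetical 1-D dynamics: z[0] scales the effect of the action."""
    return state + z[0] * action


def toy_reward(next_state, z):
    """Hypothetical reward: negative squared distance to the goal z[1]."""
    return -(next_state - z[1]) ** 2


def plan_first_action(state, z, horizon=5, n_candidates=500, seed=0):
    """Random-shooting MPC: sample action sequences, roll each out under the
    model conditioned on z, and return the first action of the best one."""
    rng = random.Random(seed)
    best_return, best_action = float("-inf"), 0.0
    for _ in range(n_candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total = state, 0.0
        for a in actions:
            s = toy_dynamics(s, a, z)
            total += toy_reward(s, z)
        if total > best_return:
            best_return, best_action = total, actions[0]
    return best_action
```

In a receding-horizon loop, the agent would call `plan_first_action` at every step, execute only the returned action in the real environment, and replan from the observed next state. With `z = (1.0, 2.0)` the planner should push the state upward from 0. Flipping the sign of `z[0]` should flip the chosen action, which is the kind of task-dependent behavior a latent variable is meant to capture.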