Learning Markov State Abstractions for Deep Reinforcement Learning
Authors: Cameron Allen, Neev Parikh, Omer Gottesman, George Konidaris
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate our approach on a visual gridworld domain and a set of continuous control benchmarks. Our approach learns representations that capture the underlying structure of the domain and lead to improved sample efficiency over state-of-the-art deep reinforcement learning with visual features, often matching or exceeding the performance achieved with hand-designed compact state information. |
| Researcher Affiliation | Academia | Cameron Allen (Brown University), Neev Parikh (Brown University), Omer Gottesman (Brown University), George Konidaris (Brown University) |
| Pseudocode | No | The paper describes its methods in prose and with architectural diagrams (e.g., Figure 1) but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code repository available at https://github.com/camall3n/markov-state-abstractions. |
| Open Datasets | Yes | The experiments use "a collection of image-based, continuous control tasks from the DeepMind Control Suite (Tassa et al., 2020)"; a minimal loading sketch follows the table. |
| Dataset Splits | No | The paper mentions training and evaluating models but does not provide specific details on validation dataset splits or percentages. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU specifications, or memory used for the experiments. |
| Software Dependencies | No | The paper mentions the use of the Adam optimizer but does not provide specific version numbers for software libraries, frameworks, or programming languages (e.g., PyTorch, Python, CUDA versions). |
| Experiment Setup | Yes | We use a total training batch size of 256 for all DeepMind Control tasks. We use the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 1e-4 for all trainable parameters. ... by minimizing L_Markov (with α = β = 1, η = 0). A training-step sketch using these values follows below. |
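The DeepMind Control Suite tasks cited in the Open Datasets row are publicly available through the `dm_control` package. The snippet below is a minimal sketch of loading one image-based task with pixel observations; the choice of `cheetah-run` and the 84×84 render settings are illustrative assumptions, and the paper's exact task list and preprocessing may differ.

```python
import numpy as np
from dm_control import suite
from dm_control.suite.wrappers import pixels

# Load one continuous control task from the DeepMind Control Suite.
# The domain/task pair here is illustrative, not the paper's full task list.
env = suite.load(domain_name="cheetah", task_name="run")

# Wrap the environment to return rendered pixel observations instead of
# proprioceptive state. The 84x84 resolution is a common choice for
# pixel-based DMC agents; the paper's preprocessing may differ.
env = pixels.Wrapper(env, pixels_only=True,
                     render_kwargs={"height": 84, "width": 84, "camera_id": 0})

spec = env.action_spec()
time_step = env.reset()
for _ in range(5):
    # Sample a random action within the spec bounds, just to drive the loop.
    action = np.random.uniform(spec.minimum, spec.maximum, size=spec.shape)
    time_step = env.step(action)
    frame = time_step.observation["pixels"]  # (84, 84, 3) uint8 image
```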
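The Experiment Setup row pins down the optimizer, batch size, and loss coefficients, which is enough to sketch a single training step. The PyTorch sketch below assumes a discrete-action inverse-model term and a batch-shuffled density-ratio term for the combined objective L_Markov = α·L_inverse + β·L_ratio (with η = 0 disabling any additional term); the networks `phi`, `inverse_model`, and `discriminator` are hypothetical placeholders, not the modules from the authors' repository.

```python
import torch
import torch.nn.functional as F

# Hyperparameters reported in the Experiment Setup row:
# batch size 256, Adam with lr 1e-4, alpha = beta = 1, eta = 0.
BATCH_SIZE = 256
LR = 1e-4
ALPHA, BETA = 1.0, 1.0  # eta = 0, so no third term below

# Hypothetical placeholder networks; the repository's architectures differ.
phi = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(84 * 84, 64))  # encoder
inverse_model = torch.nn.Linear(2 * 64, 4)   # predicts the action from (z, z')
discriminator = torch.nn.Linear(2 * 64, 1)   # scores real vs. shuffled pairs

params = (list(phi.parameters()) + list(inverse_model.parameters())
          + list(discriminator.parameters()))
optimizer = torch.optim.Adam(params, lr=LR)

def markov_loss(obs, next_obs, actions):
    """Weighted objective: L = alpha * L_inverse + beta * L_ratio."""
    z, z_next = phi(obs), phi(next_obs)

    # Inverse-model term: predict the action from consecutive encodings.
    action_logits = inverse_model(torch.cat([z, z_next], dim=-1))
    l_inverse = F.cross_entropy(action_logits, actions)

    # Density-ratio term: distinguish true (z, z') pairs from pairs whose
    # z' is shuffled within the batch (a standard contrastive construction).
    fake_next = z_next[torch.randperm(z_next.size(0))]
    real = discriminator(torch.cat([z, z_next], dim=-1))
    fake = discriminator(torch.cat([z, fake_next], dim=-1))
    l_ratio = (F.binary_cross_entropy_with_logits(real, torch.ones_like(real))
               + F.binary_cross_entropy_with_logits(fake, torch.zeros_like(fake)))

    return ALPHA * l_inverse + BETA * l_ratio

# One training step on a synthetic batch of transitions.
obs = torch.rand(BATCH_SIZE, 84, 84)
next_obs = torch.rand(BATCH_SIZE, 84, 84)
actions = torch.randint(0, 4, (BATCH_SIZE,))

optimizer.zero_grad()
loss = markov_loss(obs, next_obs, actions)
loss.backward()
optimizer.step()
```

The sketch isolates the representation-learning step only; in the paper the abstraction loss is trained together with (or as a pretraining phase for) the downstream RL agent.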