State Abstractions for Lifelong Reinforcement Learning
Authors: David Abel, Dilip Arumugam, Lucas Lehnert, Michael Littman
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Experiments): "We conduct two sets of simple experiments with the goal of illuminating how state abstractions of various forms impact learning and decision making." |
| Researcher Affiliation | Academia | Department of Computer Science, Brown University. Correspondence to: David Abel <david_abel@brown.edu>. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | "We make all our code publicly available for reproduction of results and extension" (footnote 1: https://github.com/david-abel/rl_abstraction). |
| Open Datasets | No | The paper uses custom-built grid-world environments ("Color Room," "Four Room") for its experiments and provides no concrete access information (specific link, DOI, repository, or formal citation with authors/year) for a publicly available or open dataset. (An illustrative environment sketch follows the table.) |
| Dataset Splits | No | The paper describes episodic learning over 100 episodes or 250/500 steps and reports results averaged with 95% confidence intervals, but it does not specify explicit train/validation/test dataset splits with percentages, sample counts, or citations to predefined splits. (A confidence-interval sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions standard algorithms such as Q-Learning and Delayed Q-Learning, but it does not list specific software dependencies or library versions (e.g., Python, PyTorch, TensorFlow) that would be needed for replication. (A minimal Q-learning sketch follows the table.) |
| Experiment Setup | Yes | Each agent is given 250 steps to interact with the MDP. The abstraction parameter is set to ε = 0.01. For PAC abstractions, δ = 0.1 and ε = 0.1. One experiment involves 100 episodes with 500 steps per episode. (An abstraction sketch using ε follows the table.) |
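
To make a few of the rows above concrete, the sketches that follow are illustrative reconstructions, not the authors' code (which lives at the repository linked in the "Open Source Code" row). First, the "Open Datasets" row: the Four Room domain is a standard grid-world layout rather than a dataset. Below is a minimal sketch of such an environment; the grid size, wall placement, doorway positions, goal location, and deterministic dynamics are all assumptions, not the paper's exact specification.

```python
class FourRoomGridWorld:
    """Minimal four-room grid world. Layout, goal placement, and
    dynamics are illustrative assumptions, not the paper's spec."""

    ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

    def __init__(self, size=11, goal=(10, 10)):
        self.size = size
        self.goal = goal
        # Walls split the grid into four rooms, with one doorway per wall.
        mid = size // 2
        self.walls = {(mid, y) for y in range(size) if y not in (2, size - 3)}
        self.walls |= {(x, mid) for x in range(size) if x not in (2, size - 3)}
        self.state = (0, 0)

    def reset(self):
        self.state = (0, 0)
        return self.state

    def step(self, action):
        dx, dy = self.ACTIONS[action]
        x, y = self.state
        nxt = (x + dx, y + dy)
        # Moves into walls or off the grid leave the agent in place.
        if (nxt in self.walls or not (0 <= nxt[0] < self.size)
                or not (0 <= nxt[1] < self.size)):
            nxt = self.state
        self.state = nxt
        reward = 1.0 if nxt == self.goal else 0.0
        return nxt, reward
```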
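
The "Software Dependencies" row names Q-Learning. Below is a minimal tabular Q-learning loop run against the environment sketch above; the learning rate, discount factor, and exploration rate are illustrative assumptions, with only the 250-step budget taken from the "Experiment Setup" row.

```python
from collections import defaultdict
import random

def q_learning(env, steps=250, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning for a fixed step budget. Hyperparameters are
    illustrative; only the 250-step budget comes from the paper."""
    Q = defaultdict(float)  # maps (state, action) -> value estimate
    actions = list(env.ACTIONS)
    state = env.reset()
    for _ in range(steps):
        # Epsilon-greedy action selection.
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        next_state, reward = env.step(action)
        # Standard one-step Q-learning backup.
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state
    return Q

Q = q_learning(FourRoomGridWorld())
```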
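
The "Dataset Splits" row mentions results averaged with 95% confidence intervals. One common way to compute such intervals over independent runs is the normal approximation sketched below; the paper does not state which estimator it used, so this choice is an assumption.

```python
import math
import statistics

def mean_with_ci(run_returns, z=1.96):
    """Mean and 95% confidence half-width over independent runs, using
    the normal approximation (an assumption; the paper does not say
    which interval estimator it used)."""
    n = len(run_returns)
    mean = statistics.mean(run_returns)
    half_width = z * statistics.stdev(run_returns) / math.sqrt(n)
    return mean, half_width

mean, ci = mean_with_ci([10.2, 9.8, 11.1, 10.5, 9.9])
print(f"{mean:.2f} ± {ci:.2f}")
```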
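
Finally, the "Experiment Setup" row sets an abstraction parameter ε = 0.01. In the approximate state-abstraction family this paper builds on, ground states are aggregated when their Q-values agree within ε for every action. The greedy grouping below sketches that criterion; the grouping order and the `epsilon_q_abstraction` helper are illustrative choices, not the paper's procedure.

```python
def epsilon_q_abstraction(q_values, states, actions, epsilon=0.01):
    """Greedily group states whose Q-values agree within epsilon for
    every action. q_values maps (state, action) -> float. The greedy
    grouping order is an illustrative choice, not the paper's method."""
    clusters = []  # each cluster is a list of epsilon-close states
    for s in states:
        for cluster in clusters:
            rep = cluster[0]  # compare against the cluster's first member
            if all(abs(q_values[(s, a)] - q_values[(rep, a)]) <= epsilon
                   for a in actions):
                cluster.append(s)
                break
        else:
            clusters.append([s])
    # phi maps each ground state to the index of its abstract state.
    phi = {s: i for i, cluster in enumerate(clusters) for s in cluster}
    return phi
```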