On Value Function Representation of Long Horizon Problems
Authors: Lucas Lehnert, Romain Laroche, Harm van Seijen
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretical results are verified empirically on randomly generated MDPs and on a grid-world fruit collection task using deep value function approximation. The theoretical results highlight a connection between value function approximation and the Options framework and suggest that value functions should be decomposed along bottlenecks of the MDP's transition dynamics. Two sets of experiments are conducted: the first verifies the dependence of the maximal action-gap on the discount factor γ using randomly sampled MDPs; the second approximates the ground-truth value function of a grid-world fruit collection task with a deep neural network. |
| Researcher Affiliation | Collaboration | Lucas Lehnert (1,2), Romain Laroche (1), Harm van Seijen (1). lucas.lehnert@brown.edu, {romain.laroche, harm.vanseijen}@microsoft.com. (1) Microsoft Maluuba, Montreal, QC, Canada; (2) Brown University, Providence, RI, United States. |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., a link or explicit statement of code release) for its methodology. |
| Open Datasets | No | The paper mentions conducting experiments on "randomly generated MDPs" and a "grid-world fruit collection task" but does not provide a specific link, DOI, repository name, or formal citation for a publicly available dataset used for training. |
| Dataset Splits | No | The paper does not provide specific details about training/validation/test dataset splits. It states that DNNs were trained on "all the 1,386,375 possible states with their ground truth values", implying the entire state space was used for training with no held-out split. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using the "Adam optimizer (Kingma and Ba 2014) with default parameters" but does not provide specific version numbers for any software dependencies (e.g., deep learning frameworks, libraries). |
| Experiment Setup | Yes | Each DNN is trained over 500 epochs using the Adam optimizer (Kingma and Ba 2014) with default parameters. |
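The first experiment described above (measuring how the maximal action-gap varies with the discount factor γ on randomly sampled MDPs) can be sketched as follows. This is a hypothetical illustration, not the authors' code; the MDP sizes, sampling distributions, and iteration counts are assumptions.

```python
import numpy as np

def random_mdp(n_states=10, n_actions=2, rng=None):
    """Sample a random MDP: transition tensor P[a, s, s'] and rewards R[a, s]."""
    rng = rng or np.random.default_rng(0)
    P = rng.random((n_actions, n_states, n_states))
    P /= P.sum(axis=2, keepdims=True)  # normalize each row into a distribution
    R = rng.random((n_actions, n_states))
    return P, R

def max_action_gap(P, R, gamma, iters=2000):
    """Run value iteration to approximate Q*, then return the largest
    per-state gap between the best and second-best action values."""
    n_actions, n_states, _ = P.shape
    Q = np.zeros((n_actions, n_states))
    for _ in range(iters):
        V = Q.max(axis=0)          # greedy state values
        Q = R + gamma * (P @ V)    # Bellman optimality backup
    sorted_Q = np.sort(Q, axis=0)  # sort action values per state
    return float((sorted_Q[-1] - sorted_Q[-2]).max())

if __name__ == "__main__":
    for gamma in (0.1, 0.5, 0.9, 0.99):
        print(gamma, max_action_gap(*random_mdp(), gamma))
```

The sketch only demonstrates the measurement itself; the paper's experiment averages over many sampled MDPs to expose the trend in the action-gap as γ grows.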