On Value Function Representation of Long Horizon Problems

Authors: Lucas Lehnert, Romain Laroche, Harm van Seijen

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The theoretical results are verified empirically on randomly generated MDPs and on a grid-world fruit collection task using deep value function approximation. These results highlight a connection between value function approximation and the Options framework and suggest that value functions should be decomposed along bottlenecks of the MDP's transition dynamics. Two sets of experiments are conducted: the first verifies, on randomly sampled MDPs, how the maximal action-gap depends on the discount factor γ; the second approximates the ground-truth value function of a grid-world fruit collection task with a deep neural network.
Researcher Affiliation | Collaboration | Lucas Lehnert (1,2), Romain Laroche (1), Harm van Seijen (1); lucas.lehnert@brown.edu, {romain.laroche, harm.vanseijen}@microsoft.com; (1) Microsoft Maluuba, Montreal, QC, Canada; (2) Brown University, Providence, RI, United States
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access information (e.g., a link or an explicit statement of code release) for its methodology.
Open Datasets | No | The paper mentions conducting experiments on "randomly generated MDPs" and a "grid-world fruit collection task" but does not provide a specific link, DOI, repository name, or formal citation for a publicly available dataset used for training.
Dataset Splits | No | The paper does not provide details of training/validation/test dataset splits. It states that DNNs were trained on "all the 1,386,375 possible states with their ground truth values", implying no explicit train/validation/test split was used.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using the "Adam optimizer (Kingma and Ba 2014) with default parameters" but does not provide version numbers for any software dependencies (e.g., deep learning frameworks or libraries).
Experiment Setup | Yes | Each DNN is trained over 500 epochs using the Adam optimizer (Kingma and Ba 2014) with default parameters.
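The first experiment above measures the maximal action-gap of randomly sampled MDPs under different discount factors γ. As a rough, self-contained illustration of that quantity (the helper names `random_mdp` and `max_action_gap` and the uniform sampling scheme below are assumptions for this sketch, not the authors' protocol), one can sample a small MDP, solve it by value iteration, and read off the largest gap between the best and second-best Q-value:

```python
import random

def random_mdp(n_states=8, n_actions=3, seed=0):
    # Hypothetical sampling scheme: uniform rewards in [0, 1] and
    # normalized uniform weights as transition probabilities.
    rng = random.Random(seed)
    R = [[rng.random() for _ in range(n_actions)] for _ in range(n_states)]
    P = []
    for s in range(n_states):
        rows = []
        for a in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            z = sum(w)
            rows.append([x / z for x in w])
        P.append(rows)
    return P, R

def max_action_gap(P, R, gamma, iters=2000):
    n, m = len(R), len(R[0])
    V = [0.0] * n
    for _ in range(iters):  # value iteration to (near) convergence
        V = [max(R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n))
                 for a in range(m)) for s in range(n)]
    gap = 0.0
    for s in range(n):
        # Sort Q-values descending; the gap is best minus second best.
        q = sorted((R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n))
                    for a in range(m)), reverse=True)
        gap = max(gap, q[0] - q[1])
    return gap

P, R = random_mdp()
gaps = {g: max_action_gap(P, R, g) for g in (0.5, 0.9, 0.99)}
print(gaps)
```

Sweeping γ toward 1 in such a sketch shows how value magnitudes grow on the order of 1/(1 − γ) while action-gaps need not grow with them, which is the effect the paper's first experiment probes.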
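The reported setup, 500 epochs of the Adam optimizer with default parameters, can be made concrete in a few lines. The snippet below is a from-scratch sketch on a toy linear-regression problem, not the paper's network or data; the defaults shown (lr = 0.001, β1 = 0.9, β2 = 0.999, ε = 1e-8) are those of Kingma and Ba (2014):

```python
import math

def adam_train(xs, ys, epochs=500, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    w, b = 0.0, 0.0          # toy model: y_hat = w*x + b
    m = [0.0, 0.0]           # first-moment (mean) estimates
    v = [0.0, 0.0]           # second-moment (uncentered variance) estimates
    n = len(xs)
    for t in range(1, epochs + 1):
        # Full-batch gradients of mean squared error w.r.t. w and b.
        gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        params = [w, b]
        for i, g in enumerate((gw, gb)):
            m[i] = b1 * m[i] + (1 - b1) * g
            v[i] = b2 * v[i] + (1 - b2) * g * g
            mhat = m[i] / (1 - b1 ** t)   # bias-corrected moments
            vhat = v[i] / (1 - b2 ** t)
            params[i] -= lr * mhat / (math.sqrt(vhat) + eps)
        w, b = params
    return w, b

xs = [i / 10 for i in range(10)]
ys = [2 * x + 1 for x in xs]     # toy target function y = 2x + 1
w, b = adam_train(xs, ys)
loss = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

In practice one would use a framework optimizer (e.g., the Adam implementation of a deep learning library) rather than this hand-rolled loop; the sketch only pins down what "500 epochs with default parameters" means as an update rule.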