Hindsight Optimization for Hybrid State and Action MDPs
Authors: Aswin Raghavan, Scott Sanner, Roni Khardon, Prasad Tadepalli, Alan Fern
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results show that the HSA-HOP approach effectively scales to high-dimensional problems and outperforms baselines that are capable of scaling to such large hybrid MDPs. |
| Researcher Affiliation | Academia | Aswin Raghavan (1), Scott Sanner (2), Roni Khardon (3), Prasad Tadepalli (1), Alan Fern (1). (1) School of EECS, Oregon State University, Corvallis, OR, USA. {nadamuna,tadepall,afern}@eecs.orst.edu (2) Industrial Engineering, University of Toronto, Toronto, ON, Canada. ssanner@mie.utoronto.ca (3) Department of Computer Science, Tufts University, Medford, MA, USA. roni@cs.tufts.edu |
| Pseudocode | No | The paper provides detailed descriptions and mathematical formulations of its algorithms and a syntax table (Table 1), but it does not include a formal 'Pseudocode' block or 'Algorithm' section. |
| Open Source Code | No | The paper does not provide any explicit statement about making its source code available or include a link to a code repository. |
| Open Datasets | No | The paper describes problem domains (Power Generation, Reservoirs, Icetrack) and how instances were generated or configured (e.g., 'varied the number of plants between 10 and 50', 'All reservoirs are empty in the initial world state'). These are problem specifications for generating instances, not references to pre-existing, publicly available datasets with concrete access information (e.g., links, DOIs, or formal citations to public datasets). |
| Dataset Splits | No | The paper does not explicitly mention training, validation, or test dataset splits in the conventional sense for supervised learning. It describes evaluation in an 'online replanning mode' with 'average accumulated reward over a horizon of 20 steps... averaged over 30 trials', which is a simulation-based evaluation setup rather than data partitioning. |
| Hardware Specification | No | The paper states 'In all experiments we use the Gurobi optimizer (Gurobi Optimization 2015) for optimizing the MILPs,' but it does not specify any hardware details such as CPU, GPU models, or memory used for running these experiments. |
| Software Dependencies | No | The paper mentions using 'the Gurobi optimizer (Gurobi Optimization 2015) for optimizing the MILPs.' While Gurobi is named, 'Gurobi Optimization 2015' is likely a citation to their reference manual or company, not a specific version number of the software (e.g., Gurobi 7.0). |
| Experiment Setup | Yes | Each evaluation has three experimental parameters: (1) Time limit t per decision in minutes, (2) Lookahead h, the length of sampled futures, and (3) Number of sampled futures F per decision. ... In the Power Generation problem with lookahead h = 4, t = 0.5 (mins) and F = 5 per decision. ... setting h = 4, t = 2 (mins) and F = 5 per decision. ... settings h = 20, t = 1 (mins) and F = 5. |
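For context on the experiment-setup parameters above, the generic hindsight-optimization (HOP) replanning loop can be sketched as follows. This is an illustrative outline only, not the paper's HSA-HOP method: `sample_future` and `solve_future` are hypothetical placeholders, and the paper's hybrid state/action setting encodes the hindsight problems as MILPs solved with Gurobi rather than enumerating a finite action set as done here.

```python
def hop_action(state, actions, sample_future, solve_future, h=4, F=5):
    """Generic hindsight-optimization action selection (sketch).

    For each candidate first action, sample F determinized futures of
    length h, solve each one optimally in hindsight, and average the
    resulting values; then act greedily on the averaged estimates.
    """
    best_action, best_value = None, float("-inf")
    for a in actions:
        total = 0.0
        for _ in range(F):
            future = sample_future(state, h)         # one determinized scenario
            total += solve_future(state, a, future)  # optimal hindsight value
        avg = total / F
        if avg > best_value:
            best_action, best_value = a, avg
    return best_action
```

In an online replanning evaluation like the one described in the table, this selection step would be repeated at every decision point over the horizon, with the per-decision time limit t bounding how long each batch of hindsight problems may be solved.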