Hindsight Optimization for Hybrid State and Action MDPs

Authors: Aswin Raghavan, Scott Sanner, Roni Khardon, Prasad Tadepalli, Alan Fern

AAAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical results show that the HSA-HOP approach effectively scales to high-dimensional problems and outperforms baselines that are capable of scaling to such large hybrid MDPs.
Researcher Affiliation | Academia | Aswin Raghavan,1 Scott Sanner,2 Roni Khardon,3 Prasad Tadepalli,1 Alan Fern1. 1School of EECS, Oregon State University, Corvallis, OR, USA. {nadamuna,tadepall,afern}@eecs.orst.edu 2Industrial Engineering, University of Toronto, Toronto, ON, Canada. ssanner@mie.utoronto.ca 3Department of Computer Science, Tufts University, Medford, MA, USA. roni@cs.tufts.edu
Pseudocode | No | The paper provides detailed descriptions and mathematical formulations of its algorithms and a syntax table (Table 1), but it does not include a formal 'Pseudocode' block or 'Algorithm' section.
Open Source Code | No | The paper does not provide any explicit statement about making its source code available or include a link to a code repository.
Open Datasets | No | The paper describes problem domains (Power Generation, Reservoirs, Icetrack) and how instances were generated or configured (e.g., 'varied the number of plants between 10 and 50', 'All reservoirs are empty in the initial world state'). These are problem specifications for generating instances, not references to pre-existing, publicly available datasets with concrete access information (e.g., links, DOIs, or formal citations to public datasets).
Dataset Splits | No | The paper does not explicitly mention training, validation, or test dataset splits in the conventional sense for supervised learning. It describes evaluation in an 'online replanning mode' with 'average accumulated reward over a horizon of 20 steps... averaged over 30 trials', which is a simulation-based evaluation setup rather than data partitioning.
Hardware Specification | No | The paper states 'In all experiments we use the Gurobi optimizer (Gurobi Optimization 2015) for optimizing the MILPs,' but it does not specify any hardware details such as CPU, GPU models, or memory used for running these experiments.
Software Dependencies | No | The paper mentions using 'the Gurobi optimizer (Gurobi Optimization 2015) for optimizing the MILPs.' While Gurobi is named, 'Gurobi Optimization 2015' is likely a citation to their reference manual or company, not a specific version number of the software (e.g., Gurobi 7.0).
Experiment Setup | Yes | Each evaluation has three experimental parameters: (1) Time limit t per decision in minutes, (2) Lookahead h, the length of sampled futures and, (3) Number of sampled futures F per decision. ... In the Power Generation problem with lookahead h = 4, t = 0.5 (mins) and F = 5 per decision. ... setting h = 4, t = 2 (mins) and F = 5 per decision. ... settings h = 20, t = 1 (mins) and F = 5.
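
The evaluation protocol quoted in the rows above (three parameters t, h and F per decision; accumulated reward over a horizon of 20 steps, averaged over 30 trials) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the names `ExperimentSetup`, `evaluate_online`, `toy_plan` and `toy_step` are hypothetical, the toy planner and simulator are placeholders, and the two settings whose domains are elided in the excerpt are deliberately left unnamed.

```python
import random
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ExperimentSetup:
    """Hypothetical container for the three parameters quoted above."""
    lookahead: int         # h: length of each sampled future
    time_limit_min: float  # t: time limit per decision, in minutes
    num_futures: int       # F: number of sampled futures per decision

    @property
    def time_limit_sec(self) -> float:
        # Convenience conversion; solver time limits are often given in seconds.
        return self.time_limit_min * 60.0


# Values as quoted in the Experiment Setup row.
POWER_GENERATION = ExperimentSetup(lookahead=4, time_limit_min=0.5, num_futures=5)
# The excerpt elides which domains these two settings belong to, so they stay unnamed.
UNNAMED_SETTING_A = ExperimentSetup(lookahead=4, time_limit_min=2.0, num_futures=5)
UNNAMED_SETTING_B = ExperimentSetup(lookahead=20, time_limit_min=1.0, num_futures=5)


def evaluate_online(
    setup: ExperimentSetup,
    plan: Callable[[float, ExperimentSetup, random.Random], float],
    step: Callable[[float, float, random.Random], Tuple[float, float]],
    initial_state: float,
    horizon: int = 20,  # "horizon of 20 steps"
    trials: int = 30,   # "averaged over 30 trials"
    seed: int = 0,
) -> float:
    """Replan at every step and return the average accumulated reward."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        state = initial_state
        accumulated = 0.0
        for _ in range(horizon):
            action = plan(state, setup, rng)              # one decision per step
            state, reward = step(state, action, rng)      # simulate one transition
            accumulated += reward
        total += accumulated
    return total / trials


# Placeholder planner and simulator, used only to exercise the loop.
def toy_plan(state: float, setup: ExperimentSetup, rng: random.Random) -> float:
    return 1.0


def toy_step(state: float, action: float, rng: random.Random) -> Tuple[float, float]:
    return state + action, action  # next state, reward


average_reward = evaluate_online(POWER_GENERATION, toy_plan, toy_step, initial_state=0.0)
```

In a real reproduction, `plan` would build and solve the hindsight MILP over F sampled futures of length h within the time limit t, and `step` would be the domain simulator; neither is specified in this summary.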