Hindsight Optimization for Hybrid State and Action MDPs
Authors: Aswin Raghavan, Scott Sanner, Roni Khardon, Prasad Tadepalli, Alan Fern
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results show that the HSA-HOP approach effectively scales to high-dimensional problems and outperforms baselines that are capable of scaling to such large hybrid MDPs. |
| Researcher Affiliation | Academia | Aswin Raghavan (1), Scott Sanner (2), Roni Khardon (3), Prasad Tadepalli (1), Alan Fern (1). (1) School of EECS, Oregon State University, Corvallis, OR, USA. {nadamuna,tadepall,afern}@eecs.orst.edu (2) Industrial Engineering, University of Toronto, Toronto, ON, Canada. ssanner@mie.utoronto.ca (3) Department of Computer Science, Tufts University, Medford, MA, USA. roni@cs.tufts.edu |
| Pseudocode | No | The paper provides detailed descriptions and mathematical formulations of its algorithms and a syntax table (Table 1), but it does not include a formal 'Pseudocode' block or 'Algorithm' section. |
| Open Source Code | No | The paper does not provide any explicit statement about making its source code available or include a link to a code repository. |
| Open Datasets | No | The paper describes problem domains (Power Generation, Reservoirs, Icetrack) and how instances were generated or configured (e.g., 'varied the number of plants between 10 and 50', 'All reservoirs are empty in the initial world state'). These are problem specifications for generating instances, not references to pre-existing, publicly available datasets with concrete access information (e.g., links, DOIs, or formal citations to public datasets). |
| Dataset Splits | No | The paper does not explicitly mention training, validation, or test dataset splits in the conventional sense for supervised learning. It describes evaluation in an 'online replanning mode' with 'average accumulated reward over a horizon of 20 steps... averaged over 30 trials', which is a simulation-based evaluation setup rather than data partitioning. |
| Hardware Specification | No | The paper states 'In all experiments we use the Gurobi optimizer (Gurobi Optimization 2015) for optimizing the MILPs,' but it does not specify any hardware details such as CPU, GPU models, or memory used for running these experiments. |
| Software Dependencies | No | The paper mentions using 'the Gurobi optimizer (Gurobi Optimization 2015) for optimizing the MILPs.' While Gurobi is named, 'Gurobi Optimization 2015' is likely a citation to their reference manual or company, not a specific version number of the software (e.g., Gurobi 7.0). |
| Experiment Setup | Yes | Each evaluation has three experimental parameters: (1) Time limit t per decision in minutes, (2) Lookahead h, the length of sampled futures, and (3) Number of sampled futures F per decision. ... In the Power Generation problem with lookahead h = 4, t = 0.5 (mins) and F = 5 per decision. ... setting h = 4, t = 2 (mins) and F = 5 per decision. ... settings h = 20, t = 1 (mins) and F = 5. |
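For context on the experiment-setup parameters above, the generic hindsight-optimization (HOP) replanning loop can be sketched as follows. This is an illustrative outline only, not the paper's HSA-HOP method: `sample_future` and `solve_future` are hypothetical placeholders, and the paper's hybrid state/action setting encodes the hindsight problems as MILPs solved with Gurobi rather than enumerating a finite action set as done here.

```python
def hop_action(state, actions, sample_future, solve_future, h=4, F=5):
    """Generic hindsight-optimization action selection (sketch).

    For each candidate first action, sample F determinized futures of
    length h, solve each one optimally in hindsight, and average the
    resulting values; then act greedily on the averaged estimates.
    """
    best_action, best_value = None, float("-inf")
    for a in actions:
        total = 0.0
        for _ in range(F):
            future = sample_future(state, h)         # one determinized scenario
            total += solve_future(state, a, future)  # optimal hindsight value
        avg = total / F
        if avg > best_value:
            best_action, best_value = a, avg
    return best_action
```

In an online replanning evaluation like the one described in the table, this selection step would be repeated at every decision point over the horizon, with the per-decision time limit t bounding how long each batch of hindsight problems may be solved.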