Optimizing Energy Production Using Policy Search and Predictive State Representations

Authors: Yuri Grinberg, Doina Precup, Michel Gendreau

NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We compare our solution to a DP-based solution developed by Hydro-Québec based on historical inflow data, and show both quantitative and qualitative improvement." "Sec. 5 presents a quantitative and qualitative analysis of the results."
Researcher Affiliation | Collaboration | Yuri Grinberg, Doina Precup (School of Computer Science, McGill University, Montreal, QC, Canada; {ygrinb,dprecup}@cs.mcgill.ca); Michel Gendreau (École Polytechnique de Montréal, Montreal, QC, Canada; michel.gendreau@cirrelt.ca); NSERC/Hydro-Québec Industrial Research Chair on the Stochastic Optimization of Electricity Generation, CIRRELT and Département de Mathématiques et de Génie Industriel, École Polytechnique de Montréal.
Pseudocode | Yes | "Algorithm 1 Policy search algorithm"
Open Source Code | No | The paper does not provide any statement or link regarding the public availability of its source code.
Open Datasets | No | "Historical data suggests that it is safe to assume that the inflows at different sites in the same period t are just scaled values of each other. However, there is relatively little data available to optimize the problem through simulation: there are only 54 years of inflow data, which translates into 2808 values (one value per week; see Fig. 1). Hydro-Québec uses this data to learn a generative model for inflows." (A hedged sketch of this scaled-inflow setup appears after the table.)
Dataset Splits | No | "The estimate of the expected reward of a policy is calculated by running the simulator on a single 2000-year-long trajectory obtained from the generative model described in Sec. 2. All solutions are evaluated on the original historical data." No specific train/validation/test splits are mentioned. (A hedged sketch of this evaluation protocol appears after the table.)
Hardware Specification | No | The paper does not provide any specific hardware details for the experiments.
Software Dependencies | No | The paper mentions 'the SAMS software [11]' and that an initial version of the simulator was ported to Java, but no specific version numbers are provided for these or any other software dependencies.
Experiment Setup | Yes | "Parameters: N, the maximum number of iterations; θ = {θ_R2, θ_R3, θ_R4} = {θ_1, ..., θ_m} ∈ ℝ^m, the initial parameter vector; n, the number of parallel policy evaluations; Threshold, the significance threshold; γ, the sampling variance." "The estimate of the expected reward of a policy is calculated by running the simulator on a single 2000-year-long trajectory obtained from the generative model described in Sec. 2. Since the algorithm depends on the initialization of the parameter vector, we sample the initial parameter vector uniformly at random and repeat the search 50 times." (A hedged sketch of such a random-restart search loop appears after the table.)
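The Open Datasets row notes two concrete facts: 54 years of weekly inflow data (2808 values), and the modeling assumption that inflows at different sites in the same week are scaled copies of one another. The following minimal sketch illustrates what that assumption implies for the data layout; the reference series, scale factors, and number of sites are made up for the example and are not the Hydro-Québec data or generative model.

```python
import numpy as np

# Scaled-inflows assumption: weekly inflows at every site are fixed multiples
# of a single reference inflow series. All numbers here are illustrative.
rng = np.random.default_rng(0)
years, weeks_per_year = 54, 52
reference = rng.lognormal(mean=0.0, sigma=0.5, size=years * weeks_per_year)  # 54 * 52 = 2808 values
site_scales = np.array([1.0, 0.6, 0.3, 0.15])        # one hypothetical scale factor per site
inflows = np.outer(reference, site_scales)            # shape (2808, n_sites)
print(inflows.shape)                                  # (2808, 4)
```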
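The evaluation protocol quoted in the Dataset Splits row, running the simulator on a single 2000-year trajectory sampled from the generative inflow model and then reporting results on the historical record, is essentially Monte Carlo policy evaluation on one long simulated sequence. The sketch below shows that protocol under assumed interfaces: `simulator_step`, the linear release policy, the reward, and the synthetic inflow series are placeholders, not the paper's simulator or inflow model.

```python
import numpy as np

# Hypothetical one-step reservoir simulator under a parameterized release policy.
# This is an illustrative stand-in, not the authors' Java port of the SAMS simulator.
def simulator_step(state, theta, inflow):
    release = max(0.0, float(theta @ state))          # illustrative linear-in-state policy
    reward = release                                  # placeholder for energy produced
    next_state = 0.9 * state + inflow - 0.1 * release
    return next_state, reward

def evaluate_policy(theta, inflows):
    """Average weekly reward over one long inflow trajectory, e.g. a single
    2000-year (2000 * 52 week) sample drawn from a generative inflow model."""
    state = np.zeros(len(theta))
    total = 0.0
    for inflow in inflows:
        state, reward = simulator_step(state, theta, inflow)
        total += reward
    return total / len(inflows)

rng = np.random.default_rng(0)
theta = rng.uniform(size=4)
simulated = rng.lognormal(0.0, 0.5, size=(2000 * 52, 4))   # stand-in generative model
historical = rng.lognormal(0.0, 0.5, size=(54 * 52, 4))    # stand-in for the 2808 weekly values
print(evaluate_policy(theta, simulated))    # score used inside the search
print(evaluate_policy(theta, historical))   # final report on historical data
```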
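The Experiment Setup row names the knobs of Algorithm 1 (N, θ, n, Threshold, γ) and the 50 random restarts, but not the exact update rule. The sketch below shows one plausible random-restart local search driven by those parameters; the Gaussian perturbation scheme, the acceptance test, and the toy objective (which would be the simulator-based reward estimate in the paper) are assumptions for illustration, not the authors' Algorithm 1.

```python
import numpy as np

def evaluate(theta):
    # Placeholder objective standing in for the simulator-based estimate of the
    # expected reward (one long simulated trajectory per policy evaluation).
    return -np.sum((theta - 1.0) ** 2) + np.random.default_rng().normal(scale=0.01)

def policy_search(theta0, N=100, n=8, threshold=1e-3, gamma=0.1, rng=None):
    """Local random search using the parameters named in the paper: N (max
    iterations), n (policy evaluations per iteration, done in parallel in the
    paper), a significance threshold for accepting an improvement, and
    sampling variance gamma. The actual update rule of Algorithm 1 may differ."""
    rng = rng or np.random.default_rng()
    theta, best = theta0.copy(), evaluate(theta0)
    for _ in range(N):
        # Sample n perturbed candidates around the current parameter vector.
        candidates = theta + rng.normal(scale=np.sqrt(gamma), size=(n, theta.size))
        scores = np.array([evaluate(c) for c in candidates])
        i = int(np.argmax(scores))
        if scores[i] - best > threshold:   # accept only significant improvements
            theta, best = candidates[i], scores[i]
    return theta, best

# The search depends on initialization, so restart it 50 times from parameter
# vectors sampled uniformly at random and keep the best result.
rng = np.random.default_rng(0)
results = [policy_search(rng.uniform(size=6), rng=rng) for _ in range(50)]
best_theta, best_score = max(results, key=lambda r: r[1])
print(best_score)
```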