Optimizing Energy Production Using Policy Search and Predictive State Representations
Authors: Yuri Grinberg, Doina Precup, Michel Gendreau
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare our solution to a DP-based solution developed by Hydro-Québec based on historical inflow data, and show both quantitative and qualitative improvement. Sec. 5 presents a quantitative and qualitative analysis of the results. |
| Researcher Affiliation | Collaboration | Yuri Grinberg, Doina Precup, School of Computer Science, McGill University, Montreal, QC, Canada, {ygrinb,dprecup}@cs.mcgill.ca; Michel Gendreau, École Polytechnique de Montréal, Montreal, QC, Canada, michel.gendreau@cirrelt.ca. NSERC/Hydro-Québec Industrial Research Chair on the Stochastic Optimization of Electricity Generation, CIRRELT and Département de Mathématiques et de Génie Industriel, École Polytechnique de Montréal. |
| Pseudocode | Yes | Algorithm 1: Policy search algorithm (a hedged reconstruction of this loop appears after the table) |
| Open Source Code | No | The paper does not provide any statement or link regarding the public availability of its source code. |
| Open Datasets | No | Historical data suggests that it is safe to assume that the inflows at different sites in the same period t are just scaled values of each other. However, there is relatively little data available to optimize the problem through simulation: there are only 54 years of inflow data, which translates into 2808 values (one value per week; see Fig. 1). Hydro-Québec uses this data to learn a generative model for inflows. |
| Dataset Splits | No | The estimate of the expected reward of a policy is calculated by running the simulator on a single 2000-year-long trajectory obtained from the generative model described in Sec. 2. All solutions are evaluated on the original historical data. No specific train/validation/test splits are mentioned. |
| Hardware Specification | No | The paper does not provide any specific hardware details for the experiments. |
| Software Dependencies | No | The paper mentions 'the SAMS software [11]' and that an initial version of the simulator was ported to Java, but no specific version numbers are provided for these or any other software dependencies. |
| Experiment Setup | Yes | Parameters: N, the maximum number of iterations; θ = {θ_R2, θ_R3, θ_R4} = {θ_1, ..., θ_m} ∈ ℝ^m, the initial parameter vector; n, the number of parallel policy evaluations; Threshold, the significance threshold; γ, the sampling variance. The estimate of the expected reward of a policy is calculated by running the simulator on a single 2000-year-long trajectory obtained from the generative model described in Sec. 2. Since the algorithm depends on the initialization of the parameter vector, we sample the initial parameter vector uniformly at random and repeat the search 50 times. (A hedged sketch of this setup follows the table.) |
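The paper's Algorithm 1 is a randomized local search over the policy parameters θ, driven entirely by simulator evaluations. The sketch below is a minimal Python reconstruction under stated assumptions: `simulate` is a stand-in for the black-box reward estimate (mean reward over the single 2000-year trajectory), and the Gaussian perturbation scheme and acceptance rule are plausible fillings for details the table does not quote, not the authors' exact procedure.

```python
import numpy as np

def policy_search(simulate, theta0, N=100, n=8, threshold=1e-3, gamma=0.1,
                  rng=None):
    """Hedged sketch of a simulation-based policy search (cf. Algorithm 1).

    simulate:  black-box estimate of a policy's expected reward
               (here: mean reward over one long simulated trajectory).
    theta0:    initial parameter vector theta = (theta_1, ..., theta_m).
    N:         maximum number of iterations.
    n:         number of candidate policies evaluated per iteration
               (run in parallel in the paper's setting).
    threshold: significance threshold for accepting an improvement.
    gamma:     sampling variance of the perturbations (assumed Gaussian).
    """
    rng = rng or np.random.default_rng()
    theta = np.asarray(theta0, dtype=float)
    best = simulate(theta)
    for _ in range(N):
        # Draw n perturbed candidate parameter vectors around theta.
        candidates = theta + rng.normal(0.0, np.sqrt(gamma),
                                        size=(n, theta.size))
        rewards = np.array([simulate(c) for c in candidates])
        k = rewards.argmax()
        # Accept only improvements that clear the significance threshold.
        if rewards[k] - best > threshold:
            theta, best = candidates[k], rewards[k]
    return theta, best
```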
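Because the outcome depends on the initialization, the paper samples the initial parameter vector uniformly at random and repeats the search 50 times, then evaluates the resulting solutions on the original historical data. A sketch of that outer loop, reusing `policy_search` above; the parameter dimension, the bounds, and the `evaluate_historical` helper are illustrative assumptions, not values from the paper.

```python
m = 8                      # number of policy parameters (illustrative)
low, high = -1.0, 1.0      # assumed bounds for random initialization

rng = np.random.default_rng(0)
solutions = []
for _ in range(50):        # 50 restarts, as described in the paper
    theta0 = rng.uniform(low, high, size=m)
    theta, reward = policy_search(simulate, theta0, rng=rng)
    # Final solutions are evaluated on the original historical inflow data;
    # `evaluate_historical` is a hypothetical stand-in for that step.
    solutions.append((theta, reward, evaluate_historical(theta)))
```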