End-to-End Learning for Stochastic Optimization: A Bayesian Perspective
Authors: Yves Rychener, Daniel Kuhn, Tobias Sutter
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical results for a synthetic newsvendor problem illustrate the key differences between alternative training schemes. We also investigate an economic dispatch problem based on real data to showcase the impact of the neural network architecture of the decision maps on their test performance. |
| Researcher Affiliation | Academia | ¹Risk Analytics and Optimization Chair, École Polytechnique Fédérale de Lausanne, Switzerland; ²Department of Computer and Information Science, University of Konstanz, Germany. |
| Pseudocode | Yes | Algorithm 1 (End-to-End Learning): for k = 1, …, K do: g_k ← ∇_w ℓ(Y_k, m_w(X_k)) evaluated at w = w_{k−1}; w_k ← w_{k−1} − η_k g_k; end for. (A runnable sketch of this loop is given after the table.) |
| Open Source Code | Yes | Implementation details are given in Appendix C, and the code underlying all experiments is provided on GitHub. |
| Open Datasets | Yes | We use historical wind power production and weather records as samples from P(X,Y). Dataset: https://www.kaggle.com/datasets/theforcecoder/wind-power-forecasting |
| Dataset Splits | No | The dataset covers the period from 1 January 2018 to 30 March 2020. After removing corrupted samples, the period from 1 January 2018 to 31 December 2019 comprises 59,532 records, which we use as the training set. The remaining records are used for testing. No explicit mention of a validation set is found. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using the Adam optimizer (Kingma & Ba, 2015) and neural network architectures, but it does not specify version numbers for any software dependencies like Python, specific machine learning frameworks (e.g., PyTorch, TensorFlow), or other libraries. |
| Experiment Setup | Yes | We solve the resulting instance of (5) using Algorithm 1 with K = 5 × 10^6 training samples. The Batch-SGD algorithm runs over 50,000 iterations with 100 samples per batch to reduce the variance of the gradient updates. ... The neural network-based predictions μ̂_NN are compared against the sample mean μ̂_ERM and the posterior mean μ̂_MMSE. ... (CAL): The CAL architecture consists of a feature extractor that maps the observation X to a 6-dimensional feature R and a prescriptor that maps R into the feasible set A. The feature extractor involves one hidden layer with 64 neurons and ReLU activation functions and an output layer with 6 neurons and Sigmoid activation functions, which determine the output of each generator as a percentage of its capacity. (A sketch of this architecture is given after the table.) |
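
The Algorithm 1 excerpt quoted in the Pseudocode row is a plain stochastic gradient descent loop over fresh samples. The following minimal sketch assumes PyTorch (the paper does not state its framework), and `model`, `loss_fn`, `sampler`, and `lr_schedule` are hypothetical placeholders; it illustrates the update rule rather than reproducing the authors' implementation.

```python
import torch

def end_to_end_sgd(model, loss_fn, sampler, K, lr_schedule):
    """SGD loop mirroring Algorithm 1: one fresh sample (X_k, Y_k) per step."""
    for k in range(1, K + 1):
        X_k, Y_k = sampler()                      # draw a fresh training sample
        loss = loss_fn(Y_k, model(X_k))           # l(Y_k, m_w(X_k))
        model.zero_grad()
        loss.backward()                           # g_k = grad of l at w_{k-1}
        with torch.no_grad():
            for w in model.parameters():
                w -= lr_schedule(k) * w.grad      # w_k = w_{k-1} - eta_k * g_k
    return model
```

The paper's economic dispatch experiment uses Batch-SGD with 100 samples per batch and the Adam optimizer; the single-sample loop above only mirrors the quoted pseudocode.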
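The CAL feature extractor quoted in the Experiment Setup row can be sketched as follows. PyTorch is again an assumption, and the class name, input dimension, and generator capacities are hypothetical; the prescriptor that maps the 6-dimensional feature into the feasible set A is not detailed in the excerpt and is omitted here.

```python
import torch
import torch.nn as nn

class CALFeatureExtractor(nn.Module):
    """Feature extractor of the CAL architecture as described in the excerpt:
    one hidden layer (64 neurons, ReLU) followed by an output layer (6 neurons,
    Sigmoid) whose entries are read as a fraction of each generator's capacity."""

    def __init__(self, in_dim, capacities):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 6),
            nn.Sigmoid(),
        )
        # capacities: iterable of 6 generator capacities (illustrative values)
        self.register_buffer(
            "capacities", torch.as_tensor(capacities, dtype=torch.float32)
        )

    def forward(self, x):
        # Sigmoid output in [0, 1], scaled to each generator's capacity
        return self.net(x) * self.capacities
```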