End-to-End Learning for Stochastic Optimization: A Bayesian Perspective

Authors: Yves Rychener, Daniel Kuhn, Tobias Sutter

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical results for a synthetic newsvendor problem illustrate the key differences between alternative training schemes. We also investigate an economic dispatch problem based on real data to showcase the impact of the neural network architecture of the decision maps on their test performance.
Researcher Affiliation | Academia | (1) Risk Analytics and Optimization Chair, École Polytechnique Fédérale de Lausanne, Switzerland; (2) Department of Computer and Information Science, University of Konstanz, Germany.
Pseudocode | Yes | Algorithm 1 (End-to-End Learning): for k = 1, ..., K do: g_k ← ∇_w ℓ(Y_k, m_w(X_k)) |_{w = w_{k-1}}; w_k ← w_{k-1} - η_k g_k; end for. (A runnable sketch of this loop follows the table.)
Open Source Code | Yes | Implementation details are given in Appendix C, and the code underlying all experiments is provided on GitHub.
Open Datasets | Yes | We use historical wind power production and weather records as samples from P(X,Y). ... Dataset: https://www.kaggle.com/datasets/theforcecoder/wind-power-forecasting
Dataset Splits | No | The dataset covers the period from 1 January 2018 to 30 March 2020. After removing corrupted samples, the period from 1 January 2018 to 31 December 2019 comprises 59,532 records, which we use as the training set. The remaining records are used for testing. No explicit mention of a validation set is found. (A chronological-split sketch follows the table.)
Hardware Specification | No | The paper does not provide specific details on the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using the Adam optimizer (Kingma & Ba, 2015) and neural network architectures, but it does not specify version numbers for any software dependencies such as Python, machine learning frameworks (e.g., PyTorch, TensorFlow), or other libraries.
Experiment Setup | Yes | We solve the resulting instance of (5) using Algorithm 1 with K = 5 × 10^6 training samples. The Batch-SGD algorithm runs over 50,000 iterations with 100 samples per batch to reduce the variance of the gradient updates. ... The neural network-based predictions μ̂_NN are compared against the sample mean μ̂_ERM and the posterior mean μ̂_MMSE. ... (CAL): The CAL architecture consists of a feature extractor that maps the observation X to a 6-dimensional feature R and a prescriptor that maps R into the feasible set A. The feature extractor involves one hidden layer with 64 neurons and ReLU activation functions and an output layer with 6 neurons and Sigmoid activation functions, which determine the output of each generator as a percentage of its capacity. (A code sketch of this setup follows the table.)
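
The pseudocode row condenses Algorithm 1, which is plain stochastic gradient descent over i.i.d. training pairs. Below is a minimal PyTorch sketch of that loop, assuming the decision map `m_w`, the loss `ell`, the sampler `sample_pair`, and the step-size schedule `eta` as placeholders; these names are illustrative and not taken from the released code.

```python
import torch

def end_to_end_sgd(m_w, ell, sample_pair, K, eta):
    """Sketch of Algorithm 1: plain SGD for end-to-end learning.

    m_w         -- torch.nn.Module implementing the decision map m_w(X)
    ell         -- loss ell(Y, a) comparing the outcome Y with the decision a
    sample_pair -- callable returning one fresh training pair (X_k, Y_k)
    K           -- number of SGD iterations (one sample per iteration)
    eta         -- callable k -> step size eta_k
    """
    for k in range(1, K + 1):
        X_k, Y_k = sample_pair()
        loss = ell(Y_k, m_w(X_k))          # evaluate ell(Y_k, m_w(X_k)) at w = w_{k-1}
        m_w.zero_grad()
        loss.backward()                    # g_k = grad_w ell(Y_k, m_w(X_k))
        with torch.no_grad():
            for p in m_w.parameters():
                p -= eta(k) * p.grad       # w_k = w_{k-1} - eta_k * g_k
    return m_w
```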
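The dataset-splits row reports a purely chronological train/test split (training on 1 January 2018 through 31 December 2019, testing on the remaining records) with no validation set. A minimal pandas sketch of such a split is shown below; the CSV filename and the `Timestamp` column name are assumptions and may not match the Kaggle dataset's actual schema.

```python
import pandas as pd

# Load the Kaggle wind-power records; file name and timestamp column are assumed.
df = pd.read_csv("wind_power_forecasting.csv", parse_dates=["Timestamp"])
df = df.dropna()                             # drop corrupted/missing records

cutoff = pd.Timestamp("2020-01-01")          # train: 1 Jan 2018 through 31 Dec 2019
train = df[df["Timestamp"] < cutoff]
test = df[df["Timestamp"] >= cutoff]         # remaining records used for testing
```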
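The experiment-setup row describes both the Batch-SGD settings (50,000 iterations, 100 samples per batch, 5 × 10^6 samples in total) and the CAL architecture for the economic dispatch experiment. The sketch below combines the two for illustration only: `obs_dim`, the `capacities` vector, and the Adam learning rate are assumptions, and the prescriptor here merely rescales the sigmoid outputs by the generator capacities rather than mapping into the full feasible set A described in the paper.

```python
import torch
import torch.nn as nn

class CALDecisionMap(nn.Module):
    """Sketch of the CAL architecture from the experiment-setup row.

    The feature extractor maps the observation X to a 6-dimensional feature R
    (one hidden layer with 64 ReLU units; 6 sigmoid outputs interpreted as the
    fraction of each generator's capacity).  The simplified prescriptor below
    only rescales by the capacities; the paper's prescriptor maps R into the
    feasible set A of the economic dispatch problem, whose constraints are not
    reproduced here.
    """

    def __init__(self, obs_dim, capacities):
        super().__init__()
        self.feature_extractor = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, 6), nn.Sigmoid(),
        )
        # capacities: tensor of shape (6,) holding each generator's maximum output
        self.register_buffer("capacities", capacities)

    def forward(self, x):
        r = self.feature_extractor(x)        # feature R in [0, 1]^6
        return r * self.capacities           # dispatch as a fraction of capacity


# Training loop matching the reported Batch-SGD setup: 50,000 iterations with
# batches of 100 fresh samples (5e6 samples total); the Adam step size is an
# assumption, as the row does not state it.
def train(model, sample_batch, ell, iterations=50_000, batch_size=100, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(iterations):
        X, Y = sample_batch(batch_size)      # fresh i.i.d. batch per iteration
        loss = ell(Y, model(X)).mean()       # batch averaging reduces gradient variance
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```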