Leaving the Nest: Going beyond Local Loss Functions for Predict-Then-Optimize
Authors: Sanket Shah, Bryan Wilder, Andrew Perrault, Milind Tambe
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show that our method achieves state-of-the-art results in four domains from the literature, often requiring an order of magnitude fewer samples than comparable methods from past work. ... In this section, we validate EGLs empirically on four domains from the literature. For each set of experiments, we run 10 experiments with different train-test splits, and randomized initializations of the predictive model and loss function parameters. (A sketch of this multi-seed evaluation protocol appears below the table.) |
| Researcher Affiliation | Academia | Sanket Shah1, Bryan Wilder2, Andrew Perrault3, Milind Tambe1 1Harvard University 2Carnegie Mellon University 3Ohio State University |
| Pseudocode | No | No explicitly labeled pseudocode or algorithm blocks are present. Section 2.2 lists steps but not in a pseudocode format. |
| Open Source Code | No | The paper does not contain any explicit statements about open-source code availability or links to code repositories for the described methodology. |
| Open Datasets | Yes | The features for each website are obtained by multiplying the true CTRs y_m from the Yahoo! Webscope Dataset (Yahoo! 2007) with a random N × N matrix A, resulting in x_m = A y_m. ... Predict the future stock price y_n for each stock n using its historical data x_n. The historical data includes information on 50 stocks obtained from the Quandl WIKI dataset (Quandl 2022). (A sketch of the feature-generation step appears below the table.) |
| Dataset Splits | Yes | For each set of experiments, we run 10 experiments with different train-test splits, and randomized initializations of the predictive model and loss function parameters. ...for each instance y in the training and validation set. In Table 5 (Appendix D.2), we see that EGLs outperform LODLs and follow the trends noted above if we measure their performance on the validation set, which is closer in time to training (and hence has less distribution shift). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory specifications used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers for its dependencies. While it mentions neural networks and gradient descent, it doesn't list framework versions (e.g., PyTorch 1.x, TensorFlow 2.x). |
| Experiment Setup | Yes | The hyperparameters associated with this approach are: Number of Models: Instead of sampling predictions from just one model, we can train multiple models to increase the diversity of the generated predictions. In our experiments, we choose from {1, 5, 10} predictive models. LR and Number of Training Steps: The learning rates are chosen from {10^-6, 10^-5, ..., 1} with a possible cyclic schedule (Smith 2017). We use a maximum of 50000 updates across all the models. For our experiments, the model P is a 4-layer feedforward neural network with a hidden dimension of 500 trained using gradient descent. Details of the computational resources and hyperparameter optimization used are given in Appendix D. (A hedged sketch of this model and learning-rate setup appears below the table.) |
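
As referenced in the Research Type row, the paper's evaluation protocol repeats each experiment 10 times with a different train-test split and freshly randomized parameter initializations. The sketch below illustrates one way such a protocol could be scripted; the 80/20 split ratio and the `train_model` / `evaluate_decision_quality` callables are assumptions, not the authors' code.

```python
# Hypothetical sketch of the 10-run protocol: each run uses a different
# train-test split and a different seed for parameter initialization.
import numpy as np

def run_experiment(seed, X, Y, train_model, evaluate_decision_quality):
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    n_train = int(0.8 * len(X))          # 80/20 split ratio is an assumption
    train_idx, test_idx = perm[:n_train], perm[n_train:]
    model = train_model(X[train_idx], Y[train_idx], seed=seed)  # seeded init
    return evaluate_decision_quality(model, X[test_idx], Y[test_idx])

# Example usage (X, Y and the two callables are placeholders):
# scores = [run_experiment(s, X, Y, train_model, evaluate_dq) for s in range(10)]
# print(f"decision quality: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```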
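The Open Datasets row quotes the web-advertising setup, in which features are generated by multiplying the true click-through rates y_m by a random N × N matrix A. A minimal sketch of that step, assuming the CTRs have already been loaded from the Yahoo! Webscope data into a NumPy array and that A is drawn from a standard normal distribution (the excerpt does not specify A's distribution):

```python
# Minimal sketch of the quoted feature generation: x_m = A @ y_m for each
# instance m, with A a random N x N matrix. Loading of the Yahoo! Webscope
# CTRs into `Y_true` is assumed to happen elsewhere.
import numpy as np

def generate_features(Y_true, seed=0):
    """Y_true: (num_instances, N) array whose rows are the true CTRs y_m."""
    rng = np.random.default_rng(seed)
    N = Y_true.shape[1]
    A = rng.standard_normal((N, N))      # distribution of A is an assumption
    X = Y_true @ A.T                     # row m of X equals A @ Y_true[m]
    return X, A
```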
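The Experiment Setup row describes a 4-layer feedforward predictive model with hidden dimension 500 and learning rates searched over {10^-6, ..., 1}, optionally with a cyclic schedule (Smith 2017). The following is a hedged PyTorch sketch under the assumption that "4-layer" means four linear layers; the input/output dimensions, the Adam optimizer, and the cyclic-schedule bounds are assumptions rather than details taken from the paper.

```python
# Hedged sketch of the described predictive model and learning-rate setup.
import torch
import torch.nn as nn

def build_model(in_dim, out_dim, hidden=500):
    # Interpreting "4-layer feedforward" as four nn.Linear layers.
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

model = build_model(in_dim=28, out_dim=1)      # dimensions are placeholders
lr = 1e-3                                      # picked from {1e-6, ..., 1}
optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # optimizer is an assumption
# Optional cyclic learning-rate schedule in the spirit of Smith (2017);
# the base/max bounds below are assumptions.
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=lr / 10, max_lr=lr, cycle_momentum=False
)
```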