Contextual Linear Optimization with Bandit Feedback
Authors: Yichun Hu, Nathan Kallus, Xiaojie Mao, Yanchen Wu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare the performance of different modeling choices numerically using a stochastic shortest path example and provide practical insights from the empirical results. |
| Researcher Affiliation | Academia | ¹ Cornell University, ² Tsinghua University |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code scripts for all experiments can be found at https://github.com/CausalML/CLOBandit. |
| Open Datasets | No | We first generate i.i.d. draws of the covariates X = (X1, X2, X3) ∈ R³ from independent standard normal distributions. |
| Dataset Splits | Yes | This is done by minimizing the out-of-sample error on an independent validation dataset with size equal to the corresponding training data. |
| Hardware Specification | Yes | All experiments in the paper are implemented on a cloud computing platform with 128 CPUs of model Intel(R) Xeon(R) Platinum 8369B CPU @ 2.70GHz, 250GB RAM and 500GB storage. |
| Software Dependencies | No | The paper does not specify version numbers for any software components or libraries used in the experiments. |
| Experiment Setup | Yes | We incorporate an additional ridge penalty with a coefficient 1. We select the penalty coefficient from a grid of 0, 0.001, 0.01, and 10 points distributed uniformly on the logarithmic scale over 0.1 to 100. This is done by minimizing the out-of-sample error on an independent validation dataset with size equal to the corresponding training data. The penalty coefficient is finally set to half of the value chosen by this validation procedure. A minimal sketch of this selection loop is given below the table. |
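
The penalty-selection procedure quoted in the Experiment Setup row can be summarized as a short validation loop. The sketch below is an illustration under stated assumptions, not the authors' released code: the response model `y`, the training-set size, and the function names are placeholders invented here; only the covariate distribution (three independent standard normals), the penalty grid, the equal-size validation set, and the final halving step come from the quoted description.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def make_data(n):
    # Covariates X = (X1, X2, X3) drawn i.i.d. from independent standard normals,
    # as in the quoted data-generating process. The response y is a placeholder
    # linear model chosen only to make the sketch runnable.
    X = rng.standard_normal((n, 3))
    y = X @ np.array([1.0, -0.5, 0.25]) + rng.standard_normal(n)
    return X, y

# Penalty grid: {0, 0.001, 0.01} plus 10 points spaced uniformly on the
# logarithmic scale between 0.1 and 100, per the Experiment Setup quote.
grid = np.concatenate(([0.0, 0.001, 0.01], np.logspace(-1, 2, 10)))

n_train = 1000  # assumed size; the paper's training sizes vary by experiment
X_train, y_train = make_data(n_train)
X_val, y_val = make_data(n_train)  # validation set with the same size as training

def val_error(alpha):
    # Out-of-sample mean squared error for one candidate penalty.
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    return np.mean((model.predict(X_val) - y_val) ** 2)

best_alpha = min(grid, key=val_error)
final_alpha = best_alpha / 2  # the paper sets the penalty to half the selected value
print(f"selected penalty: {best_alpha:.4g}, final penalty used: {final_alpha:.4g}")
```

Apart from the final halving step, which follows the quoted description, this is an ordinary grid search on a held-out validation set of the same size as the training data.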