Contextual Linear Optimization with Bandit Feedback

Authors: Yichun Hu, Nathan Kallus, Xiaojie Mao, Yanchen Wu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We compare the performance of different modeling choices numerically using a stochastic shortest path example and provide practical insights from the empirical results."
Researcher Affiliation | Academia | 1) Cornell University; 2) Tsinghua University
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "The code scripts for all experiments can be found at https://github.com/CausalML/CLOBandit."
Open Datasets | No | "We first generate i.i.d. draws of the covariates X = (X1, X2, X3) ∈ R^3 from independent standard normal distributions."
Dataset Splits | Yes | "The penalty coefficient is finally set as half of the value chosen by this validation procedure."
Hardware Specification | Yes | "All experiments in the paper are implemented on a cloud computing platform with 128 CPUs of model Intel(R) Xeon(R) Platinum 8369B CPU @ 2.70GHz, 250GB RAM, and 500GB storage."
Software Dependencies | No | The paper does not specify version numbers for any software components or libraries used in the experiments.
Experiment Setup | Yes | "We incorporate an additional ridge penalty with a coefficient 1. We select the penalty coefficient from a grid of 0, 0.001, 0.01, and 10 points distributed uniformly on the logarithmic scale over 0.1 to 100. This is done by minimizing the out-of-sample error on an independent validation dataset with size equal to the corresponding training data. The penalty coefficient is finally set as half of the value chosen by this validation procedure."
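The penalty-selection procedure quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the linear data-generating response and the closed-form ridge solver are assumptions added for self-containedness; only the grid construction (0, 0.001, 0.01, plus 10 log-uniform points on [0.1, 100]), the equal-sized validation set, and the final halving step come from the quoted description.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, rng):
    # Covariates X = (X1, X2, X3) drawn i.i.d. from independent standard
    # normals, as in the paper; the linear response below is a hypothetical
    # stand-in added for illustration.
    X = rng.standard_normal((n, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(n)
    return X, y

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: (X'X + lam * I)^{-1} X'y.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

n = 200
X_tr, y_tr = make_data(n, rng)
# Independent validation set with size equal to the training data.
X_val, y_val = make_data(n, rng)

# Grid: 0, 0.001, 0.01, plus 10 points uniform on the log scale over [0.1, 100].
grid = [0.0, 0.001, 0.01] + list(np.logspace(-1, 2, 10))

# Pick the coefficient minimizing out-of-sample (validation) error.
errors = [np.mean((X_val @ ridge_fit(X_tr, y_tr, lam) - y_val) ** 2)
          for lam in grid]
best = grid[int(np.argmin(errors))]

# Final coefficient is half the value chosen by the validation procedure.
final_lam = best / 2
```

The halving step reflects the quoted text verbatim; the paper does not explain its rationale, so it is reproduced here without interpretation.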