Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization

Authors: Sanath Kumar Krishnamurthy, Ruohan Zhan, Susan Athey, Emma Brunskill

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To demonstrate the computational tractability of our approach, we ran a simulation on setting within a R^2 context space, eight arms, linear models, and an exploration horizon of 5000. Our algorithms ran in less than 9 seconds on a Macbook M1 Pro. We also compare with other baselines on simple/cumulative regret. See Appendix E.4 for details.
Researcher Affiliation | Academia | Sanath Kumar Krishnamurthy, Management Science and Engineering, Stanford University, sanathsk@stanford.edu; Ruohan Zhan, Industrial Engineering and Decision Analytics, Hong Kong University of Science and Technology, rhzhan@ust.hk; Susan Athey, Graduate School of Business, Stanford University, athey@stanford.edu; Emma Brunskill, Computer Science Department, Stanford University, ebrun@cs.stanford.edu
Pseudocode | Yes | Algorithm 1: ω Risk Adjusted Proportional Response (ω-RAPR)
Open Source Code | No | The paper does not include any explicit statements or links indicating that source code for the described methodology is publicly available.
Open Datasets | No | The paper mentions running a simulation ('We ran a simulation on setting within a R^2 context space, eight arms, linear models, and an exploration horizon of 5000.') but does not specify a publicly available dataset, nor does it provide any link, DOI, or citation for data access.
Dataset Splits | No | The paper states, 'Our algorithm splits S_m into three equally-sized subsets: S_{m,1}, S_{m,2} and S_{m,3}.' This refers to internal data splitting within the algorithm's operation, not to explicit train/validation/test dataset splits for experimental reproduction. The 'Simulations' section does not provide any specific dataset split information.
Hardware Specification | Yes | Our algorithms ran in less than 9 seconds on a Macbook M1 Pro.
Software Dependencies | No | The paper describes conceptual components of its algorithm and refers to types of solvers (e.g., 'cost-sensitive classification (CSC) solver'), but it does not specify any particular software libraries, packages, or their version numbers that would be required for replication.
Experiment Setup | No | The paper states, 'We ran a simulation on setting within a R^2 context space, eight arms, linear models, and an exploration horizon of 5000.' While this gives some context, it does not provide specific hyperparameter values (e.g., learning rate, batch size, optimizer settings) or detailed training configurations needed for reproducible experimental setup.
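
The Dataset Splits entry quotes the algorithm as splitting each batch into three equally-sized subsets. The sketch below illustrates one way such a split might look; it is not the authors' code. The function name `split_three`, the shuffling step, the fixed seed, and the choice to drop leftover samples when the batch size is not divisible by three are all assumptions made for illustration.

```python
import random


def split_three(samples, seed=0):
    """Split a batch into three equally sized, disjoint subsets.

    Hypothetical helper: the shuffle, the seed, and the handling of
    leftover samples are assumptions, not details taken from the paper.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)  # assumption: subsets are formed at random
    k = len(shuffled) // 3  # size of each subset; remainder is dropped
    return shuffled[:k], shuffled[k:2 * k], shuffled[2 * k:3 * k]


if __name__ == "__main__":
    # Illustrative batch of 300 sample indices (batch size is made up).
    s1, s2, s3 = split_three(range(300))
    print(len(s1), len(s2), len(s3))
```

Because the split operates on the data collected within the algorithm, it plays a different role from a train/validation/test split, which is the distinction the table entry draws.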