Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Contextual Bandits with Continuous Actions: Smoothing, Zooming, and Adapting

Authors: Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins, Chicheng Zhang

JMLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study contextual bandit learning with an abstract policy class and continuous action space. We obtain two qualitatively different regret bounds: one competes with a smoothed version of the policy class under no continuity assumptions, while the other requires standard Lipschitz assumptions. Both bounds exhibit data-dependent zooming behavior and, with no tuning, yield improved guarantees for benign problems. We also study adapting to unknown smoothness parameters, establishing a price of adaptivity and deriving optimal adaptive algorithms that require no additional information. Keywords: contextual bandits, nonparametric learning. Our algorithms are not computationally efficient, with running times that scale polynomially in |Π|. The significance lies in the new conceptual approach and the regret bounds.
Researcher Affiliation | Collaboration | Akshay Krishnamurthy (EMAIL), Microsoft Research, New York, NY 10011; John Langford (EMAIL), Microsoft Research, New York, NY 10011; Aleksandrs Slivkins (EMAIL), Microsoft Research, New York, NY 10011; Chicheng Zhang (EMAIL), University of Arizona, Tucson, AZ 85721
Pseudocode | Yes | The pseudocode for this algorithm is displayed in Algorithm 2. Algorithm 1: Smooth Policy Elimination; Algorithm 2: Continuous EXP4 (EXP4 with continuous sampling).
Open Source Code | No | The paper does not provide concrete access to source code. It states that "Our algorithms are not computationally efficient, with running times that scale polynomially in |Π|. The significance lies in the new conceptual approach and the regret bounds," suggesting the focus is theoretical rather than on providing implemented code.
Open Datasets | No | The paper focuses on theoretical frameworks, algorithms, and regret bounds for contextual bandits with continuous actions. It does not conduct experiments using specific datasets and therefore does not provide concrete access information for any publicly available or open datasets.
Dataset Splits | No | The paper is theoretical in nature, presenting algorithms and regret bounds rather than empirical evaluations. As no datasets are used for experiments, no information is provided about dataset splits for training, validation, or testing.
Hardware Specification | No | The paper is a theoretical work on contextual bandits with continuous actions, focusing on algorithm design and regret bounds. It does not describe any experiments run on specific hardware, so no hardware specifications are provided.
Software Dependencies | No | The paper focuses on theoretical aspects of contextual bandit algorithms and regret bounds. It does not describe any software implementations or provide version numbers for any ancillary software or libraries.
Experiment Setup | No | The paper is theoretical, presenting new algorithms and regret bounds for contextual bandits. It does not describe any experimental setups, hyperparameters, or training configurations, as it conducts no empirical evaluations.
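Since the paper releases no code, the smoothing idea behind Continuous EXP4 can be illustrated with a minimal sketch: policies recommend actions in [0, 1], the learner samples a policy via exponential weights, plays an action drawn from a uniform kernel of bandwidth h around that recommendation, and updates weights with an inverse-propensity loss estimate. This is a hypothetical illustration only, not the paper's algorithm; the function name, parameters, and update details below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def continuous_exp4_round(weights, policies, context, h, eta, loss_fn):
    """One illustrative round of an EXP4-style update with smoothed actions.

    weights  : exponential weights over a finite policy class (modified in place)
    policies : functions mapping a context to an action in [0, 1]
    h        : smoothing bandwidth (action perturbed uniformly within +/- h)
    eta      : learning rate
    loss_fn  : environment feedback, maps the played action to a loss in [0, 1]
    """
    probs = weights / weights.sum()
    # Sample a policy, then smooth its recommended action with a
    # uniform kernel of width 2h, clipped back into [0, 1].
    i = rng.choice(len(policies), p=probs)
    base = policies[i](context)
    action = float(np.clip(base + rng.uniform(-h, h), 0.0, 1.0))
    loss = loss_fn(action)
    # Density of the played action under the smoothed policy mixture:
    # every policy whose recommendation is within h of the action
    # contributes probability mass 1 / (2h).
    density = sum(p for p, pi in zip(probs, policies)
                  if abs(pi(context) - action) <= h) / (2 * h)
    # Inverse-propensity loss estimate; only "consistent" policies are charged.
    for j, pi in enumerate(policies):
        if abs(pi(context) - action) <= h:
            weights[j] *= np.exp(-eta * loss / (2 * h * density))
    return weights, action, loss
```

Running this round repeatedly concentrates weight on policies whose h-smoothed actions incur low loss, which mirrors the benchmark the paper competes with (a smoothed version of the policy class) rather than the raw policies themselves.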