Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Contextual Bandits with Continuous Actions: Smoothing, Zooming, and Adapting

Authors: Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins, Chicheng Zhang

JMLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study contextual bandit learning with an abstract policy class and continuous action space. We obtain two qualitatively different regret bounds: one competes with a smoothed version of the policy class under no continuity assumptions, while the other requires standard Lipschitz assumptions. Both bounds exhibit data-dependent zooming behavior and, with no tuning, yield improved guarantees for benign problems. We also study adapting to unknown smoothness parameters, establishing a price of adaptivity and deriving optimal adaptive algorithms that require no additional information. Keywords: contextual bandits, nonparametric learning. Our algorithms are not computationally efficient, with running times that scale polynomially in |Π|. The significance lies in the new conceptual approach and the regret bounds.
Researcher Affiliation | Collaboration | Akshay Krishnamurthy (EMAIL), Microsoft Research, New York, NY 10011; John Langford (EMAIL), Microsoft Research, New York, NY 10011; Aleksandrs Slivkins (EMAIL), Microsoft Research, New York, NY 10011; Chicheng Zhang (EMAIL), University of Arizona, Tucson, AZ 85721
Pseudocode | Yes | The pseudocode for this algorithm is displayed in Algorithm 2. Algorithm 1: Smooth Policy Elimination; Algorithm 2: Continuous EXP4 (EXP4 with continuous sampling).
Open Source Code | No | The paper does not provide concrete access to source code. It states that "Our algorithms are not computationally efficient, with running times that scale polynomially in |Π|. The significance lies in the new conceptual approach and the regret bounds," suggesting the focus is theoretical rather than on providing implemented code.
Open Datasets | No | The paper focuses on theoretical frameworks, algorithms, and regret bounds for contextual bandits with continuous actions. It does not conduct experiments using specific datasets and therefore does not provide concrete access information for any publicly available or open datasets.
Dataset Splits | No | The paper is theoretical in nature, presenting algorithms and regret bounds rather than empirical evaluations. As no datasets are used for experiments, no information is provided about dataset splits for training, validation, or testing.
Hardware Specification | No | The paper is a theoretical work on contextual bandits with continuous actions, focusing on algorithm design and regret bounds. It does not describe any experiments run on specific hardware, so no hardware specifications are provided.
Software Dependencies | No | The paper focuses on theoretical aspects of contextual bandit algorithms and regret bounds. It does not describe any software implementations or provide version numbers for any ancillary software or libraries.
Experiment Setup | No | The paper is theoretical, presenting new algorithms and regret bounds for contextual bandits. It does not describe any experimental setups, hyperparameters, or training configurations, as it conducts no empirical evaluations.
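Since the paper releases no code, the smoothing idea behind Continuous EXP4 can be illustrated with a minimal sketch: policies recommend actions in [0, 1], the learner samples a policy via exponential weights, plays an action drawn from a uniform kernel of bandwidth h around that recommendation, and updates weights with an inverse-propensity loss estimate. This is a hypothetical illustration only, not the paper's algorithm; the function name, parameters, and update details below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def continuous_exp4_round(weights, policies, context, h, eta, loss_fn):
    """One illustrative round of an EXP4-style update with smoothed actions.

    weights  : exponential weights over a finite policy class (modified in place)
    policies : functions mapping a context to an action in [0, 1]
    h        : smoothing bandwidth (action perturbed uniformly within +/- h)
    eta      : learning rate
    loss_fn  : environment feedback, maps the played action to a loss in [0, 1]
    """
    probs = weights / weights.sum()
    # Sample a policy, then smooth its recommended action with a
    # uniform kernel of width 2h, clipped back into [0, 1].
    i = rng.choice(len(policies), p=probs)
    base = policies[i](context)
    action = float(np.clip(base + rng.uniform(-h, h), 0.0, 1.0))
    loss = loss_fn(action)
    # Density of the played action under the smoothed policy mixture:
    # every policy whose recommendation is within h of the action
    # contributes probability mass 1 / (2h).
    density = sum(p for p, pi in zip(probs, policies)
                  if abs(pi(context) - action) <= h) / (2 * h)
    # Inverse-propensity loss estimate; only "consistent" policies are charged.
    for j, pi in enumerate(policies):
        if abs(pi(context) - action) <= h:
            weights[j] *= np.exp(-eta * loss / (2 * h * density))
    return weights, action, loss
```

Running this round repeatedly concentrates weight on policies whose h-smoothed actions incur low loss, which mirrors the benchmark the paper competes with (a smoothed version of the policy class) rather than the raw policies themselves.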