Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Constrained Linear Thompson Sampling
Authors: Aditya Gangrade, Venkatesh Saligrama
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study safe linear bandits (SLBs), where an agent selects actions from a convex set to maximize an unknown linear objective subject to unknown linear constraints in each round. Existing methods for SLBs provide strong regret guarantees, but require solving expensive optimization problems. To address this, we propose Constrained Linear Thompson Sampling (COLTS), a sampling-based framework that selects actions by solving perturbed linear programs, which significantly reduces computational costs while matching the regret and risk of prior methods. We develop two main variants: S-COLTS, which ensures zero risk and e O(d3T) regret given a safe action, and R-COLTS, which achieves e O(d3T) regret and risk with no instance information. In simulations, these methods match or outperform state of the art SLB approaches while substantially improving scalability. On the technical front, we introduce a novel coupled noise design that ensures frequent local optimism about the true optimum, and a scaling-based analysis to handle the per-round variability of constraints. A simulation study ( 6, J) further validates these claims. |
| Researcher Affiliation | Academia | Aditya Gangrade Boston University EMAIL Venkatesh Saligrama Boston University EMAIL |
| Pseudocode | Yes | Algorithm 1 Scaling-COLTS (S-COLTS(µ, δ)) Algorithm 2 Resampling-COLTS (R-COLTS(µ, r, δ)) Algorithm 3 Exploratory-COLTS (E-COLTS(µ, δ)) |
| Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: the methods implemented use standard estimates in bandits, along with library linear programming routines. We believe that this is easy to implement. |
| Open Datasets | Yes | We set Φ to be a certain 9 9 directed adjacency matrix, A, obtained from https: //sparse.tamu.edu/van Heukelum/cage4, which is a 60% populated matrix with d = m = 9. |
| Dataset Splits | No | The paper describes a bandit problem, which is an online learning setting where data is acquired sequentially, not partitioned into predefined training, validation, and test sets. The '100 trials' or '100 runs' mentioned refer to repetitions of the simulation experiment for statistical analysis, not dataset splits for model training. |
| Hardware Specification | Yes | All experiments were executed on a consumer-grade laptop computer running a Ryzen-5 chip, in the MATLAB environment, and the total time of all experiments ran to about 8 hours. |
| Software Dependencies | No | All experiments were executed on a consumer-grade laptop computer running a Ryzen-5 chip, in the MATLAB environment, and the total time of all experiments ran to about 8 hours. In both cases, we used the library methods linprog and coneprog provided by MATLAB to implement these methods. The paper mentions 'MATLAB' and specific library methods 'linprog' and 'coneprog', but it does not specify version numbers for MATLAB or these libraries, which is required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | Setting. We set Φ to be a certain 9 9 directed adjacency matrix... We study the problem of optimising θ = 1d/√d over A = [0, 1/√d]d, and enforce the unkonwn constraints Φ a 0.8 1/√d. We note that the action 0 is always safe... Throughout, we set δ = 0.1... In our subsequent experiments, we work with γ = 0.5 instead of √3d... We now study R-COLTS and E-COLTS over the long horizon T = 5 * 10^4. We execute R-COLTS with zero resamplings (i.e., E-COLTS with no exploration), and then one and finally two resamplings in each round, all driven by the coupled perturbation noise with ν0.5. |