Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Thompson Sampling with Information Relaxation Penalties
Authors: Seungki Min, Costis Maglaras, Ciamac C. Moallemi
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments suggest that this new class of policies perform well, in particular in settings where the finite time horizon introduces significant exploration-exploitation tension into the problem. |
| Researcher Affiliation | Academia | Seungki Min Columbia Business School Costis Maglaras Columbia Business School Ciamac C. Moallemi Columbia Business School |
| Pseudocode | Yes | Algorithm 1: Information relaxation sampling (IRS) policy |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The paper describes generating data for simulation rather than using a publicly available dataset with concrete access information: 'In the simulation, we randomly generate a set of outcomes ω(1), . . . , ω(S) and measure the performance of each policy π, V (π, T, y), and the performance bounds, W z(T, y), via sample average approximation across these sampled outcomes (S = 20, 000).' |
| Dataset Splits | No | The paper describes generating S=20,000 outcomes for simulation and evaluation, but does not specify dataset splits (e.g., train/validation/test percentages or counts) as would typically be done for model training and evaluation on pre-existing datasets. The policies themselves are not 'trained' on data splits in the traditional machine learning sense. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | Yes | We visualize the effectiveness of IRS policies and performance bounds in case of Gaussian MAB with five arms (K = 5) with different noise variances. More specifically, each arm a A has the unknown mean reward θa N(0, 12) and yields the stochastic rewards Ra,n N(θa, σ2 a) where σ1 = 0.1, σ2 = 0.4, σ3 = 1.0, σ4 = 4.0 and σ5 = 10.0. Our experiment includes the state-of-the-art algorithms that are particularly suitable in a Bayesian framework: Bayesian upper confidence bound [14] (BAYES-UCB, with a quantile of 1 1 t ), information directed sampling [20] (IDS), and optimistic Gittins index [9] (OGI, one-step look ahead approximation with a discount factor γt = 1 1 t ). In the simulation, we randomly generate a set of outcomes ω(1), . . . , ω(S) and measure the performance of each policy π, V (π, T, y), and the performance bounds, W z(T, y), via sample average approximation across these sampled outcomes (S = 20, 000). |