Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Generator-Mediated Bandits: Thompson Sampling for GenAI-Powered Adaptive Interventions
Authors: Marc Brooks, Gabriel Durham, Kihyuk Hong, Ambuj Tewari
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In simulation studies, GAMBITTS consistently outperforms conventional algorithms by leveraging observed treatments to more accurately estimate expected rewards. |
| Researcher Affiliation | Academia | Department of Statistics, University of Michigan, Ann Arbor, MI (USA) EMAIL |
| Pseudocode | Yes | Algorithm 1 Fully Online GAMBITTS (foGAMBITTS) Inputs: Data Dt, priors π1, π2, models f1, m2, current context xt. |
| Open Source Code | Yes | We provide code in the supplementary materials with instructions for reproducing the simulation results reported in Section 6 and Appendix F. This code will be made publicly available on GitHub upon publication; the link is omitted here to preserve anonymity. |
| Open Datasets | No | While no deployed JITAIs currently integrate LLMs for real-time message generation (and thus no real-world dataset exists for this setting), we calibrate our simulation using empirical distributions from the 2023 IHS to model realistic conditional reward structures. |
| Dataset Splits | No | All figures are based on 250 Monte Carlo runs per agent, with 95% confidence intervals shown. |
| Hardware Specification | Yes | The simulation of text-based treatments generated by Llama 3.1 was implemented on a high-performance compute cluster. Each node in the cluster included two 2.9 GHz Intel Xeon Gold 6226R processors, 8 GB of allocated RAM, and a single NVIDIA A40 GPU with 48 GB of memory. VAE training was performed on a similar setup, except each node was allocated 16 GB of RAM. The simulation studies were conducted on nodes equipped with two 3.0 GHz Intel Xeon Gold 6154 processors, 16 CPU cores, and between 16 and 32 GB of allocated RAM. |
| Software Dependencies | Yes | Our implementation of ens-poGAMBITTS uses PyTorch [1]... Llama 3.1 with 8.0B parameters... accessed via Ollama [42]. |
| Experiment Setup | Yes | Each network was single-layer feedforward model with 64 hidden units and ReLU activation. Online training was performed with a batch size of 100 and a learning rate of 0.1. The ensembles maintained a replay buffer of size 1,024. To approximate the expectation in Algorithm 4, we used 100 Monte Carlo samples. In each experiment, ens-poGAMBITTS training began after a burn-in period of t = 100 steps, during which the neural networks did not update, allowing sufficient data to accumulate for batch-based optimization. |
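
The Experiment Setup row quotes several hyperparameters that interact: a replay buffer of 1,024 entries, a batch size of 100, and a burn-in of 100 steps. The following is a minimal sketch of how those values could fit together, not the authors' implementation. The numeric defaults come from the row above; everything else is an assumption made for illustration, including the names `EnsembleSetup`, `ReplayBuffer`, `should_update`, and `run_schedule`, the first-in-first-out eviction policy, the 1-indexed step counter, and the choice to perform one update per step after burn-in. The networks themselves are omitted, so the sketch only tracks when updates would occur.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class EnsembleSetup:
    """Hyperparameter values quoted in the Experiment Setup row."""
    hidden_units: int = 64
    batch_size: int = 100
    learning_rate: float = 0.1
    replay_buffer_size: int = 1024
    mc_samples: int = 100
    burn_in_steps: int = 100


class ReplayBuffer:
    """Fixed-capacity buffer. FIFO eviction is an assumption, not stated in the source."""

    def __init__(self, capacity):
        self._items = deque(maxlen=capacity)

    def add(self, item):
        # deque(maxlen=...) silently drops the oldest entry once full.
        self._items.append(item)

    def __len__(self):
        return len(self._items)

    def sample(self, batch_size, rng):
        # Sample without replacement, capped at the current buffer size.
        k = min(batch_size, len(self._items))
        return rng.sample(list(self._items), k)


def should_update(t, setup):
    """True once burn-in has elapsed (assumes steps are 1-indexed)."""
    return t > setup.burn_in_steps


def run_schedule(num_steps, setup, seed=0):
    """Count hypothetical updates; assumes one update per post-burn-in step."""
    rng = random.Random(seed)
    buffer = ReplayBuffer(setup.replay_buffer_size)
    num_updates = 0
    for t in range(1, num_steps + 1):
        buffer.add((t, rng.random()))  # placeholder observation, not real data
        if should_update(t, setup):
            batch = buffer.sample(setup.batch_size, rng)
            assert len(batch) == setup.batch_size
            num_updates += 1  # a real agent would take a gradient step here
    return num_updates, len(buffer)
```

Under these assumptions, the burn-in of 100 steps means the buffer already holds at least one full batch of 100 observations by the time the first update is attempted, which matches the stated purpose of letting data accumulate for batch-based optimization.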