PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits

Authors: Bianca Dumitrascu, Karen Feng, Barbara Engelhardt

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate and compare our PG-TS method with Laplace-TS in three scenarios: simulated data sets with parameters sampled from Gaussian and mixed-Gaussian distributions, a toy data set based on the Forest Cover Type data set from the UCI repository, and an offline evaluation method for bandit algorithms that relies on real-world log data (a replay-style sketch of such an evaluator appears after this table).
Researcher Affiliation | Academia | Bianca Dumitrascu (Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, biancad@princeton.edu); Karen Feng (Department of Computer Science, Princeton University, Princeton, NJ 08540, karenfeng@princeton.edu); Barbara E. Engelhardt (Department of Computer Science, Princeton University, Princeton, NJ 08540, bee@princeton.edu)
Pseudocode | Yes | Algorithm 1 (PG-TS); a hedged Python sketch of the algorithm appears after this table.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository for the methodology described.
Open Datasets | Yes | We further compared these methods using the Forest Cover Type data from the UCI Machine Learning repository [8]. (A loading sketch appears after this table.)
Dataset Splits | No | The paper describes the sequential processing of data for online learning in a bandit setting and mentions the number of trials or events used, but it does not specify traditional train/validation/test splits as commonly found in supervised learning.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, memory, or cloud instance types used for running the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or specific solvers).
Experiment Setup | Yes | We sample from the PG distribution [24, 27] including M = 100 burn-in steps. This number is empirically tuned... We set the hyperparameters b = 0 and B = I_10 (the 10 × 10 identity matrix). (A simulated-run example with these settings appears after this table.)
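
To make the cited pseudocode concrete, here is a minimal Python sketch of PG-TS as the paper describes it: Thompson sampling for a logistic contextual bandit in which the posterior over the coefficient vector theta is approximated by a Pólya-Gamma-augmented Gibbs sampler run for M burn-in steps per round. The truncated-series PG(1, c) sampler, all function and variable names, and every setting other than M = 100, b = 0, and B = I_10 are illustrative assumptions, not the authors' code (none was released).

```python
import numpy as np

def sample_pg(c, rng, trunc=200):
    """Approximate draw from PG(1, c) via the truncated infinite-sum
    representation omega = (1 / (2 pi^2)) * sum_k g_k / ((k - 1/2)^2
    + c^2 / (4 pi^2)) with g_k ~ Gamma(1, 1). A simplification; the
    paper cites exact samplers [24, 27]."""
    k = np.arange(1, trunc + 1)
    g = rng.gamma(shape=1.0, scale=1.0, size=trunc)
    return np.sum(g / ((k - 0.5) ** 2 + c ** 2 / (4 * np.pi ** 2))) / (2 * np.pi ** 2)

def gibbs_theta(X, y, b, B_inv, rng, burn_in=100):
    """One Polya-Gamma Gibbs chain: alternate omega | theta and
    theta | omega for `burn_in` sweeps (M = 100 in the paper) and
    return the final theta draw."""
    kappa = y - 0.5
    theta = rng.multivariate_normal(b, np.linalg.inv(B_inv))
    for _ in range(burn_in):
        omega = np.array([sample_pg(x @ theta, rng) for x in X])
        V = np.linalg.inv((X.T * omega) @ X + B_inv)   # posterior covariance
        m = V @ (X.T @ kappa + B_inv @ b)              # posterior mean
        theta = rng.multivariate_normal(m, V)
    return theta

def pg_ts(contexts_fn, reward_fn, T, d, rng, burn_in=100):
    """PG-TS loop: each round, sample theta from the Gibbs approximation
    of the posterior and play the arm with the highest x @ theta."""
    b, B_inv = np.zeros(d), np.eye(d)    # prior N(b, B) with b = 0, B = I_d
    X_hist, y_hist = [], []
    for t in range(T):
        arms = contexts_fn(t)            # array of shape (n_arms, d)
        if X_hist:
            theta = gibbs_theta(np.asarray(X_hist), np.asarray(y_hist),
                                b, B_inv, rng, burn_in)
        else:
            theta = rng.multivariate_normal(b, np.linalg.inv(B_inv))
        a = int(np.argmax(arms @ theta))
        X_hist.append(arms[a])
        y_hist.append(reward_fn(t, a))   # Bernoulli reward in {0, 1}
    return np.asarray(y_hist)
```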
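A usage example matching the simulated-Gaussian scenario and the quoted hyperparameters (d = 10 so that B = I_10, b = 0, M = 100 burn-in steps); it reuses the definitions above, and the horizon, arm count, and seed are arbitrary choices for illustration.

```python
rng = np.random.default_rng(0)
d, n_arms, T = 10, 5, 200
theta_true = rng.normal(size=d)                # Gaussian-sampled true parameters
arm_sets = rng.normal(size=(T, n_arms, d))     # fresh Gaussian contexts per round

def contexts_fn(t):
    return arm_sets[t]

def reward_fn(t, a):                           # logistic reward model
    p = 1.0 / (1.0 + np.exp(-arm_sets[t, a] @ theta_true))
    return float(rng.random() < p)

rewards = pg_ts(contexts_fn, reward_fn, T, d, rng, burn_in=100)
print("cumulative reward:", rewards.sum())
```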
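For the Forest Cover Type row, one convenient way to obtain the data is scikit-learn's `fetch_covtype`; using scikit-learn, and the one-arm-per-class reward construction below, are assumptions of this sketch rather than the paper's exact preprocessing.

```python
from sklearn.datasets import fetch_covtype
import numpy as np

cov = fetch_covtype()                  # 581,012 rows, 54 features, 7 classes
X_raw, labels = cov.data, cov.target   # labels take values 1..7

# Illustrative bandit construction: treat each of the 7 cover types as an
# arm and pay reward 1 iff the pulled arm matches the labeled cover type.
def covtype_reward(row, arm):
    return float(labels[row] == arm + 1)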
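Finally, the "offline evaluation method ... that relies on real-world log data" in the first row is, in spirit, a replay-style evaluator for logged bandit data. The sketch below shows that rejection idea under the assumption that logged arms were chosen uniformly at random; the `policy.choose` / `policy.update` interface is hypothetical.

```python
def replay_evaluate(log, policy):
    """Offline replay: walk through logged (contexts, chosen_arm, reward)
    triples; when the candidate policy picks the logged arm, count the
    logged reward and let the policy update, otherwise skip the event.
    Unbiased when the logging policy chose arms uniformly at random."""
    total, matched = 0.0, 0
    for contexts, logged_arm, reward in log:
        if policy.choose(contexts) == logged_arm:
            policy.update(contexts[logged_arm], reward)
            total += reward
            matched += 1
    return total / max(matched, 1)     # average reward over matched events
```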