Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Thompson Sampling for Multi-Objective Linear Contextual Bandit
Authors: Somangchan Park, Heesang Ann, Min-hwan Oh
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results confirm the benefits of our proposed approach, demonstrating improved regret minimization and strong multi-objective performance. Numerical experiments demonstrate the effectiveness of our proposed approach, showing improved performance in regret minimization, and objective-wise total reward maximization. |
| Researcher Affiliation | Academia | Somangchan Park Seoul National University EMAIL Heesang Ann Seoul National University EMAIL Min-hwan Oh Seoul National University EMAIL |
| Pseudocode | Yes | Algorithm 1: Multi-Objective Linear TS (MOL-TS) |
| Open Source Code | Yes | All codes of algorithms and experiments are provided in a ZIP file. Experimental results are provided in Section 6 and Appendix F. |
| Open Datasets | No | Each experimental setup contains 10 different instances with fixed number of arms K, objectives L, and feature dimension d. We demonstrate the case where K = 50, d = 5, L = 4. The parameter vector for each objective θ(ℓ) has a norm of 1. Each round, d-dimensional context vectors are revealed for every arm, bounded by 1 in Euclidean norm. Upon playing an arm, the agent receives a reward vector with an additional noise term, where the noise values are sampled from a zero mean Gaussian distribution with σ = 1. |
| Dataset Splits | No | The paper describes generating synthetic data for experiments and running for T=10000 rounds. It does not provide explicit training/test/validation dataset splits, which is common in contextual bandit settings where learning is sequential. |
| Hardware Specification | Yes | All experiments are conducted with INTEL(R) XEON(R) GOLD 6526Y CPU and 4 TB memory. |
| Software Dependencies | Yes | The software environment includes Python 3.12.7, Scipy 1.14.1, and Numpy 1.26.4. |
| Experiment Setup | Yes | We measure the Pareto regret and effective Pareto regret over T = 10000 rounds. Each experimental setup contains 10 different instances with fixed number of arms K, objectives L, and feature dimension d. We demonstrate the case where K = 50, d = 5, L = 4. The parameter vector for each objective θ(ℓ) has a norm of 1. Each round, d-dimensional context vectors are revealed for every arm, bounded by 1 in Euclidean norm. Upon playing an arm, the agent receives a reward vector with an additional noise term, where the noise values are sampled from a zero mean Gaussian distribution with σ = 1. MOL-ϵ-Greedy is basic MOMAB algorithm with ϵ = 0.05. |
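
The synthetic setup quoted in the Open Datasets and Experiment Setup rows can be sketched roughly as follows. This is a minimal illustration, not the authors' released code: the table only states the sizes (K = 50, d = 5, L = 4), the norm constraints, and the noise level (σ = 1), so the sampling distributions for the parameter and context vectors are assumptions, and the helper names (`unit_rows`, `make_instance`, `draw_contexts`, `pull`) are hypothetical.

```python
import numpy as np

K, D, L = 50, 5, 4   # arms, feature dimension, objectives (values from the table)
SIGMA = 1.0          # standard deviation of the zero-mean Gaussian noise


def unit_rows(rng, n, d):
    """Draw n Gaussian vectors and rescale each to unit Euclidean norm."""
    v = rng.standard_normal((n, d))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def make_instance(rng):
    """One problem instance: L parameter vectors, each with norm 1.

    Assumption: directions are drawn uniformly on the sphere.
    """
    return unit_rows(rng, L, D)


def draw_contexts(rng):
    """K context vectors with Euclidean norm at most 1.

    Assumption: random direction times a uniform radius in [0, 1).
    """
    directions = unit_rows(rng, K, D)
    radii = rng.uniform(0.0, 1.0, size=(K, 1))
    return directions * radii


def pull(rng, theta, contexts, arm):
    """Noisy L-dimensional reward vector for the chosen arm."""
    mean = theta @ contexts[arm]          # expected reward per objective
    return mean + rng.normal(0.0, SIGMA, size=L)


rng = np.random.default_rng(0)
theta = make_instance(rng)
contexts = draw_contexts(rng)
reward = pull(rng, theta, contexts, arm=0)
```

A full run in the spirit of the described experiment would redraw the contexts every round for T = 10000 rounds and repeat this over 10 independently generated instances; the regret computation is omitted here because the summary above does not define it.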