Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Thompson Sampling for Multi-Objective Linear Contextual Bandit

Authors: Somangchan Park, Heesang Ann, Min-hwan Oh

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical results confirm the benefits of our proposed approach, demonstrating improved regret minimization and strong multi-objective performance. Numerical experiments demonstrate the effectiveness of our proposed approach, showing improved performance in regret minimization, and objective-wise total reward maximization.
Researcher Affiliation	Academia	Somangchan Park Seoul National University EMAIL Heesang Ann Seoul National University EMAIL Min-hwan Oh Seoul National University EMAIL
Pseudocode	Yes	Algorithm 1: Multi-Objective Linear TS (MOL-TS)
Open Source Code	Yes	All codes of algorithms and experiments are provided in a ZIP file. Experimental results are provided in Section 6 and Appendix F.
Open Datasets	No	Each experimental setup contains 10 different instances with fixed number of arms K, objectives L, and feature dimension d. We demonstrate the case where K = 50, d = 5, L = 4. The parameter vector for each objective θ(ℓ) has a norm of 1. Each round, d-dimensional context vectors are revealed for every arm, bounded by 1 in Euclidean norm. Upon playing an arm, the agent receives a reward vector with an additional noise term, where the noise values are sampled from a zero mean Gaussian distribution with σ = 1.
Dataset Splits	No	The paper describes generating synthetic data for experiments and running for T=10000 rounds. It does not provide explicit training/test/validation dataset splits, which is common in contextual bandit settings where learning is sequential.
Hardware Specification	Yes	All experiments are conducted with INTEL(R) XEON(R) GOLD 6526Y CPU and 4 TB memory.
Software Dependencies	Yes	The software environment includes Python 3.12.7, Scipy 1.14.1, and Numpy 1.26.4.
Experiment Setup	Yes	We measure the Pareto regret and effective Pareto regret over T = 10000 rounds. Each experimental setup contains 10 different instances with fixed number of arms K, objectives L, and feature dimension d. We demonstrate the case where K = 50, d = 5, L = 4. The parameter vector for each objective θ(ℓ) has a norm of 1. Each round, d-dimensional context vectors are revealed for every arm, bounded by 1 in Euclidean norm. Upon playing an arm, the agent receives a reward vector with an additional noise term, where the noise values are sampled from a zero mean Gaussian distribution with σ = 1. MOL-ϵ-Greedy is basic MOMAB algorithm with ϵ = 0.05.
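The synthetic setup quoted in the Experiment Setup row can be sketched roughly as follows. This is a minimal illustration, not the authors' code. The constants K, d, L and σ come from the table. The sampling distributions are assumptions, because the quoted text does not specify them: unit-norm parameters are drawn as normalised Gaussian vectors, and contexts as a random direction scaled by a uniform radius. The linear reward model is inferred from the paper's title. All function names are hypothetical, and the sketch uses only the standard library rather than the NumPy/SciPy stack listed above.

```python
import math
import random

# Constants taken from the Experiment Setup row.
K, D, L_OBJ, SIGMA = 50, 5, 4, 1.0  # arms, feature dimension, objectives, noise std


def unit_vector(rng, dim):
    """Random direction on the unit sphere (Gaussian sample, then normalise)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def make_thetas(rng):
    """One parameter vector per objective, each with Euclidean norm 1."""
    return [unit_vector(rng, D) for _ in range(L_OBJ)]


def draw_contexts(rng):
    """K context vectors with Euclidean norm at most 1.

    Assumption: random direction times a uniform radius in [0, 1).
    """
    contexts = []
    for _ in range(K):
        radius = rng.random()
        contexts.append([radius * x for x in unit_vector(rng, D)])
    return contexts


def pull(rng, thetas, context):
    """Reward vector: linear mean per objective plus N(0, SIGMA^2) noise."""
    return [
        sum(t * x for t, x in zip(theta, context)) + rng.gauss(0.0, SIGMA)
        for theta in thetas
    ]


rng = random.Random(0)          # fixed seed so a run can be repeated
thetas = make_thetas(rng)
contexts = draw_contexts(rng)   # revealed anew each round in the quoted setup
reward = pull(rng, thetas, contexts[0])
```

Calling `draw_contexts` once per round for T = 10000 rounds, and repeating with 10 different seeds, would mirror the "10 different instances" described above, under the same assumptions.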