Thompson Sampling for Bandit Learning in Matching Markets

Authors: Fang Kong, Junming Yin, Shuai Li

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'Extensive experiments demonstrate the practical advantages of the TS-type algorithm over the ETC and UCB-type baselines.' 'In this section, we compare the performances of our CA-TS with other related baselines in different environments.'
Researcher Affiliation | Academia | Fang Kong¹, Junming Yin² and Shuai Li¹; ¹John Hopcroft Center for Computer Science, Shanghai Jiao Tong University; ²Tepper School of Business, Carnegie Mellon University
Pseudocode | Yes | Algorithm 1 Conflict-Avoiding TS (CA-TS)
Open Source Code | Yes | The code is available at https://github.com/fangkongx/TSforMatchingMarkets.
Open Datasets | No | The paper uses synthetically generated data: 'we construct a market of N = 5 players and K = 5 arms with global preferences...' and 'The feedback X_{i,j}(t) for player p_i on arm a_j at time t is drawn independently from Bernoulli(µ_{i,j}).' No concrete access information for a publicly available or open dataset is provided.
Dataset Splits | No | The paper describes an online learning setting where data is generated iteratively. It states: 'In each market, we run all algorithms for T = 100k rounds and all results are averaged over 50 independent runs.' There is no mention of specific training/validation/test dataset splits, as these are not applicable to a setup where rewards are collected iteratively.
Hardware Specification | No | The paper does not provide any specific hardware details such as CPU/GPU models, memory, or the type of computing environment used for running the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks, or solvers).
Experiment Setup | Yes | 'The hyper-parameters of all baselines are set as their original paper with details in the Appendix.' 'We run all algorithms for T = 100k rounds and all results are averaged over 50 independent runs.' 'we set the preference value towards the least favorite arm as 0.1 for all players and test four different choices of reward gap between any two consecutively ranked arms {0.2, 0.15, 0.1, 0.05}.' (This synthetic-market construction and a TS-style learning loop are sketched below.)
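
To make the quoted setup concrete, the following is a minimal, illustrative sketch of the synthetic market described above (global preferences, least-favorite value 0.1, a fixed gap between consecutively ranked arms, Bernoulli feedback) combined with a Thompson-Sampling-style centralized matching loop. The posterior-sampling plus Gale-Shapley structure, the assumed arm-side ranking (every arm preferring lower-indexed players), and the stable-regret bookkeeping are assumptions made for illustration; they are not taken from the authors' repository and are not claimed to reproduce Algorithm 1 (CA-TS) exactly.

```python
# Illustrative sketch only: hypothetical helper names, assumed arm-side preferences,
# and a generic TS + Gale-Shapley loop; not the authors' CA-TS implementation.
import numpy as np


def build_global_market(n_players=5, n_arms=5, gap=0.2, lowest=0.1):
    """Mean-reward matrix in which all players share one global ranking of arms:
    the least favorite arm is worth `lowest` and consecutively ranked arms
    differ by `gap` (matching the quoted experiment setup)."""
    means = lowest + gap * np.arange(n_arms - 1, -1, -1)  # e.g. [0.9, 0.7, 0.5, 0.3, 0.1]
    return np.tile(means, (n_players, 1))                 # shape (N, K)


def gale_shapley(player_prefs, arm_rank):
    """Player-proposing deferred acceptance.
    player_prefs[i]: arms in descending preference for player i.
    arm_rank[j][i]: rank arm j gives player i (lower = preferred).
    Returns match[i] = arm assigned to player i (conflict-free by construction)."""
    n_players = len(player_prefs)
    next_prop = [0] * n_players
    holder = {}                      # arm -> player currently holding it
    free = list(range(n_players))
    while free:
        i = free.pop()
        j = player_prefs[i][next_prop[i]]
        next_prop[i] += 1
        if j not in holder:
            holder[j] = i
        elif arm_rank[j][i] < arm_rank[j][holder[j]]:
            free.append(holder[j])   # arm j swaps to the player it prefers
            holder[j] = i
        else:
            free.append(i)
    match = [None] * n_players
    for j, i in holder.items():
        match[i] = j
    return match


def run_ts_matching(mu, arm_rank, horizon=100_000, rng=None):
    """TS-style loop: sample Beta posteriors, match players to arms via
    Gale-Shapley on the sampled values, observe Bernoulli feedback, update.
    Regret is measured against the stable matching under the true means
    (an assumed metric, in the spirit of the paper's stable regret)."""
    rng = np.random.default_rng() if rng is None else rng
    n_players, _ = mu.shape
    alpha = np.ones_like(mu)         # Beta(1, 1) priors on Bernoulli means
    beta = np.ones_like(mu)
    true_prefs = [list(np.argsort(-mu[i])) for i in range(n_players)]
    optimal = gale_shapley(true_prefs, arm_rank)
    regret = np.zeros(n_players)
    for _ in range(horizon):
        theta = rng.beta(alpha, beta)                      # one sample per (player, arm)
        prefs = [list(np.argsort(-theta[i])) for i in range(n_players)]
        match = gale_shapley(prefs, arm_rank)
        for i, j in enumerate(match):
            x = float(rng.random() < mu[i, j])             # Bernoulli(mu[i, j]) feedback
            alpha[i, j] += x
            beta[i, j] += 1.0 - x
            regret[i] += mu[i, optimal[i]] - mu[i, j]
    return regret


if __name__ == "__main__":
    mu = build_global_market(gap=0.2)                      # one of the four quoted gaps
    # Assumed arm-side preferences: every arm prefers player 0 over player 1 over ...
    arm_rank = [list(range(mu.shape[0])) for _ in range(mu.shape[1])]
    out = run_ts_matching(mu, arm_rank, horizon=10_000, rng=np.random.default_rng(0))
    print("cumulative regret per player:", np.round(out, 2))
    # The paper runs T = 100k rounds and averages results over 50 independent runs.
```

Because the matching step assigns each player a distinct arm every round, the resulting pulls are conflict-free by construction, which is the property the "conflict-avoiding" name points at; the exact sampling and update rules used by CA-TS should be checked against the released code at the GitHub link above.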