Thompson Sampling for Bandit Learning in Matching Markets

Authors: Fang Kong, Junming Yin, Shuai Li

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'Extensive experiments demonstrate the practical advantages of the TS-type algorithm over the ETC and UCB-type baselines.' 'In this section, we compare the performances of our CA-TS with other related baselines in different environments.'
Researcher Affiliation | Academia | Fang Kong¹, Junming Yin² and Shuai Li¹; ¹John Hopcroft Center for Computer Science, Shanghai Jiao Tong University; ²Tepper School of Business, Carnegie Mellon University
Pseudocode | Yes | Algorithm 1 Conflict-Avoiding TS (CA-TS)
Open Source Code | Yes | The code is available at https://github.com/fangkongx/TSforMatchingMarkets.
Open Datasets | No | The paper uses synthetically generated data: 'we construct a market of N = 5 players and K = 5 arms with global preferences...' and 'The feedback X_{i,j}(t) for player p_i on arm a_j at time t is drawn independently from Bernoulli(µ_{i,j}).' No concrete access information for a publicly available or open dataset is provided.
Dataset Splits | No | The paper describes an online learning setting where data is generated iteratively. It states: 'In each market, we run all algorithms for T = 100k rounds and all results are averaged over 50 independent runs.' There is no mention of specific training/validation/test dataset splits, as these are not applicable to a setup where rewards are collected iteratively.
Hardware Specification | No | The paper does not provide any specific hardware details such as CPU/GPU models, memory, or the type of computing environment used for running the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks, or solvers).
Experiment Setup | Yes | 'The hyper-parameters of all baselines are set as their original paper with details in the Appendix.' 'We run all algorithms for T = 100k rounds and all results are averaged over 50 independent runs.' 'we set the preference value towards the least favorite arm as 0.1 for all players and test four different choices of reward gap between any two consecutively ranked arms {0.2, 0.15, 0.1, 0.05}.' (This synthetic-market construction and a TS-style learning loop are sketched below.)
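
To make the quoted setup concrete, the following is a minimal, illustrative sketch of the synthetic market described above (global preferences, least-favorite value 0.1, a fixed gap between consecutively ranked arms, Bernoulli feedback) combined with a Thompson-Sampling-style centralized matching loop. The posterior-sampling plus Gale-Shapley structure, the assumed arm-side ranking (every arm preferring lower-indexed players), and the stable-regret bookkeeping are assumptions made for illustration; they are not taken from the authors' repository and are not claimed to reproduce Algorithm 1 (CA-TS) exactly.

```python
# Illustrative sketch only: hypothetical helper names, assumed arm-side preferences,
# and a generic TS + Gale-Shapley loop; not the authors' CA-TS implementation.
import numpy as np


def build_global_market(n_players=5, n_arms=5, gap=0.2, lowest=0.1):
    """Mean-reward matrix in which all players share one global ranking of arms:
    the least favorite arm is worth `lowest` and consecutively ranked arms
    differ by `gap` (matching the quoted experiment setup)."""
    means = lowest + gap * np.arange(n_arms - 1, -1, -1)  # e.g. [0.9, 0.7, 0.5, 0.3, 0.1]
    return np.tile(means, (n_players, 1))                 # shape (N, K)


def gale_shapley(player_prefs, arm_rank):
    """Player-proposing deferred acceptance.
    player_prefs[i]: arms in descending preference for player i.
    arm_rank[j][i]: rank arm j gives player i (lower = preferred).
    Returns match[i] = arm assigned to player i (conflict-free by construction)."""
    n_players = len(player_prefs)
    next_prop = [0] * n_players
    holder = {}                      # arm -> player currently holding it
    free = list(range(n_players))
    while free:
        i = free.pop()
        j = player_prefs[i][next_prop[i]]
        next_prop[i] += 1
        if j not in holder:
            holder[j] = i
        elif arm_rank[j][i] < arm_rank[j][holder[j]]:
            free.append(holder[j])   # arm j swaps to the player it prefers
            holder[j] = i
        else:
            free.append(i)
    match = [None] * n_players
    for j, i in holder.items():
        match[i] = j
    return match


def run_ts_matching(mu, arm_rank, horizon=100_000, rng=None):
    """TS-style loop: sample Beta posteriors, match players to arms via
    Gale-Shapley on the sampled values, observe Bernoulli feedback, update.
    Regret is measured against the stable matching under the true means
    (an assumed metric, in the spirit of the paper's stable regret)."""
    rng = np.random.default_rng() if rng is None else rng
    n_players, _ = mu.shape
    alpha = np.ones_like(mu)         # Beta(1, 1) priors on Bernoulli means
    beta = np.ones_like(mu)
    true_prefs = [list(np.argsort(-mu[i])) for i in range(n_players)]
    optimal = gale_shapley(true_prefs, arm_rank)
    regret = np.zeros(n_players)
    for _ in range(horizon):
        theta = rng.beta(alpha, beta)                      # one sample per (player, arm)
        prefs = [list(np.argsort(-theta[i])) for i in range(n_players)]
        match = gale_shapley(prefs, arm_rank)
        for i, j in enumerate(match):
            x = float(rng.random() < mu[i, j])             # Bernoulli(mu[i, j]) feedback
            alpha[i, j] += x
            beta[i, j] += 1.0 - x
            regret[i] += mu[i, optimal[i]] - mu[i, j]
    return regret


if __name__ == "__main__":
    mu = build_global_market(gap=0.2)                      # one of the four quoted gaps
    # Assumed arm-side preferences: every arm prefers player 0 over player 1 over ...
    arm_rank = [list(range(mu.shape[0])) for _ in range(mu.shape[1])]
    out = run_ts_matching(mu, arm_rank, horizon=10_000, rng=np.random.default_rng(0))
    print("cumulative regret per player:", np.round(out, 2))
    # The paper runs T = 100k rounds and averages results over 50 independent runs.
```

Because the matching step assigns each player a distinct arm every round, the resulting pulls are conflict-free by construction, which is the property the "conflict-avoiding" name points at; the exact sampling and update rules used by CA-TS should be checked against the released code at the GitHub link above.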