Improved Analysis for Bandit Learning in Matching Markets

Authors: Fang Kong, Zilong Wang, Shuai Li

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we compare our Algorithm 1 (abbreviated as AETGS-E) with the baselines ETGS [16], ML-ETC [31], and Phased ETC [4], which also enjoy guarantees for player-optimal stable regret in general decentralized one-to-one markets. All algorithms run for T = 100k rounds and all results are averaged over 50 independent runs. The error bars represent standard errors, which are computed as standard deviations divided by …
Researcher Affiliation | Academia | Fang Kong (Southern University of Science and Technology, kongf@sustech.edu.cn); Zilong Wang (Shanghai Jiao Tong University, wangzilong@sjtu.edu.cn); Shuai Li (Shanghai Jiao Tong University, shuaili8@sjtu.edu.cn)
Pseudocode | Yes | Algorithm 1: adaptively explore-then-Gale-Shapley with elimination (from the view of player p_i); Algorithm 2: centralized UCB (a generic sketch of the centralized UCB pattern is given below the table)
Open Source Code | Yes | The code is uploaded in the supplementary material.
Open Datasets | No | The preference rankings for both players and arms are generated as random permutations. The preference gap between any two adjacently ranked arms is set to 0.1. The feedback X_{i,j}(t) for player p_i on arm a_j at time t is drawn independently from a Gaussian distribution with mean µ_{i,j} and variance 1.
Dataset Splits | No | The paper uses a bandit learning setup, where data is generated online through interaction, and thus does not involve traditional train/validation/test dataset splits. Performance is evaluated by the cumulative regret over the entire horizon T (the standard regret definition is recalled below the table).
Hardware Specification | No | We do not provide specific information on the computer resources used, as most experiments on multi-armed bandits are lightweight.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | To better illustrate the advantages of our algorithm, especially when N is much smaller than K, we set N = 3 and K = 10. The preference rankings for both players and arms are generated as random permutations. The preference gap between any two adjacently ranked arms is set to 0.1. The feedback X_{i,j}(t) for player p_i on arm a_j at time t is drawn independently from a Gaussian distribution with mean µ_{i,j} and variance 1. All algorithms run for T = 100k rounds and all results are averaged over 50 independent runs. (An illustrative data-generation sketch for this setup is given below the table.)
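
To make the reported experiment setup concrete, the following is a minimal sketch of how such a synthetic market instance could be generated from the description above. It is written against the table only; the helper names (make_market, sample_feedback) and the choice of 1.0 as the top mean reward are assumptions, not taken from the authors' released code.

import numpy as np

N, K = 3, 10        # number of players and arms, as in the reported setup
GAP = 0.1           # preference gap between adjacently ranked arms
T = 100_000         # horizon
N_RUNS = 50         # number of independent runs

def make_market(rng):
    """Draw random-permutation preferences and the induced mean rewards."""
    # Each player ranks the K arms by a uniformly random permutation.
    player_rankings = [rng.permutation(K) for _ in range(N)]
    # Each arm ranks the N players by a uniformly random permutation.
    arm_rankings = [rng.permutation(N) for _ in range(K)]
    # Mean reward mu[i, j]; adjacent ranks differ by GAP (top value 1.0 assumed).
    mu = np.zeros((N, K))
    for i, ranking in enumerate(player_rankings):
        for rank, j in enumerate(ranking):
            mu[i, j] = 1.0 - GAP * rank
    return player_rankings, arm_rankings, mu

def sample_feedback(mu, i, j, rng):
    """Feedback X_{i,j}(t): Gaussian with mean mu[i, j] and unit variance."""
    return rng.normal(loc=mu[i, j], scale=1.0)

rng = np.random.default_rng(0)
player_rankings, arm_rankings, mu = make_market(rng)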
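
The performance measure noted under Dataset Splits is the cumulative player-optimal stable regret. As context (this is the standard definition used in the matching-bandits literature, stated here for reference rather than quoted from the paper), for player p_i it can be written as

R_i(T) = \mathbb{E}\left[ \sum_{t=1}^{T} \left( \mu_{i,\overline{m}_i} - \mu_{i, A_i(t)} \right) \right],

where \overline{m}_i is the arm matched to p_i under the player-optimal stable matching, A_i(t) is the arm p_i is successfully matched with at round t, and \mu_{i, A_i(t)} is taken to be 0 in rounds where p_i is unmatched.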
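
Finally, as context for Algorithm 2 (centralized UCB) listed under Pseudocode: centralized UCB algorithms for matching bandits typically have a platform maintain UCB indices for every (player, arm) pair and, in each round, compute a stable matching by running Gale-Shapley on those indices. The sketch below illustrates that generic pattern only; it is not a transcription of the paper's Algorithm 2, and the function names and the exploration bonus sqrt(3 ln t / n) are assumptions.

import numpy as np

def gale_shapley(player_prefs, arm_rank):
    """Player-proposing deferred acceptance (assumes K >= N, complete lists).

    player_prefs[i]: arms ordered from most to least preferred by player i.
    arm_rank[j][i]: rank of player i in arm j's list (lower = more preferred).
    Returns match[i] = arm assigned to player i.
    """
    n_players = len(player_prefs)
    next_idx = [0] * n_players      # next arm each player will propose to
    holder = {}                     # arm -> player tentatively holding it
    free = list(range(n_players))
    while free:
        i = free.pop()
        j = player_prefs[i][next_idx[i]]
        next_idx[i] += 1
        if j not in holder:
            holder[j] = i
        elif arm_rank[j][i] < arm_rank[j][holder[j]]:
            free.append(holder[j])  # arm j prefers i; previous holder becomes free
            holder[j] = i
        else:
            free.append(i)          # arm j rejects i; i proposes again later
    match = [None] * n_players
    for j, i in holder.items():
        match[i] = j
    return match

def centralized_ucb(mu, arm_rank, T, rng):
    """Generic centralized UCB loop for matching bandits (illustrative only)."""
    n_players, n_arms = mu.shape
    counts = np.ones((n_players, n_arms))   # assume one warm-up sample per pair
    means = rng.normal(mu, 1.0)             # empirical means from the warm-up
    for t in range(1, T + 1):
        ucb = means + np.sqrt(3.0 * np.log(t) / counts)
        # Players rank arms by UCB index; the platform matches via Gale-Shapley.
        player_prefs = [list(np.argsort(-ucb[i])) for i in range(n_players)]
        match = gale_shapley(player_prefs, arm_rank)
        for i, j in enumerate(match):
            x = rng.normal(mu[i, j], 1.0)    # Gaussian feedback, unit variance
            means[i, j] += (x - means[i, j]) / (counts[i, j] + 1.0)
            counts[i, j] += 1.0
    return means, counts

# Tiny usage example on a random instance (values are hypothetical).
rng = np.random.default_rng(1)
mu = rng.uniform(0.0, 1.0, size=(3, 10))
arm_rank = [np.argsort(rng.permutation(3)) for _ in range(10)]
means, counts = centralized_ucb(mu, arm_rank, T=1000, rng=rng)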