Improved Analysis for Bandit Learning in Matching Markets
Authors: Fang Kong, Zilong Wang, Shuai Li
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we compare our Algorithm 1 (abbreviated as AETGS-E) with baselines ETGS [16], ML-ETC [31], and Phased ETC [4], which also enjoy guarantees for player-optimal stable regret in general decentralized one-to-one markets. All algorithms run for T = 100k rounds and all results are averaged over 50 independent runs. The error bars represent standard errors, which are computed as standard deviations divided by the square root of the number of runs. |
| Researcher Affiliation | Academia | Fang Kong, Southern University of Science and Technology, kongf@sustech.edu.cn; Zilong Wang, Shanghai Jiao Tong University, wangzilong@sjtu.edu.cn; Shuai Li, Shanghai Jiao Tong University, shuaili8@sjtu.edu.cn |
| Pseudocode | Yes | Algorithm 1: adaptively explore-then-Gale-Shapley with elimination (from the view of player p_i) and Algorithm 2: centralized UCB
| Open Source Code | Yes | The code is uploaded in the supplementary material.
| Open Datasets | No | The preference rankings for both players and arms are generated as random permutations. The preference gap between any adjacent ranked arms is set as 0.1. The feedback X_{i,j}(t) for player p_i on arm a_j at time t is drawn independently from the Gaussian distribution with mean µ_{i,j} and variance 1.
| Dataset Splits | No | The paper uses a bandit learning setup, where data is generated online through interaction, and thus does not involve traditional train/validation/test dataset splits. Performance is evaluated by cumulative regret over the entire horizon T.
| Hardware Specification | No | We do not provide specific information on the computer resources used, as most experiments on multi-armed bandits are lightweight. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | To better illustrate the advantages of our algorithm, especially when N is much smaller than K, we set N = 3 and K = 10. The preference rankings for both players and arms are generated as random permutations. The preference gap between any adjacent ranked arms is set as 0.1. The feedback X_{i,j}(t) for player p_i on arm a_j at time t is drawn independently from the Gaussian distribution with mean µ_{i,j} and variance 1. All algorithms run for T = 100k rounds and all results are averaged over 50 independent runs.
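
The Pseudocode row above names two procedures: Algorithm 1 (adaptively explore-then-Gale-Shapley with elimination) and Algorithm 2 (centralized UCB). The paper's own pseudocode is not reproduced here, but the general centralized-UCB idea, maintaining optimistic reward estimates for every player-arm pair and matching players to arms by running player-proposing Gale-Shapley on those estimates each round, can be sketched as follows. The function names, the confidence-bonus form, and the round structure below are our assumptions, not the paper's exact Algorithm 2.

```python
import numpy as np

def gale_shapley(player_prefs, arm_rank):
    """Player-proposing deferred acceptance.

    player_prefs[i] lists arms from most to least preferred by player i.
    arm_rank[j][i] is the rank arm j assigns to player i (lower = better).
    Assumes N <= K and complete preference lists; returns {player: arm}.
    """
    N = len(player_prefs)
    next_choice = [0] * N      # next position each player will propose to
    holder = {}                # arm j -> player currently (tentatively) holding it
    free = list(range(N))
    while free:
        i = free.pop()
        j = player_prefs[i][next_choice[i]]
        next_choice[i] += 1
        if j not in holder:
            holder[j] = i
        elif arm_rank[j][i] < arm_rank[j][holder[j]]:
            free.append(holder[j])   # arm j switches to the preferred player i
            holder[j] = i
        else:
            free.append(i)           # arm j rejects player i
    return {i: j for j, i in holder.items()}

def centralized_ucb_matching(t, counts, means, arm_rank):
    """One round's matching from optimistic (UCB) reward estimates (illustrative)."""
    N, _ = means.shape
    bonus = np.sqrt(2.0 * np.log(max(t, 2)) / np.maximum(counts, 1))
    ucb = np.where(counts > 0, means + bonus, np.inf)    # try unexplored pairs first
    ucb_prefs = [list(np.argsort(-ucb[i], kind="stable")) for i in range(N)]
    return gale_shapley(ucb_prefs, arm_rank)
```

Breaking ties with a stable argsort keeps unexplored pairs (infinite UCB) at the front in a deterministic order, so every player-arm pair is tried before the optimistic estimates start to matter.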
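
The Open Datasets and Experiment Setup rows describe a fully synthetic environment, so the data generation and the error-bar convention can be written out directly. Below is a minimal sketch under the reported values (N = 3, K = 10, gap 0.1, unit-variance Gaussian feedback, T = 100k rounds, 50 runs); the top preference mean of 1.0 and all identifiers are our choices, since the excerpt only fixes the 0.1 gap. The player-optimal stable matching used as the regret benchmark can be obtained by running the `gale_shapley` routine above on the true preferences.

```python
import numpy as np

N, K, GAP, T, RUNS = 3, 10, 0.1, 100_000, 50

def make_market(rng):
    """Random-permutation preferences with a 0.1 mean gap between adjacent ranks."""
    player_pref = [rng.permutation(K) for _ in range(N)]     # player i's arm ordering
    arm_pref = [rng.permutation(N) for _ in range(K)]        # arm j's player ordering
    arm_rank = np.array([np.argsort(p) for p in arm_pref])   # arm_rank[j][i] = rank of player i
    mu = np.empty((N, K))
    for i in range(N):
        for pos, j in enumerate(player_pref[i]):
            mu[i, j] = 1.0 - GAP * pos   # top mean 1.0 is an assumption; only the gap is stated
    return mu, player_pref, arm_rank

def sample_feedback(rng, mu, i, j):
    """X_{i,j}(t): Gaussian reward with mean mu[i, j] and variance 1."""
    return rng.normal(mu[i, j], 1.0)

def standard_error(per_run_totals):
    """Error bars: standard deviation across runs divided by sqrt(number of runs)."""
    x = np.asarray(per_run_totals, dtype=float)
    return x.std(axis=0) / np.sqrt(x.shape[0])
```

With this environment, player p_i's cumulative player-optimal stable regret after T rounds is the sum over rounds of µ_{i, m_i} − µ_{i, A_i(t)}, where m_i is p_i's partner in the player-optimal stable matching and A_i(t) is the arm p_i obtains at round t; averaging this over the 50 runs and dividing the standard deviation by the square root of 50 yields the reported error bars.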