Improved Analysis for Bandit Learning in Matching Markets

Authors: Fang Kong, Zilong Wang, Shuai Li

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we compare our Algorithm 1 (abbreviated as AETGS-E) with the baselines ETGS [16], ML-ETC [31], and Phased ETC [4], which also enjoy guarantees for player-optimal stable regret in general decentralized one-to-one markets. All algorithms run for T = 100k rounds and all results are averaged over 50 independent runs. The error bars represent standard errors, which are computed as standard deviations divided by …
Researcher Affiliation | Academia | Fang Kong (Southern University of Science and Technology, kongf@sustech.edu.cn); Zilong Wang (Shanghai Jiao Tong University, wangzilong@sjtu.edu.cn); Shuai Li (Shanghai Jiao Tong University, shuaili8@sjtu.edu.cn)
Pseudocode | Yes | Algorithm 1: adaptively explore-then-Gale-Shapley with elimination (from the view of player p_i); Algorithm 2: centralized UCB (a generic sketch of the centralized UCB pattern is given below the table)
Open Source Code | Yes | The code is uploaded in the supplementary material.
Open Datasets | No | The preference rankings for both players and arms are generated as random permutations. The preference gap between any two adjacently ranked arms is set to 0.1. The feedback X_{i,j}(t) for player p_i on arm a_j at time t is drawn independently from a Gaussian distribution with mean µ_{i,j} and variance 1.
Dataset Splits | No | The paper uses a bandit learning setup, where data is generated online through interaction, and thus does not involve traditional train/validation/test dataset splits. Performance is evaluated by the cumulative regret over the entire horizon T (the standard regret definition is recalled below the table).
Hardware Specification | No | We do not provide specific information on the computer resources used, as most experiments on multi-armed bandits are lightweight.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | To better illustrate the advantages of our algorithm, especially when N is much smaller than K, we set N = 3 and K = 10. The preference rankings for both players and arms are generated as random permutations. The preference gap between any two adjacently ranked arms is set to 0.1. The feedback X_{i,j}(t) for player p_i on arm a_j at time t is drawn independently from a Gaussian distribution with mean µ_{i,j} and variance 1. All algorithms run for T = 100k rounds and all results are averaged over 50 independent runs. (An illustrative data-generation sketch for this setup is given below the table.)
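
To make the reported experiment setup concrete, the following is a minimal sketch of how such a synthetic market instance could be generated from the description above. It is written against the table only; the helper names (make_market, sample_feedback) and the choice of 1.0 as the top mean reward are assumptions, not taken from the authors' released code.

import numpy as np

N, K = 3, 10        # number of players and arms, as in the reported setup
GAP = 0.1           # preference gap between adjacently ranked arms
T = 100_000         # horizon
N_RUNS = 50         # number of independent runs

def make_market(rng):
    """Draw random-permutation preferences and the induced mean rewards."""
    # Each player ranks the K arms by a uniformly random permutation.
    player_rankings = [rng.permutation(K) for _ in range(N)]
    # Each arm ranks the N players by a uniformly random permutation.
    arm_rankings = [rng.permutation(N) for _ in range(K)]
    # Mean reward mu[i, j]; adjacent ranks differ by GAP (top value 1.0 assumed).
    mu = np.zeros((N, K))
    for i, ranking in enumerate(player_rankings):
        for rank, j in enumerate(ranking):
            mu[i, j] = 1.0 - GAP * rank
    return player_rankings, arm_rankings, mu

def sample_feedback(mu, i, j, rng):
    """Feedback X_{i,j}(t): Gaussian with mean mu[i, j] and unit variance."""
    return rng.normal(loc=mu[i, j], scale=1.0)

rng = np.random.default_rng(0)
player_rankings, arm_rankings, mu = make_market(rng)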
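
The performance measure noted under Dataset Splits is the cumulative player-optimal stable regret. As context (this is the standard definition used in the matching-bandits literature, stated here for reference rather than quoted from the paper), for player p_i it can be written as

R_i(T) = \mathbb{E}\left[ \sum_{t=1}^{T} \left( \mu_{i,\overline{m}_i} - \mu_{i, A_i(t)} \right) \right],

where \overline{m}_i is the arm matched to p_i under the player-optimal stable matching, A_i(t) is the arm p_i is successfully matched with at round t, and \mu_{i, A_i(t)} is taken to be 0 in rounds where p_i is unmatched.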
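
Finally, as context for Algorithm 2 (centralized UCB) listed under Pseudocode: centralized UCB algorithms for matching bandits typically have a platform maintain UCB indices for every (player, arm) pair and, in each round, compute a stable matching by running Gale-Shapley on those indices. The sketch below illustrates that generic pattern only; it is not a transcription of the paper's Algorithm 2, and the function names and the exploration bonus sqrt(3 ln t / n) are assumptions.

import numpy as np

def gale_shapley(player_prefs, arm_rank):
    """Player-proposing deferred acceptance (assumes K >= N, complete lists).

    player_prefs[i]: arms ordered from most to least preferred by player i.
    arm_rank[j][i]: rank of player i in arm j's list (lower = more preferred).
    Returns match[i] = arm assigned to player i.
    """
    n_players = len(player_prefs)
    next_idx = [0] * n_players      # next arm each player will propose to
    holder = {}                     # arm -> player tentatively holding it
    free = list(range(n_players))
    while free:
        i = free.pop()
        j = player_prefs[i][next_idx[i]]
        next_idx[i] += 1
        if j not in holder:
            holder[j] = i
        elif arm_rank[j][i] < arm_rank[j][holder[j]]:
            free.append(holder[j])  # arm j prefers i; previous holder becomes free
            holder[j] = i
        else:
            free.append(i)          # arm j rejects i; i proposes again later
    match = [None] * n_players
    for j, i in holder.items():
        match[i] = j
    return match

def centralized_ucb(mu, arm_rank, T, rng):
    """Generic centralized UCB loop for matching bandits (illustrative only)."""
    n_players, n_arms = mu.shape
    counts = np.ones((n_players, n_arms))   # assume one warm-up sample per pair
    means = rng.normal(mu, 1.0)             # empirical means from the warm-up
    for t in range(1, T + 1):
        ucb = means + np.sqrt(3.0 * np.log(t) / counts)
        # Players rank arms by UCB index; the platform matches via Gale-Shapley.
        player_prefs = [list(np.argsort(-ucb[i])) for i in range(n_players)]
        match = gale_shapley(player_prefs, arm_rank)
        for i, j in enumerate(match):
            x = rng.normal(mu[i, j], 1.0)    # Gaussian feedback, unit variance
            means[i, j] += (x - means[i, j]) / (counts[i, j] + 1.0)
            counts[i, j] += 1.0
    return means, counts

# Tiny usage example on a random instance (values are hypothetical).
rng = np.random.default_rng(1)
mu = rng.uniform(0.0, 1.0, size=(3, 10))
arm_rank = [np.argsort(rng.permutation(3)) for _ in range(10)]
means, counts = centralized_ucb(mu, arm_rank, T=1000, rng=rng)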