Matching in Multi-arm Bandit with Collision

Authors: Yirui Zhang, Siwei Wang, Zhixuan Fang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Figure 1 shows the average regret and the standard deviation of regret over 50 independent runs. From Figure 1, ML-ETC outperforms both Phased-ETC and CA-UCB from an asymptotic view."
Researcher Affiliation | Collaboration | Yirui Zhang (1), Siwei Wang (2), Zhixuan Fang (1, 3); (1) IIIS, Tsinghua University; (2) Microsoft Research; (3) Shanghai Qi Zhi Institute
Pseudocode | Yes | Algorithm 1: ML-ETC Algorithm
Open Source Code | Yes | "3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See supplemental material."
Open Datasets | No | "We choose the time horizon to be T = 2.5 × 10^7, arms' mean utilities within [0.3, 0.6], and the minimal gap = 0.05. We have tested two cases with 5 agents and 5 arms but different preferences and utilities. To investigate the quality of the converging stable matching under different algorithms, we choose the arm preferences such that there exist multiple stable matchings between agents and arms (see Appendix for the implementation details)." The paper uses synthetically generated data based on specified parameters and does not provide access information for a public dataset.
Dataset Splits | No | The paper describes simulation experiments but does not provide specific training/validation/test dataset splits.
Hardware Specification | No | The paper does not specify the hardware used to run the experiments.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup | Yes | "Setup. We choose the time horizon to be T = 2.5 × 10^7, arms' mean utilities within [0.3, 0.6], and the minimal gap = 0.05. We have tested two cases with 5 agents and 5 arms but different preferences and utilities... When ϵ is smaller, the duration of each exploration is shorter. Same as the simulation in [2], we choose ϵ = 0.2 in our simulation... Same as the simulation in [10], we choose the parameter λ of the delay probability to be λ = 0.1."
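The reported setup parameters can be collected into a small sketch. This is an illustrative reconstruction only: the `sample_utilities` helper and its evenly spaced utility grid are hypothetical, and the paper's exact preference lists and utility values are given in its appendix, not here.

```python
import numpy as np

# Parameters reported in the paper's setup section.
N_AGENTS, N_ARMS = 5, 5
HORIZON = int(2.5e7)      # T = 2.5 * 10^7
MIN_GAP = 0.05            # minimal gap between mean utilities
EPSILON = 0.2             # exploration parameter, chosen as in [2]
LAMBDA = 0.1              # delay-probability parameter, chosen as in [10]

def sample_utilities(n_agents, n_arms, low=0.3, high=0.6,
                     gap=MIN_GAP, rng=None):
    """Hypothetical helper: draw mean utilities in [low, high] so that
    each agent's arm utilities are separated by at least `gap`.
    An evenly spaced grid guarantees the gap; shuffling randomizes
    which arm gets which utility."""
    rng = np.random.default_rng(rng)
    utils = np.empty((n_agents, n_arms))
    grid = np.linspace(low, high, n_arms)   # spacing 0.075 >= 0.05 here
    for i in range(n_agents):
        row = grid.copy()
        rng.shuffle(row)
        utils[i] = row
    return utils
```

With 5 arms on [0.3, 0.6], the grid spacing is 0.075, so the minimal-gap constraint of 0.05 holds by construction.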
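For orientation on the algorithm family, the following is a generic single-agent explore-then-commit skeleton. It is a sketch of the ETC idea only, not the paper's ML-ETC, which operates in the multi-agent matching setting with collisions between agents.

```python
import numpy as np

def etc_single_agent(means, horizon, explore_rounds, rng=None):
    """Generic explore-then-commit: pull each arm `explore_rounds`
    times, then commit to the empirical best arm for the rest of the
    horizon. Returns the committed arm and the expected regret."""
    rng = np.random.default_rng(rng)
    k = len(means)
    counts = np.zeros(k)
    sums = np.zeros(k)
    regret = 0.0
    best = max(means)
    t = 0
    # Exploration phase: round-robin pulls of every arm.
    for _ in range(explore_rounds):
        for a in range(k):
            reward = rng.binomial(1, means[a])  # Bernoulli utility
            sums[a] += reward
            counts[a] += 1
            regret += best - means[a]
            t += 1
    # Commit phase: play the empirical best arm until the horizon.
    a_star = int(np.argmax(sums / counts))
    regret += (horizon - t) * (best - means[a_star])
    return a_star, regret
```

In ML-ETC the exploration length is governed by the parameter ϵ mentioned in the setup row (smaller ϵ means shorter exploration phases); the fixed `explore_rounds` above is a simplification.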