Matching in Multi-arm Bandit with Collision
Authors: Yirui Zhang, Siwei Wang, Zhixuan Fang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Result. Figure 1 shows the average regret and the standard deviation of regret over 50 independent runs. From Figure 1, ML-ETC outperforms both Phased-ETC and CA-UCB from an asymptotic view |
| Researcher Affiliation | Collaboration | Yirui Zhang¹, Siwei Wang², Zhixuan Fang¹˒³; ¹ IIIS, Tsinghua University; ² Microsoft Research; ³ Shanghai Qi Zhi Institute |
| Pseudocode | Yes | Algorithm 1 ML-ETC Algorithm |
| Open Source Code | Yes | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See supplemental material |
| Open Datasets | No | We choose the time horizon to be T = 2.5 × 10⁷, arms mean utilities within [0.3, 0.6], and the minimal gap = 0.05. We have tested two cases with 5 agents and 5 arms but different preference and utility. To investigate the quality of the converging stable matching under different algorithms, we choose the arm preferences such that there exist multiple stable matches between agents and arms (see Appendix for the implementation detail). The paper uses synthetically generated data based on specified parameters and does not provide access information for a public dataset. |
| Dataset Splits | No | The paper describes simulation experiments but does not provide specific training/test/validation dataset splits. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | Setup. We choose the time horizon to be T = 2.5 × 10⁷, arms mean utilities within [0.3, 0.6], and the minimal gap = 0.05. We have tested two cases with 5 agents and 5 arms but different preference and utility... When ϵ is smaller, the duration of each exploration is shorter. Same as the simulation in [2], we choose ϵ = 0.2 in our simulation... Same as the simulation in [10], we choose the parameter λ of delay probability to be λ = 0.1. |
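
The setup quoted above fixes only the ranges of the synthetic instance, so the following is a minimal sketch of one way such an instance could be drawn, not the authors' generator. It assumes that the minimal gap applies to each agent's utilities across arms, and the names `sample_utilities` and `make_instance` are hypothetical. It does not construct the arm preferences or check that multiple stable matchings exist; the paper defers that detail to its appendix.

```python
import random


def sample_utilities(n_arms, low, high, min_gap, rng):
    """Draw n_arms mean utilities in [low, high] whose pairwise gaps are >= min_gap.

    Assumption: the gap constraint is per agent, across arms.
    """
    # Room left in the interval once the mandatory gaps are reserved.
    slack = (high - low) - (n_arms - 1) * min_gap
    if slack < 0:
        raise ValueError("interval too narrow for the requested minimal gap")
    # Sorted offsets in [0, slack]; shifting the i-th one by i * min_gap
    # guarantees consecutive values differ by at least min_gap.
    offsets = sorted(rng.uniform(0.0, slack) for _ in range(n_arms))
    values = [low + off + i * min_gap for i, off in enumerate(offsets)]
    # Shuffle so that the agent's preference order over arms is random.
    rng.shuffle(values)
    return values


def make_instance(n_agents=5, n_arms=5, low=0.3, high=0.6, min_gap=0.05, seed=0):
    """Return one utility row per agent, using the parameters quoted in the table."""
    rng = random.Random(seed)
    return [sample_utilities(n_arms, low, high, min_gap, rng) for _ in range(n_agents)]


utilities = make_instance()
for agent, row in enumerate(utilities):
    print(agent, [round(u, 3) for u in row])
```

Reserving the gaps first and sampling only the remaining slack avoids rejection sampling, which would discard most draws when five values must fit in an interval of width 0.3 with gaps of 0.05.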