Mean Field Equilibrium in Multi-Armed Bandit Game with Continuous Reward
Authors: Xiong Wang, Riheng Jia
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations validate our MFE characterization and exhibit tight empirical regret of the MAB problem. In this section, we carry out the evaluations; results are smoothed via LOWESS in Python for clearer presentation. |
| Researcher Affiliation | Academia | Xiong Wang (The Chinese University of Hong Kong, Hong Kong SAR, China) and Riheng Jia (Zhejiang Normal University, Jinhua, China); emails: xwang@cse.cuhk.edu.hk, rihengjia@zjnu.edu.cn |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the methodology described. |
| Open Datasets | No | The paper describes a reward function used in the evaluation, which is a mathematical model of agent competition, but it does not specify a publicly available or open dataset (e.g., a named benchmark dataset with a citation or link) for training or evaluation. The reward function and environment appear to be custom-defined for the simulations. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. The evaluation section describes simulations run for a certain number of time slots, but no data partitioning details are given. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. |
| Software Dependencies | No | The paper mentions "LOWESS in Python" in the evaluation section, but it does not provide specific version numbers for Python or any other key software components, libraries, or solvers. (A minimal smoothing sketch appears below the table.) |
| Experiment Setup | Yes | Reward function. We consider that agents will compete for different resources (arms), so the reward $r(f_n, j)$ is a nonlinear decreasing function in $f_n(j)$ [Gummadi et al., 2013]: $r(f_n, j) = \frac{1}{1 + \theta(j) f_n(j)}$ (Eq. (21)), where $\theta(j) \in [0.8\bar{\theta}, \bar{\theta}]$, $\forall j \in \mathcal{M}$. Hence, $r(f_n, j) \in [0, 1]$ is $\bar{\theta}$-Lipschitz continuous. Set $\gamma_n = 1/(n + 1)$ in Eq. (5). Contraction mapping. Let $(\bar{\theta}, \beta, \eta)$ be $(0.5, 0.5, 0.2)$, respectively, so that the contraction condition $4\bar{\theta}(1 - \eta)\beta < 1$ holds. Given the number of arms $M = 4$, we run the bandit game four times and display the state evolution of arm 2 in Figure 1. Non-contraction mapping. Let $\bar{\theta}$, $\eta$, $M$ stay the same, while $\beta$ changes to 30 so the contraction condition $4\bar{\theta}(1 - \eta)\beta < 1$ is violated. We run the bandit game four times and depict the state evolution in Figure 3. For the general reward, we compute the regret when the contraction mapping holds, i.e., $(\bar{\theta}, \beta, \eta) = (0.5, 0.5, 0.2)$. Furthermore, we implement a linear reward $r(f_n(j), j) = 1 - \theta(j) f_n(j)$ where $\theta(j) \in [0.8\bar{\theta}, \bar{\theta}]$, $\forall j \in \mathcal{M}$, and $(\bar{\theta}, \beta, \eta) = (1, 2, 0.2)$, so the reward is a contraction by Corollary 1. We run the evaluation six times, each operating for $T = 2000$ time slots, and show the average regret and cumulative rewards in Table 1. (A minimal sketch of this setup appears below the table.) |
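
The Experiment Setup row fully specifies the reward model and the contraction parameters, so a minimal sketch is given below. This is not the authors' code: the use of NumPy, the random seed, and the uniform draw of $\theta(j)$ within $[0.8\bar{\theta}, \bar{\theta}]$ are assumptions made purely for illustration.

```python
# Minimal sketch (not the authors' code) of the evaluation setup quoted above:
# the nonlinear reward r(f_n, j) = 1 / (1 + theta(j) * f_n(j)) with
# theta(j) in [0.8*theta_bar, theta_bar], and a check of the contraction
# condition 4 * theta_bar * (1 - eta) * beta < 1 for the two reported settings.
import numpy as np

M = 4                                   # number of arms, as in the paper's evaluation
theta_bar = 0.5                         # upper Lipschitz bound (theta-bar)
rng = np.random.default_rng(0)          # seed and uniform draw are assumptions
theta = rng.uniform(0.8 * theta_bar, theta_bar, size=M)   # per-arm theta(j)

def reward(f, j):
    """Nonlinear decreasing reward of arm j given the population fraction vector f (Eq. (21))."""
    return 1.0 / (1.0 + theta[j] * f[j])

def contraction_holds(theta_bar, beta, eta):
    """Contraction condition 4*theta_bar*(1 - eta)*beta < 1 stated in the setup."""
    return 4.0 * theta_bar * (1.0 - eta) * beta < 1.0

print(contraction_holds(0.5, 0.5, 0.2))   # True  -> contraction-mapping setting (Figure 1)
print(contraction_holds(0.5, 30.0, 0.2))  # False -> non-contraction setting, beta = 30 (Figure 3)

# Example: reward of arm 2 when agents are spread uniformly over the M arms
f = np.full(M, 1.0 / M)
print(round(reward(f, 2), 4))
```

The same check with $(\bar{\theta}, \beta, \eta) = (1, 2, 0.2)$ would require the separate condition of Corollary 1 for linear rewards, which the quoted setup references but does not restate.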
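
The Software Dependencies row notes that the paper's only named tool is "LOWESS in Python", with no library or version. The sketch below shows one way such smoothing is commonly done; the statsmodels library, the `frac` value, and the synthetic regret curve are assumptions, not details from the paper.

```python
# Minimal sketch of LOWESS smoothing as mentioned in the paper ("results are smoothed
# via LOWESS in Python"). The statsmodels library, frac value, and synthetic curve
# are assumptions for illustration; the paper names no library or version.
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

T = 2000                                  # time slots per run, as in the paper's setup
t = np.arange(T)
rng = np.random.default_rng(1)
noisy_regret = np.log1p(t) + rng.normal(0.0, 0.3, size=T)   # synthetic noisy regret curve

# lowess(endog, exog, frac) returns (sorted x, smoothed y) pairs when return_sorted=True
smoothed = lowess(noisy_regret, t, frac=0.1, return_sorted=True)
smoothed_regret = smoothed[:, 1]          # smoothed values, aligned with sorted t
print(smoothed_regret[:5])
```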