(Almost) Free Incentivized Exploration from Decentralized Learning Agents
Authors: Chengshuai Shi, Haifeng Xu, Wei Xiong, Cong Shen
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results are provided to complement the theoretical analysis. Numerical experiments have been carried out to evaluate OTI. All the results are averaged over 100 runs of horizon T = 10^5 and the agents perform the α-UCB algorithm specified in Section 5.1 with α = 2. |
| Researcher Affiliation | Academia | Chengshuai Shi University of Virginia cs7ync@virginia.edu Haifeng Xu University of Virginia hx4ad@virginia.edu Wei Xiong The Hong Kong University of Science and Technology wxiongae@connect.ust.hk Cong Shen University of Virginia cong@virginia.edu |
| Pseudocode | Yes | Algorithm 1 OTI: Principal |
| Open Source Code | No | The paper does not provide any links to open-source code or explicitly state that code for the methodology is released. |
| Open Datasets | No | The paper describes experiments run on simulated environments (e.g., 'toy example of M = 2 agents and K = 3 arms', 'random local instances with 30 arms are generated') rather than a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper discusses simulation parameters but does not specify explicit training, validation, or test dataset splits in the conventional supervised-learning sense; the experiments instead use simulated multi-armed bandit environments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions agents running the 'α-UCB algorithm', but does not specify any software names with version numbers used for implementation (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | All the results are averaged over 100 runs of horizon T = 10^5 and the agents perform the α-UCB algorithm specified in Section 5.1 with α = 2. First, a toy example with M = 2 agents and K = 3 arms illustrates the ineffectiveness of not incentivizing. Then, under different M, random local instances with 30 arms are generated to compose global instances with Δ_min ∈ [4.5, 5.5] × 10^-3. (A minimal simulation sketch of this setup is given below the table.) |
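
The setup quoted above can be approximated with a short simulation. The sketch below is not the paper's released code: it assumes a standard α-UCB index (empirical mean plus sqrt(α·log t / pulls)) for the agent policy referenced in Section 5.1, Bernoulli rewards, and hypothetical arm means for the M = 2, K = 3 toy case; the reduced run count and all function names are illustrative.

```python
import numpy as np

def alpha_ucb_run(means, T, alpha=2.0, rng=None):
    """One run of an alpha-UCB agent on a Bernoulli bandit.

    Index assumed: empirical mean + sqrt(alpha * log(t) / pulls);
    the exact constant/form in the paper's Section 5.1 may differ.
    Returns the cumulative (pseudo-)regret over horizon T.
    """
    rng = np.random.default_rng() if rng is None else rng
    K = len(means)
    pulls = np.zeros(K, dtype=int)
    reward_sums = np.zeros(K)
    regret = 0.0
    best = max(means)
    for t in range(1, T + 1):
        if t <= K:
            arm = t - 1                      # pull each arm once to initialize
        else:
            ucb = reward_sums / pulls + np.sqrt(alpha * np.log(t) / pulls)
            arm = int(np.argmax(ucb))
        r = rng.binomial(1, means[arm])      # Bernoulli reward
        pulls[arm] += 1
        reward_sums[arm] += r
        regret += best - means[arm]
    return regret

if __name__ == "__main__":
    T, runs = 10**5, 10                      # the paper averages over 100 runs
    # Hypothetical local means for M = 2 agents and K = 3 arms (toy example).
    local_means = [[0.50, 0.60, 0.70],
                   [0.70, 0.60, 0.50]]
    for m, means in enumerate(local_means):
        avg = np.mean([alpha_ucb_run(means, T) for _ in range(runs)])
        print(f"agent {m}: average regret over {runs} runs = {avg:.1f}")
```

Note that this only reproduces the uncoordinated α-UCB agents; the paper's OTI principal (Algorithm 1), which incentivizes exploration across agents, would sit on top of these agent loops and is not sketched here.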