Observe Before Play: Multi-Armed Bandit with Pre-Observations
Authors: Jinhang Zuo, Xiaoxi Zhang, Carlee Joe-Wong (pp. 7023-7030)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on synthetic data and wireless channel traces show that C-MP-OBP and D-MP-OBP outperform random heuristics and offline optimal policies that do not allow pre-observations. Our final contribution is to numerically validate our OBP, C-MP-OBP, and D-MP-OBP policies on synthetic reward data and channel availability traces. |
| Researcher Affiliation | Academia | Jinhang Zuo, Xiaoxi Zhang, Carlee Joe-Wong; Carnegie Mellon University; {jzuo, xiaoxiz2, cjoewong}@andrew.cmu.edu |
| Pseudocode | Yes | Algorithm 1 Observe-Before-Play UCB (OBP-UCB); Algorithm 2 Centralized Multi-Player OBP (C-MP-OBP); Algorithm 3 Distributed Multi-Player OBP (D-MP-OBP) |
| Open Source Code | No | The paper does not provide an explicit statement about the release of source code for the described methodology, nor does it include a direct link to a code repository. The link provided in the citation (Zuo et al. 2019) is to the paper itself. |
| Open Datasets | Yes | Experiments on synthetic data and wireless channel traces show that C-MP-OBP and D-MP-OBP outperform random heuristics and offline optimal policies that do not allow pre-observations. Wang 2018. https://github.com/ANRGUSC/MultichannelDQN-channelModel |
| Dataset Splits | No | The paper mentions using 'synthetic data' and 'channel availability traces' but does not specify how these datasets were split into training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with their version numbers (e.g., programming languages, libraries, frameworks, or solvers). |
| Experiment Setup | No | The paper describes the problem parameters (e.g., K arms, cost τ) and experiment duration ('after 5000 rounds'), but it does not specify concrete hyperparameter values for the algorithms (e.g., learning rates, batch sizes, optimizer settings) or other detailed system-level training configurations. |