State-Aware Value Function Approximation with Attention Mechanism for Restless Multi-armed Bandits
Authors: Shuang Wu, Jingyu Zhao, Guangjian Tian, Jun Wang
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we illustrate the effectiveness and investigate the properties of our proposed method with numerical experiments. |
| Researcher Affiliation | Collaboration | ¹Huawei Noah's Ark Lab, ²The University of Hong Kong, ³University College London |
| Pseudocode | Yes | The pseudo code of the algorithm is in the Appendix for reference. |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code or a link to a code repository. |
| Open Datasets | No | We simulate RMAB with three policies with both indexable and non-indexable arms. We consider three cases: N = 10 with M = 3, N = 20 with M = 6, and N = 50 with M = 15. For each case, every policy is simulated with a fixed set of 500 randomly generated initial states. |
| Dataset Splits | No | The paper uses a 'mini-batch SGD training approach' but does not explicitly describe training/validation/test splits; evaluation is performed only by simulating from the randomly generated initial states. |
| Hardware Specification | No | The paper does not specify the hardware used for running experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The paper mentions 'Adam' as the optimizer and the 'Transformer encoder' but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We set discount factor γ = 0.9 and S = 10 for all arms. The mini-batch size is set to 128 with 100 training epochs. The optimizer is Adam with a 0.001 learning rate. |
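
The Experiment Setup row reports enough hyperparameters to reconstruct a plausible training harness. Below is a minimal, hypothetical PyTorch sketch that wires the reported values (γ = 0.9, S = 10 states per arm, mini-batch size 128, 100 epochs, Adam with learning rate 0.001) into a small Transformer-encoder value network. Since no code is released, the network architecture, pooling, data, and value targets here are placeholder assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Reported hyperparameters (from the Experiment Setup row).
N, S = 10, 10        # arms and states per arm (smallest reported case)
GAMMA = 0.9          # discount factor; would enter Bellman-style value
                     # targets, which are random stand-ins in this sketch
BATCH_SIZE = 128
EPOCHS = 100
LR = 1e-3

class AttentionValueNet(nn.Module):
    """Assumed value-function approximator: per-arm state embeddings fed
    through a Transformer encoder, pooled into a single scalar value."""
    def __init__(self, n_states=S, d_model=32, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(n_states, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, arm_states):  # arm_states: (batch, N) integer states
        h = self.encoder(self.embed(arm_states))     # (batch, N, d_model)
        return self.head(h.mean(dim=1)).squeeze(-1)  # (batch,)

model = AttentionValueNet()
optimizer = torch.optim.Adam(model.parameters(), lr=LR)

# 500 randomly generated initial states, mirroring the evaluation setup;
# the value targets are random placeholders since no dataset is released.
states = torch.randint(0, S, (500, N))
targets = torch.randn(500)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(states, targets),
    batch_size=BATCH_SIZE, shuffle=True)

for epoch in range(EPOCHS):
    for s, y in loader:
        loss = nn.functional.mse_loss(model(s), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

In a faithful reproduction, the stand-in targets would be replaced by discounted returns (using GAMMA) computed from simulated RMAB trajectories, which is where the missing dataset and splits noted above would matter.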