State-Aware Value Function Approximation with Attention Mechanism for Restless Multi-armed Bandits

Authors: Shuang Wu, Jingyu Zhao, Guangjian Tian, Jun Wang

IJCAI 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Finally, we illustrate the effectiveness and investigate the properties of our proposed method with numerical experiments. |
| Researcher Affiliation | Collaboration | ¹Huawei Noah's Ark Lab, ²The University of Hong Kong, ³University College London |
| Pseudocode | Yes | The pseudo code of the algorithm is in the Appendix for reference. |
| Open Source Code | No | The paper neither states that the code is open-sourced nor links to a code repository. |
| Open Datasets | No | We simulate RMAB with three policies with both indexable and non-indexable arms. We consider three cases: N = 10 with M = 3, N = 20 with M = 6, and N = 50 with M = 15. For each case, every policy is simulated with a fixed but randomly generated 500 initial states. (See the simulation sketch below the table.) |
| Dataset Splits | No | The paper mentions a mini-batch SGD training approach but does not describe training/validation/test splits; evaluation relies on simulated initial states rather than a held-out split. |
| Hardware Specification | No | The paper does not specify the hardware used for the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper names the Adam optimizer and a Transformer encoder but lists no versioned software dependencies (e.g., Python, PyTorch, or TensorFlow versions). |
| Experiment Setup | Yes | We set discount factor γ = 0.9 and S = 10 for all arms. The mini-batch size is set as 128 with 100 training epochs. The optimizer is chosen as the Adam with a 0.001 learning rate. (See the training sketch below the table.) |