State-Aware Value Function Approximation with Attention Mechanism for Restless Multi-armed Bandits

Authors: Shuang Wu, Jingyu Zhao, Guangjian Tian, Jun Wang

IJCAI 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Finally, we illustrate the effectiveness and investigate the properties of our proposed method with numerical experiments. |
| Researcher Affiliation | Collaboration | ¹Huawei Noah's Ark Lab, ²The University of Hong Kong, ³University College London |
| Pseudocode | Yes | The pseudo code of the algorithm is in the Appendix for reference. |
| Open Source Code | No | The paper neither states that the code is open-sourced nor links to a code repository. |
| Open Datasets | No | We simulate RMAB with three policies with both indexable and non-indexable arms. We consider three cases: N = 10 with M = 3, N = 20 with M = 6, and N = 50 with M = 15. For each case, every policy is simulated with a fixed but randomly generated 500 initial states. (See the simulation sketch below the table.) |
| Dataset Splits | No | The paper mentions a mini-batch SGD training approach but does not describe training/validation/test splits; evaluation relies on simulated initial states rather than a held-out split. |
| Hardware Specification | No | The paper does not specify the hardware used for the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper names the Adam optimizer and a Transformer encoder but lists no versioned software dependencies (e.g., Python, PyTorch, or TensorFlow versions). |
| Experiment Setup | Yes | We set discount factor γ = 0.9 and S = 10 for all arms. The mini-batch size is set as 128 with 100 training epochs. The optimizer is chosen as the Adam with a 0.001 learning rate. (See the training sketch below the table.) |