DeepTOP: Deep Threshold-Optimal Policy for MDPs and RMABs
Authors: Khaled Nakhleh, I-Hong Hou
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulation results show that our policy significantly outperforms other reinforcement learning algorithms due to its ability to exploit the monotone property. In addition, we show that the Whittle index, a powerful tool for restless multi-armed bandit problems, is equivalent to the optimal threshold policy for an alternative problem. This observation leads to a simple algorithm that finds the Whittle index by learning the optimal threshold policy in the alternative problem. Simulation results show that our algorithm learns the Whittle index much faster than several recent studies that learn the Whittle index through indirect means. |
| Researcher Affiliation | Academia | Khaled Nakhleh, I-Hong Hou, Electrical and Computer Engineering Department, Texas A&M University, College Station, TX, {khaled.jamal, ihou}@tamu.edu |
| Pseudocode | Yes | Algorithm 1 Deep Threshold Optimal Policy Training for MDPs (Deep TOP-MDP) (an illustrative sketch of the threshold action rule appears after this table) |
| Open Source Code | Yes | All source code can be found in the repository https://github.com/khalednakhleh/deeptop. |
| Open Datasets | No | The paper constructs and extends several simulated control problems (e.g., EV charging, inventory management, one-dimensional bandits). It states that these problems are 'based on' or 'extended from' previous work, but it does not provide concrete access information (a link, DOI, specific repository, or formal citation for a public dataset) for the specific simulation parameters or dataset instances used for training. |
| Dataset Splits | No | The paper describes filling an agent's memory with transitions and then evaluating performance over timesteps in simulated environments, which is typical for reinforcement learning. However, it does not explicitly provide information on dataset splits (e.g., percentages or sample counts) for traditional training, validation, and test sets, as its experimental setup is based on continuous interaction with simulated environments rather than static datasets. |
| Hardware Specification | No | The paper states that hardware details are in Appendix D ('Did you include the total amount of compute and the type of resources used...? [Yes] see Appendix D.'). However, Appendix D is not included in the provided text, so specific hardware details cannot be found. |
| Software Dependencies | No | The paper mentions that training parameters and hyper-parameters can be found in Appendix D ('Did you specify all the training details...? [Yes] see Appendix D.'). However, Appendix D is not included in the provided text, so specific software dependencies with version numbers cannot be found. |
| Experiment Setup | No | The paper states that details about the training parameters (which typically include hyperparameters) can be found in Appendix D ('Details about the training parameters can be found in Appendix D.'). However, Appendix D is not included in the provided text, so specific experimental setup details are not present in the main text. |
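
For context on the pseudocode row, the following is a minimal, hypothetical sketch of the threshold structure that Algorithm 1 (Deep TOP-MDP) trains: an actor network maps the non-scalar part of the state to a scalar threshold, and the binary action is taken exactly when the scalar state component reaches that threshold, which is the monotone property the paper exploits. The class name `ThresholdActor`, the `context_dim` argument, and the network sizes here are illustrative assumptions, not the authors' implementation; the actual training code is in the linked repository.

```python
import torch
import torch.nn as nn


class ThresholdActor(nn.Module):
    """Hypothetical threshold actor (illustrative only, not the DeepTOP code).

    Maps a context vector (the non-scalar part of the state) to a scalar
    threshold; the binary action is 1 exactly when the scalar state
    component is at or above that threshold.
    """

    def __init__(self, context_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(context_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def threshold(self, context: torch.Tensor) -> torch.Tensor:
        # One scalar threshold per context vector in the batch.
        return self.net(context).squeeze(-1)

    def act(self, scalar_state: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # Monotone (threshold) structure: activate iff the scalar state
        # reaches the learned threshold for this context.
        return (scalar_state >= self.threshold(context)).long()


# Example: a batch of 4 states, each with a scalar component and a
# 3-dimensional context vector.
actor = ThresholdActor(context_dim=3)
scalar = torch.tensor([0.2, 1.5, -0.3, 0.9])
context = torch.randn(4, 3)
print(actor.act(scalar, context))  # tensor of 0/1 actions
```

The sketch covers only the action rule; how the threshold network is updated from simulated transitions follows Algorithm 1 in the paper and the code in the repository above.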