Data-Efficient Reinforcement Learning for Malaria Control
Authors: Lixin Zou
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | extensive experimental results verify its advantage over the state-of-the-art on the challenging malaria control task. |
| Researcher Affiliation | Industry | Lixin Zou Baidu Inc. zoulixin15@gmail.com |
| Pseudocode | Yes | Algorithm 1: Variance-Bonus Monte Carlo Tree Search |
| Open Source Code | Yes | Our implementation and all baseline codes are available at https://github.com/zoulixin93/VB_MCTS. |
| Open Datasets | Yes | extensive experiments on two different OpenMalaria [Smith et al., 2008] based simulators: SeqDecChallenge and ProveChallenge... These simulating environments are available at https://github.com/IBM/ushiriki-policy-engine-library. |
| Dataset Splits | Yes | 5-fold cross-validation is performed during the updates of the GP world model. Particularly, we use 1-fold for training and 4-fold for validation, which ensures the generalizability of the GP world model over different state-action pairs. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | We build a feature map of 14-dimension for this task... we empirically set the sum of exploration/exploitation parameters as β1 + β2 = 3.5 in the experiments. In MCTS, c_puct is 5, and only the top 50 rewarded child nodes are expanded. The number of iterations does not exceed 100,000. For Gaussian Process, to avoid overfitting problems, 5-fold cross-validation is performed during the updates of the GP world model. Particularly, we use 1-fold for training and 4-fold for validation... |
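
The split quoted in the Dataset Splits and Experiment Setup rows inverts the usual k-fold proportions: one fold trains the model and the other four validate it. The following is a minimal sketch of that index split only, not the authors' implementation. The function name `inverted_kfold`, the shuffling seed, and the sample count of 20 are illustrative assumptions, and no Gaussian Process is fitted here.

```python
import random


def inverted_kfold(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs with ONE fold for training
    and the remaining k-1 folds for validation.

    Hypothetical helper: the name and signature are assumptions,
    not taken from the paper or its repository.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    # Deal the shuffled indices round-robin into k folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train_idx = sorted(folds[i])
        val_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, val_idx


# Example: 20 samples give 5 splits of 4 training / 16 validation points.
splits = list(inverted_kfold(20, k=5))
for train_idx, val_idx in splits:
    print(len(train_idx), len(val_idx))
```

Each sample serves as training data in exactly one split and as validation data in the other four, so the model is always scored on far more state-action pairs than it was fitted on.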