Optimistic Whittle Index Policy: Online Learning for Restless Bandits
Authors: Kai Wang, Lily Xu, Aparna Taneja, Milind Tambe
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate that UCWhittle leverages the structure of RMABs and the Whittle index policy solution to achieve better performance than existing online learning baselines across three domains, including one constructed from a real-world maternal and childcare dataset. |
| Researcher Affiliation | Collaboration | Kai Wang*¹, Lily Xu*¹, Aparna Taneja², Milind Tambe¹,² (¹Harvard University, ²Google Research); {kaiwang, lily_xu}@g.harvard.edu, {aparnataneja, milindtambe}@google.com |
| Pseudocode | Yes | Algorithm 1: UCWhittle |
| Open Source Code | Yes | Code available at https://github.com/lily-x/online-rmab |
| Open Datasets | Yes | We use real, anonymized data of the engagement behavior of 7,656 mothers from a previous RMAB field study (Mate et al. 2022b). |
| Dataset Splits | No | The paper describes episodic interaction with an RMAB instance and averaging results over random seeds, but does not specify traditional train/validation/test dataset splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or specific machine specifications used for experiments. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers. |
| Experiment Setup | Yes | The per-episode reward is the cumulative discounted reward with discount rate γ = 0.9. We then compute regret by subtracting the reward earned by each algorithm from the reward of the optimal policy. Results are averaged over 30 random seeds and smoothed using exponential smoothing with a weight of 0.9. |
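The Pseudocode row above cites Algorithm 1 (UCWhittle), whose structure the table cannot show. Below is a minimal, hedged sketch of only the action-selection step shared by any Whittle index policy: given per-arm index estimates, act on the `budget` arms with the largest indices. In UCWhittle those estimates are computed optimistically from confidence bounds on the transition probabilities; that step is the paper's contribution and is not reproduced here. The function name `whittle_action` and the `indices` input are illustrative, not taken from the authors' code.

```python
import numpy as np

def whittle_action(indices: np.ndarray, budget: int) -> np.ndarray:
    """Act on the `budget` arms with the largest Whittle indices.

    `indices` holds one index estimate per arm, evaluated at each arm's
    current state. In UCWhittle these estimates come from an optimistic
    optimization over a confidence set of transition probabilities; here
    they are assumed to be given.
    """
    action = np.zeros(len(indices), dtype=int)
    if budget > 0:
        # np.argsort is ascending, so the last `budget` entries are the
        # arms with the highest index values.
        action[np.argsort(indices)[-budget:]] = 1
    return action

# Example: whittle_action(np.array([0.3, 0.9, 0.1, 0.5]), budget=2)
# selects arms 1 and 3 (indices 0.9 and 0.5).
```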
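The Experiment Setup row pins down three concrete quantities: per-episode cumulative discounted reward with γ = 0.9, regret measured against the optimal policy, and results averaged over 30 seeds then smoothed with weight 0.9. A sketch of how these might be computed follows. The smoothing recurrence (a TensorBoard-style running average, `0.9 * prev + 0.1 * new`) is an assumption, since the paper does not spell out its exact smoothing formula, and all function names are illustrative.

```python
import numpy as np

GAMMA = 0.9   # discount rate stated in the paper
SMOOTH = 0.9  # exponential-smoothing weight stated in the paper

def episode_reward(rewards, gamma=GAMMA):
    """Cumulative discounted reward for one episode."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

def regret(optimal_reward, algo_reward):
    """Regret: the optimal policy's reward minus the algorithm's reward."""
    return optimal_reward - algo_reward

def smooth(values, weight=SMOOTH):
    """Exponential smoothing; the exact recurrence is assumed, not stated."""
    out, prev = [], values[0]
    for v in values:
        prev = weight * prev + (1 - weight) * v
        out.append(prev)
    return out

# Averaging over 30 random seeds, where `regrets` is a hypothetical
# 30 x T array of per-seed regret curves:
# mean_regret = np.array(regrets).mean(axis=0)
```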