Optimistic Whittle Index Policy: Online Learning for Restless Bandits

Authors: Kai Wang, Lily Xu, Aparna Taneja, Milind Tambe

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we demonstrate that UCWhittle leverages the structure of RMABs and the Whittle index policy solution to achieve better performance than existing online learning baselines across three domains, including one constructed from a real-world maternal and childcare dataset.
Researcher Affiliation | Collaboration | Kai Wang* (1), Lily Xu* (1), Aparna Taneja (2), Milind Tambe (1,2); (1) Harvard University, (2) Google Research; {kaiwang, lily_xu}@g.harvard.edu, {aparnataneja, milindtambe}@google.com
Pseudocode | Yes | Algorithm 1: UCWhittle (a hedged sketch of this style of update loop appears below the table).
Open Source Code | Yes | Code available at https://github.com/lily-x/online-rmab
Open Datasets | Yes | We use real, anonymized data of the engagement behavior of 7,656 mothers from a previous RMAB field study (Mate et al. 2022b).
Dataset Splits | No | The paper describes episodic interaction with an RMAB instance and averaging results over random seeds, but does not specify traditional train/validation/test dataset splits.
Hardware Specification | No | The paper does not provide hardware details such as GPU/CPU models or the machine specifications used for the experiments.
Software Dependencies | No | The paper does not specify software dependencies with version numbers.
Experiment Setup | Yes | The per-episode reward is the cumulative discounted reward with discount rate γ = 0.9. We then compute regret by subtracting the reward earned by each algorithm from the reward of the optimal policy. Results are averaged over 30 random seeds and smoothed using exponential smoothing with a weight of 0.9. (A sketch of these computations follows the table.)
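
The Pseudocode and Open Source Code rows point to Algorithm 1 (UCWhittle) and the released repository. As a rough illustration of the "optimism plus Whittle index policy" idea summarized in the Research Type row, the following minimal Python sketch assumes a simplified 2-state, binary-action RMAB with reward equal to the current state, a Hoeffding-style confidence radius, and a binary-search subsidy solver. It is not the paper's UCWhittle algorithm (which optimizes over the confidence region jointly with the index computation) and all names, ranges, and formulas are illustrative assumptions.

```python
"""Hedged sketch of an optimistic (UCB-style) Whittle index policy.

Not the paper's UCWhittle or its released code; a simplified illustration only.
"""
import numpy as np


def whittle_index(P, state, gamma=0.9, lo=-10.0, hi=10.0, iters=30):
    """Binary-search the passive subsidy at which passive and active actions tie.

    P[s, a] = probability of moving to the rewarding state 1 from state s under
    action a (0 = passive, 1 = active); the per-step reward is the current state.
    The search range [lo, hi] is an assumed bound for gamma = 0.9.
    """
    for _ in range(iters):
        lam = (lo + hi) / 2.0
        V = np.zeros(2)
        for _ in range(200):  # value iteration for the subsidized single-arm MDP
            Q = np.empty((2, 2))
            for s in range(2):
                for a in range(2):
                    ev = P[s, a] * V[1] + (1 - P[s, a]) * V[0]
                    Q[s, a] = s + lam * (a == 0) + gamma * ev
            V = Q.max(axis=1)
        if Q[state, 0] >= Q[state, 1]:
            hi = lam  # passive already preferred, so the index is at or below lam
        else:
            lo = lam
    return (lo + hi) / 2.0


def optimistic_whittle_policy(counts_next1, counts_total, states, budget, t):
    """Act on the `budget` arms with the largest Whittle indices computed from
    optimistic (upper-confidence) transition probabilities.

    counts_next1[i] and counts_total[i] are 2x2 per-arm visit counts; inflating
    the probability of reaching state 1 is "optimistic" only because reward
    increases with the state in this simplified model.
    """
    n_arms = counts_total.shape[0]
    indices = np.zeros(n_arms)
    for i in range(n_arms):
        p_hat = counts_next1[i] / np.maximum(counts_total[i], 1)
        radius = np.sqrt(2 * np.log(max(t, 2)) / np.maximum(counts_total[i], 1))
        p_ucb = np.clip(p_hat + radius, 0.0, 1.0)  # optimism: inflate P(good state)
        indices[i] = whittle_index(p_ucb, states[i])
    return np.argsort(indices)[-budget:]  # arm ids of the top-`budget` indices
```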
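
The Experiment Setup row specifies a per-episode discounted reward with γ = 0.9, regret relative to the optimal policy, averaging over 30 random seeds, and exponential smoothing with weight 0.9. Below is a small, hedged sketch of how those quantities might be computed; the smoothing convention (s_t = 0.9·s_{t-1} + 0.1·x_t) and the choice to average across seeds before smoothing are assumptions, not details taken from the paper.

```python
import numpy as np


def discounted_episode_reward(rewards, gamma=0.9):
    """Cumulative discounted reward for one episode: sum_t gamma^t * r_t."""
    rewards = np.asarray(rewards, dtype=float)
    return float(np.sum(rewards * gamma ** np.arange(len(rewards))))


def per_episode_regret(algo_rewards, optimal_rewards):
    """Regret per episode: the optimal policy's reward minus the algorithm's."""
    return np.asarray(optimal_rewards, dtype=float) - np.asarray(algo_rewards, dtype=float)


def exponential_smoothing(series, weight=0.9):
    """Assumed convention: s_t = weight * s_{t-1} + (1 - weight) * x_t."""
    smoothed, prev = [], None
    for x in series:
        prev = x if prev is None else weight * prev + (1 - weight) * x
        smoothed.append(prev)
    return np.array(smoothed)


# Usage sketch: average seed-level regret curves, then smooth with weight 0.9.
# regret_per_seed has shape (n_seeds, n_episodes); 30 seeds as in the table.
regret_per_seed = np.random.rand(30, 100)   # placeholder data, not real results
mean_regret = regret_per_seed.mean(axis=0)
plot_curve = exponential_smoothing(mean_regret, weight=0.9)
```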