A Robust Test for the Stationarity Assumption in Sequential Decision Making

Authors: Jitao Wang, Chengchun Shi, Zhenke Wu

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive comparative simulations and a real-world interventional mobile health example illustrate the advantages of our method in detecting change points and optimizing long-term rewards in high-dimensional, non-stationary environments.
Researcher Affiliation | Academia | Department of Biostatistics, University of Michigan, Ann Arbor; Department of Statistics, London School of Economics and Political Science.
Pseudocode | Yes | Algorithm 1: Proposed testing procedure
Open Source Code | Yes | Code is available at https://github.com/jtwang95/Double_CUSUM_RL.
Open Datasets | Yes | In this section, we apply the proposed testing procedure to a real-world mobile health dataset collected from a micro-randomized trial (MRT) aiming at improving the health outcomes of medical interns in the United States by sending push notifications through a mobile app to induce and maintain healthy behaviors related to physical activity, sleep, and mood (NeCamp et al., 2020).
Dataset Splits | No | The paper describes a random, even division of subject indices into two disjoint sets I1 and I2 for method computation (cross-fitting), but does not provide explicit train/validation/test dataset splits with percentages or sample counts for model evaluation purposes.
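The cross-fitting split described above (a random, even division of subject indices into two disjoint sets I1 and I2) can be sketched as follows. Only the names I1 and I2 come from the paper; the function name, seed handling, and use of NumPy are illustrative assumptions.

```python
import numpy as np

def cross_fit_split(n_subjects, seed=0):
    """Hypothetical sketch: randomly and evenly divide subject indices
    {0, ..., n_subjects-1} into two disjoint sets I1 and I2."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_subjects)
    half = n_subjects // 2
    # Sort within each half purely for readability of the output.
    return np.sort(perm[:half]), np.sort(perm[half:])

I1, I2 = cross_fit_split(10)
# I1 and I2 partition the subject indices: disjoint, jointly exhaustive.
```

In cross-fitting, nuisance functions are typically estimated on one half and the test statistic evaluated on the other (and vice versa), which is consistent with the "method computation" role the report attributes to I1 and I2.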
Hardware Specification | No | No specific hardware details (exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) were provided for running the experiments.
Software Dependencies | No | The paper mentions using neural networks, Gaussian mixture models (GMM), logistic regression (LR), and double deep Q network (double DQN) algorithms, and specifies architectural details and activation functions. However, it does not provide specific version numbers for any of the software libraries or frameworks used (e.g., PyTorch, TensorFlow, scikit-learn).
Experiment Setup | Yes | To implement the proposed test, the boundary removal parameter ϵ is set to 0.1. 5000 bootstrap samples are generated to compute p-values. ... In the context of a continuous state-space MDP with binary actions, H is set to be the class of feed-forward neural networks that contain a single hidden layer with 32 neurons and the sigmoid function as the activation function. ... The loss function is set to be the Gaussian negative log-likelihood. ... We set the neural network used to learn (p[0,t], p[t,T]) to have two hidden layers with 128 nodes in each layer along with ReLU activation functions. The corresponding learning rate is set to 0.001. ... The neural net with structure [32, 64, 128, 64, 32] serves as the backbone of the Q network, and the discount factor is set to 0.9.
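The architectures quoted in the setup row can be sketched in PyTorch. The framework choice, input dimension, class and variable names are assumptions; only the layer widths (32; 128×2; [32, 64, 128, 64, 32]), activations (sigmoid, ReLU), Gaussian NLL loss, learning rate 0.001, and discount factor 0.9 come from the report.

```python
import torch
import torch.nn as nn

state_dim = 4  # placeholder; the actual state dimension varies by experiment

# H: single hidden layer with 32 neurons and sigmoid activation, trained
# with the Gaussian negative log-likelihood, so it outputs a mean and a
# (strictly positive) variance.
class ConditionalMeanVar(nn.Module):
    def __init__(self, in_dim, hidden=32):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.Sigmoid())
        self.mean = nn.Linear(hidden, 1)
        self.log_var = nn.Linear(hidden, 1)

    def forward(self, x):
        h = self.body(x)
        return self.mean(h), self.log_var(h).exp()

# Network for learning (p[0,t], p[t,T]): two hidden layers of 128 ReLU
# units, optimized with learning rate 0.001 (optimizer type is assumed).
density_net = nn.Sequential(
    nn.Linear(state_dim, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1),
)
optimizer = torch.optim.Adam(density_net.parameters(), lr=0.001)

# Q-network backbone for double DQN with layer widths [32, 64, 128, 64, 32].
def q_backbone(in_dim, n_actions=2, widths=(32, 64, 128, 64, 32)):
    layers, d = [], in_dim
    for w in widths:
        layers += [nn.Linear(d, w), nn.ReLU()]
        d = w
    layers.append(nn.Linear(d, n_actions))
    return nn.Sequential(*layers)

q_net = q_backbone(state_dim)
gamma = 0.9  # discount factor

# Gaussian NLL as the training loss for the mean/variance network.
gauss_nll = nn.GaussianNLLLoss()
model = ConditionalMeanVar(state_dim)
x, y = torch.randn(8, state_dim), torch.randn(8, 1)
mean, var = model(x)
loss = gauss_nll(mean, y, var)
```

`nn.GaussianNLLLoss` takes the predicted mean, the target, and the predicted variance; exponentiating the log-variance head keeps the variance positive, which the loss requires.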