Smooth Non-stationary Bandits
Authors: Su Jia, Qian Xie, Nathan Kallus, Peter I. Frazier
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implemented our algorithm with simulations on synthetic data in the one-armed setting. We visualize the regret of the two policies via a log-log plot with time horizon T = 2^j where j = 20, 21, ..., 26; see Figure 2. Theoretically, the slope of a log-log curve should equal the exponent of the cumulative regret. In fact, if the cumulative regret is c·T^d, then the log-regret is log c + d log T. Our simulation shows that under smooth non-stationarity, the T^{3/5}-regret policy outperforms the T^{2/3}-regret policy. Moreover, the log-log curves have slopes 0.70 and 0.62 respectively, which are close to their theoretical values. (A worked check of this slope argument appears after the table.) |
| Researcher Affiliation | Academia | Cornell University, Ithaca, New York, USA. |
| Pseudocode | Yes | Algorithm 1 Budgeted Exploration Policy BE(B, ) (Section 4.1) and Algorithm 2 BE(B, ) Policy, Two-Armed Case (Section 5) |
| Open Source Code | No | The paper does not provide any link or explicit statement about making its source code publicly available. |
| Open Datasets | No | The paper uses "synthetic data" generated by the authors for simulations, not a publicly available dataset. "We implemented our algorithm with simulations on synthetic data in the one-armed setting." (Section 6) |
| Dataset Splits | No | The paper uses synthetic data for simulations and theoretical analysis of regret bounds, but does not describe explicit train/validation/test splits for a fixed dataset. |
| Hardware Specification | No | No specific hardware (e.g., GPU, CPU models, memory details) used for running the experiments is mentioned in the paper. |
| Software Dependencies | No | No specific software or libraries with version numbers (e.g., Python, PyTorch, TensorFlow) that would be necessary to replicate the experiment are mentioned. |
| Experiment Setup | Yes | We implemented our algorithm with simulations on synthetic data in the one-armed setting. We consider our BE policy where the parameters are chosen to be optimal for the non-smooth and smooth environments respectively. Formally, we consider the policy BE(B, ) where the tuple (B, ) is chosen to be (T^{1/3}, T^{1/3}) for non-smooth and (T^{2/5}, T^{1/5}) for smooth non-stationary environments. ... Specifically, in each instance, we have r_0(t) = A and r_1(t) = A sin(2πνt/T + ϕ) + A, where ν ∼ U[2.5, 5], A ∼ N(0.25ν^{-2}, 0.001) and ϕ ∼ U[0, 2π]. (A sampling sketch for these instances appears after the table.) |
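
The Experiment Setup row describes how each synthetic one-armed instance is drawn. Below is a minimal sketch of that sampling step, assuming NumPy, reading the second argument of N(·, ·) as a variance, and taking the ν exponent to be −2 (the sign was lost in extraction); the BE policy itself is not reproduced here.

```python
import numpy as np

def sample_instance(T, rng):
    """Sample one synthetic one-armed instance following the quoted setup (a sketch, not the authors' code)."""
    nu = rng.uniform(2.5, 5.0)                        # frequency nu ~ U[2.5, 5]
    A = rng.normal(0.25 * nu ** -2, np.sqrt(0.001))   # amplitude; 0.001 read as a variance (assumption)
    phi = rng.uniform(0.0, 2.0 * np.pi)               # phase phi ~ U[0, 2*pi]
    t = np.arange(1, T + 1)
    r0 = np.full(T, A)                                # constant arm: mean reward A at every round
    r1 = A * np.sin(2.0 * np.pi * nu * t / T + phi) + A  # drifting arm: smooth sinusoidal mean reward
    return r0, r1

rng = np.random.default_rng(0)
r0, r1 = sample_instance(T=2 ** 20, rng=rng)
```

One consequence of the A ∝ ν^{-2} scaling read here is that the second derivative of the drifting arm's mean (in rescaled time t/T) has the same magnitude across instances, which is consistent with a fixed smoothness class; this reading is an assumption, not a statement from the paper.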
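
The Research Type row relies on the fact that a power-law regret c·T^d shows up as a straight line with slope d on a log-log plot. A small self-contained check of that relationship, with illustrative constants that are not taken from the paper:

```python
import numpy as np

# If cumulative regret grows as c * T**d, then log(regret) = log(c) + d * log(T),
# so a least-squares line through the log-log points has slope d and intercept log(c).
T_values = 2.0 ** np.arange(20, 27)   # horizons T = 2^j for j = 20, ..., 26
c, d = 1.5, 0.6                       # illustrative constants only
regret = c * T_values ** d
slope, intercept = np.polyfit(np.log(T_values), np.log(regret), 1)
print(round(slope, 3), round(np.exp(intercept), 3))  # -> 0.6 1.5
```

The empirical slopes of 0.70 and 0.62 reported in the table are read off exactly this way, and are compared against the theoretical exponents 2/3 ≈ 0.667 and 3/5 = 0.6.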