Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Non-stationary Online Learning with Memory and Non-stochastic Control
Authors: Peng Zhao, Yu-Hu Yan, Yu-Xiang Wang, Zhi-Hua Zhou
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Although our paper mainly focuses on the theoretical investigation, in this section, we further present empirical studies to support our theoretical findings. We report the results of OCO with memory in Section 6.1 and online non-stochastic control in Section 6.2. [...] Figure 1 plots performance comparisons of three algorithms (OGD, Ader, Scream) under different regularizer coefficients. [...] Figure 2 plots the performance comparison of three algorithms (OGD, Ader, Scream) in terms of the cumulative cost. The result shows that our proposed algorithm outperforms the other two contenders, which validates that the meta-base structure (compared with OGD) and the switching-cost-regularizer (compared with Ader) are necessary for online non-stochastic control problems in non-stationary environments. |
| Researcher Affiliation | Academia | Peng Zhao, National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; Yu-Hu Yan, National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; Yu-Xiang Wang, Department of Computer Science, University of California, Santa Barbara, CA 93106, USA; Zhi-Hua Zhou, National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China |
| Pseudocode | Yes | Algorithm 1 (Scream); Algorithm 2 (Lazy Scream); Algorithm 3 (Scream.Control); Algorithm 4 (System Identification via Random Inputs; Hazan et al., 2020) |
| Open Source Code | No | The paper does not contain an unambiguous statement of code release, nor does it provide a link to a code repository. The text "All the proofs are included in appendices." and licensing information do not refer to source code for the methodology. |
| Open Datasets | No | The paper describes generating data for experiments: "The data item of each round is denoted by (x_t, y_t) ∈ X × Y". It also mentions using "synthetic linear dynamical system (LDS) environments and a real inverted pendulum environment". While the inverted pendulum is a known control problem, the paper describes its environment setup rather than referencing a publicly available dataset of pendulum data for download. No specific links, DOIs, repositories, or formal citations are provided for any publicly available datasets. |
| Dataset Splits | No | The paper describes a simulated online learning scenario where data is generated dynamically: "The data item of each round is denoted by (x_t, y_t) ∈ X × Y", and "The underlying model w*_t will change every 1000 rounds". For the non-stochastic control, it mentions "synthetic linear dynamical system (LDS) environments" and a "real inverted pendulum environment". These are generative or real-time simulation setups, not fixed datasets with explicit training/test/validation splits described. |
| Hardware Specification | No | The paper describes its experimental settings in Section 6 but does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to conduct these experiments. |
| Software Dependencies | No | The paper describes algorithms and experimental setups but does not list any specific software components (e.g., libraries, frameworks) with version numbers that were used for implementation or experimentation. |
| Experiment Setup | Yes | The time horizon is set as T = 50000 and the dimension is set as d = 10. [...] The underlying model w*_t will change every 1000 rounds, randomly sampled from a d-dimensional ball with diameter D/2, so there are in total S = 50 changes. We use the squared loss as the loss function, defined as f_t(w) = (1/2)(w^T x_t − y_t)^2, and thus the gradient is ∇f_t(w) = (w^T x_t − y_t) x_t. The feasible set W is also set as a d-dimensional ball with diameter D/2, and thus from all the above settings, we know that ‖x_t‖_2 ≤ Γ, ‖w‖_2 ≤ D/2, and ‖∇f_t(w)‖_2 ≤ DΓ². We set Γ = 1 and D = 2, so the gradient norm is upper bounded by G = DΓ² = 2. [...] We set the regularizer coefficient λ = αG, where G is the gradient norm upper bound. We consider three cases with different regularizer coefficients that impose different levels of penalty on the switching cost: (i) small regularizer (α = 0.1); (ii) medium regularizer (α = 1); (iii) large regularizer (α = 2). We repeat the experiments five times and report the mean and standard deviation of different algorithms with respect to three performance measures (overall loss, cumulative loss, and switching cost). |
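The experiment-setup excerpt above can be sketched in code. The following is a minimal, hedged reconstruction of the quoted synthetic OCO setting: piecewise-stationary targets w*_t resampled every 1000 rounds, the squared loss f_t(w) = (1/2)(w^T x_t − y_t)^2, and a switching-cost accumulator with λ = αG. It uses a plain projected OGD learner as a stand-in baseline; it is not the paper's Scream or Ader implementation, the horizon is shortened, and all concrete choices (step size, sampling scheme) are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of the quoted synthetic setup; NOT the paper's code.
# Assumptions: projected OGD baseline, shortened horizon, standard step size.

rng = np.random.default_rng(0)
T, d = 5000, 10            # paper uses T = 50000; shortened here for a quick run
D, Gamma = 2.0, 1.0        # bounds from the quoted setup
G = D * Gamma ** 2         # stated gradient-norm upper bound, G = 2
R = D / 2                  # radius consistent with the quoted bound ||w||_2 <= D/2

def project(v, radius):
    """Euclidean projection onto the centered ball of the given radius."""
    n = np.linalg.norm(v)
    return v if n <= radius else v * (radius / n)

def sample_ball(radius):
    """Uniform point in the centered d-dimensional ball of the given radius."""
    v = rng.standard_normal(d)
    return radius * rng.uniform() ** (1.0 / d) * v / np.linalg.norm(v)

w_star = sample_ball(R)                 # underlying model w*_t
w = np.zeros(d)
eta = D / (G * np.sqrt(T))              # standard OGD step size (an assumption)
alpha = 1.0                             # "medium regularizer" case
lam = alpha * G                         # lambda = alpha * G, as quoted
cumulative_loss = switching_cost = 0.0

for t in range(T):
    if t > 0 and t % 1000 == 0:
        w_star = sample_ball(R)         # model changes every 1000 rounds
    x = project(rng.standard_normal(d), Gamma)   # enforce ||x_t||_2 <= Gamma
    y = float(w_star @ x)
    cumulative_loss += 0.5 * (w @ x - y) ** 2    # f_t(w) = 1/2 (w^T x_t - y_t)^2
    grad = (w @ x - y) * x                       # grad f_t(w) = (w^T x_t - y_t) x_t
    w_next = project(w - eta * grad, R)          # feasible set: ball of radius D/2
    switching_cost += lam * np.linalg.norm(w_next - w)
    w = w_next

print(f"cumulative loss: {cumulative_loss:.2f}, switching cost: {switching_cost:.2f}")
```

This mirrors the three performance measures mentioned in the excerpt (the "overall loss" would be the sum of the cumulative loss and the switching cost); comparing such an OGD baseline against a meta-base algorithm is exactly the kind of contrast Figures 1 and 2 of the paper report.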