Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Online Bandit Nonlinear Control with Dynamic Batch Length and Adaptive Learning Rate
Authors: Jihun Kim, Javad Lavaei
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5, we present numerical experiments on the DBAR algorithm with an ablation study on batch length and learning rate. To demonstrate the main results of this paper, we provide illustrative examples on both linear and nonlinear dynamics with adversarial disturbances. Figures 2, 3, 4, 5, 6, and 7 present results on stability analysis, regret analysis, and ablation studies, which are all characteristics of experimental research. |
| Researcher Affiliation | Academia | Jihun Kim (Department of Industrial Engineering and Operations Research, University of California, Berkeley) and Javad Lavaei (Department of Industrial Engineering and Operations Research, University of California, Berkeley). Both authors are affiliated with the University of California, Berkeley, an academic institution, and have .edu email addresses. |
| Pseudocode | Yes | Algorithm 1 (DBAR), Algorithm 2 (DBAR-unknown \|U\|), and Algorithm 3 (DBAR-switching) are explicitly provided in the paper, detailing the procedural steps of the proposed methods. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, a link to a code repository, or mention of code provided in supplementary materials for the methodology described. |
| Open Datasets | No | The paper describes experiments on 'linear and nonlinear dynamics' using simulated systems (e.g., 'randomly generated from Uniform[-1, 1]' for matrices, 'sinusoidal noise' for disturbances). It does not use or provide access to any external public datasets. |
| Dataset Splits | No | The paper uses simulated systems with parameters defined within the text (e.g., 'xt+1 = A xt + B ut + wt' for linear, 'leader-follower system' for nonlinear). As such, there are no traditional external datasets that would require specific training/test/validation splits for reproduction. |
| Hardware Specification | Yes | The paper states that an Apple M1 chip with an 8-core CPU is sufficient for the experiments. |
| Software Dependencies | No | The paper mentions 'forward-Euler discretization' but does not specify any software names with version numbers (e.g., programming languages, libraries, or scientific computing environments) that would be necessary for replication. |
| Experiment Setup | Yes | For the linear-dynamics experiments, the paper uses T = 1200, η0 = 0.025, γ = 2.5, α0 = 1.01, and x0 = [100, 200]; for the dynamic batch length, τ0 = 7 and τ_b = τ0·((b + 10)/10)^0.5. Since every (asymptotically) stabilizing controller of a linear system is in fact exponentially stabilizing (Khalil, 2015), β(t) = 0.99^t is used without relaxing the assumptions on stabilizing controllers, and δ = γ·w_max/(1 − β(τ0)). For the nonlinear (leader-follower) experiments, the paper uses T = 5000, η0 = 0.25, γ = 1.5, α0 = 1.01, y0 = [−32, 24, 5.6, 24], and z0 = [10, 10, …] ∈ R^96; for the dynamic batch length, τ0 = 9 and τ_b = τ0·((b + 41)/41)^0.5. To study this notion further, polynomially (rather than exponentially) decreasing candidates for β(t) are considered: β1(t) = min{10/t^1.05, 1} and β2(t) = min{10/t^1.08, 1}. The controller pool is defined by the parameter ranges p ∈ {2, 16, 30, 44, 58, 72, 86, 100}, k1 ∈ {2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5}, and k2 ∈ {1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5}. |
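The dynamic batch length schedule quoted in the Experiment Setup row can be made concrete with a short sketch. This is not the authors' code; it is a minimal illustration of the linear-dynamics hyperparameters reported above (τ0 = 7, τ_b = τ0·((b + 10)/10)^0.5, β(t) = 0.99^t), with rounding to an integer batch length assumed for illustration only.

```python
# Illustrative sketch (not the authors' implementation) of the dynamic
# batch length and decaying stability sequence described in the paper,
# using the linear-dynamics hyperparameters quoted in the table above.

def batch_length(b, tau0=7, offset=10):
    """Dynamic batch length tau_b = tau0 * ((b + offset)/offset)^0.5.

    Rounding to an integer number of steps is an assumption made here
    for illustration; the paper does not specify the rounding rule.
    """
    return int(round(tau0 * ((b + offset) / offset) ** 0.5))

def beta(t):
    """Exponentially decaying stability sequence beta(t) = 0.99^t."""
    return 0.99 ** t

# Count how many batches of increasing length cover the horizon T = 1200.
T = 1200
t, b = 0, 0
while t < T:
    t += batch_length(b)
    b += 1
print(f"{b} batches of growing length cover the horizon T = {T}")
```

The schedule grows like the square root of the batch index, so later batches are longer and gradient estimates averaged over them become less noisy, which is the trade-off the ablation study in the paper examines.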