Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Online Bandit Nonlinear Control with Dynamic Batch Length and Adaptive Learning Rate

Authors: Jihun Kim, Javad Lavaei

TMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 5, we present numerical experiments on the DBAR algorithm with an ablation study on batch length and learning rate. To demonstrate the main results of this paper, we provide illustrative examples on both linear and nonlinear dynamics with adversarial disturbances. Figures 2–7 present results on stability analysis, regret analysis, and ablation studies, all of which are characteristic of experimental research.
Researcher Affiliation | Academia | Jihun Kim (EMAIL), Department of Industrial Engineering and Operations Research, University of California, Berkeley; Javad Lavaei (EMAIL), Department of Industrial Engineering and Operations Research, University of California, Berkeley. Both authors are affiliated with the University of California, Berkeley, an academic institution, and have .edu email addresses.
Pseudocode | Yes | Algorithm 1 (DBAR), Algorithm 2 (DBAR with unknown |U|), and Algorithm 3 (DBAR-switching) are explicitly provided in the paper, detailing the procedural steps of the proposed methods.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code, a link to a code repository, or any mention of code provided in supplementary materials for the methodology described.
Open Datasets | No | The paper describes experiments on linear and nonlinear dynamics using simulated systems (e.g., matrices 'randomly generated from Uniform[-1, 1]', 'sinusoidal noise' for disturbances). It does not use or provide access to any external public datasets.
Dataset Splits | No | The paper uses simulated systems with parameters defined within the text (e.g., x_{t+1} = A x_t + B u_t + w_t for the linear system, a leader-follower system for the nonlinear one). As such, there are no traditional external datasets that would require specific training/test/validation splits for reproduction.
Hardware Specification | Yes | The paper states that an 'Apple M1 Chip with 8-Core CPU is sufficient for the experiments.'
Software Dependencies | No | The paper mentions 'forward-Euler discretization' but does not specify any software names with version numbers (e.g., programming languages, libraries, or scientific computing environments) that would be necessary for replication.
Experiment Setup | Yes | For the experiments on linear dynamics, we use T = 1200, η0 = 0.025, γ = 2.5, α0 = 1.01, and x0 = [100, 200]. For the dynamic batch length, we consider τ0 = 7 and τb = τ0((b+10)/10)^0.5. It is well known that every (asymptotically) stabilizing controller in the linear system is in fact exponentially stabilizing (Khalil, 2015); hence, we use β(t) = 0.99^t without relaxing the assumptions on stabilizing controllers. Finally, we use δ = γ·wmax/(1 − β(τ0)). For the experiments on the nonlinear leader-follower system, we use T = 5000, η0 = 0.25, γ = 1.5, α0 = 1.01, y0 = [−32, 24, 5.6, 24], and z0 = [10, 10, ...] ∈ R^96. For the dynamic batch length, we consider τ0 = 9 and τb = τ0((b+41)/41)^0.5. To study this notion further, we consider different polynomially decreasing series (which are not exponentially decreasing) as candidates for β(t): β1(t) = min{10/t^1.05, 1} and β2(t) = min{10/t^1.08, 1}. The controller pool is defined by specific parameter ranges: p ∈ {2, 16, 30, 44, 58, 72, 86, 100}, k1 ∈ {2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5}, and k2 ∈ {1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5}.
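Since no official code is released (see the Open Source Code row), the schedules and simulated dynamics quoted above can only be sketched. The following is a minimal Python illustration of the dynamic batch length τb = τ0((b+c)/c)^0.5, the decay candidates β(t), and a rollout of the simulated linear system x_{t+1} = A x_t + B u_t + w_t. The state/input dimensions, zero control input, and disturbance amplitude are placeholder assumptions; the DBAR controller update itself is not reproduced here.

```python
import numpy as np

def batch_length(b, tau0=7, c=10):
    """Dynamic batch length tau_b = tau0 * ((b + c) / c) ** 0.5.

    The linear experiments use tau0=7, c=10; the nonlinear ones tau0=9, c=41.
    """
    return tau0 * ((b + c) / c) ** 0.5

def beta_exp(t):
    """Exponentially decreasing beta(t) = 0.99**t used for the linear system."""
    return 0.99 ** t

def beta_poly(t, power):
    """Polynomially decreasing candidates beta1 (power=1.05), beta2 (power=1.08)."""
    return min(10 / t ** power, 1.0) if t > 0 else 1.0

# Simulated linear dynamics with entries of A, B drawn from Uniform[-1, 1],
# as described in the Open Datasets row.
rng = np.random.default_rng(0)
n, m = 2, 2                          # state/input dimensions (placeholders)
A = rng.uniform(-1, 1, (n, n))
B = rng.uniform(-1, 1, (n, m))
x = np.array([100.0, 200.0])         # x0 from the reported setup

for t in range(5):
    u = np.zeros(m)                  # placeholder control; not the DBAR policy
    w = 0.1 * np.sin(t) * np.ones(n) # sinusoidal disturbance (illustrative amplitude)
    x = A @ x + B @ u + w
```

The schedule functions mirror the formulas in the Experiment Setup row; swapping `tau0` and `c` switches between the linear and nonlinear configurations.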