Computing the Bias of Constant-step Stochastic Approximation with Markovian Noise
Authors: Sebastian Allmeier, Nicolas Gast
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study stochastic approximation algorithms with Markovian noise and constant step-size α. We develop a method based on infinitesimal generator comparisons to study the bias of the algorithm, which is the expected difference between θ_n, the value at iteration n, and θ*, the unique equilibrium of the corresponding ODE. Our main contribution is to provide a framework based on semi-groups to quantify the convergence rate of θ_n to θ*. This framework is similar to Stein's method, which has recently been popularized to obtain accuracy results and refinement terms for fluid limits [10, 17, 18, 40]. It allows us to quantify the distance between the stochastic recurrence (1) and its deterministic counterpart (8) as a function of the distance between the infinitesimal generators of the stochastic and deterministic systems. By using this framework, we obtain two main results. Our first result (Theorem 1) states that, under smoothness conditions, there exists a constant C > 0 such that for all small enough α: lim sup_{n→∞} |E[θ_n] − θ*| ≤ Cα and lim sup_{n→∞} E[(θ_n − θ*)²] ≤ Cα. This guarantees that the bias and variance of θ_n are of order at most α. (The recursion and these bounds are sketched in LaTeX after the table.) |
| Researcher Affiliation | Academia | Sebastian Allmeier, Univ. Grenoble Alpes and Inria, F-38000 Grenoble, France. Nicolas Gast, Univ. Grenoble Alpes and Inria, F-38000 Grenoble, France. |
| Pseudocode | No | The paper contains mathematical equations and proofs but no structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | To ensure reproducibility, the code necessary to reproduce the paper (including latex, code to run the simulations and code to plot the figures) is available at https://github.com/ngast/paper_bias_stochastic_approximation2024. |
| Open Datasets | No | This result is illustrated on a synthetic example whose parameters are given in Appendix A.2, and for which θ* = 1. We run the stochastic approximation algorithm for various values of α and report the results in Figure 1. In all cases, we use the same source of randomness (the values of the Markov chain X_n and of the martingale noise M_{n+1} are the same for all trajectories). For Figures 1 and 2, the state space of the Markovian part is X = {0, 1}, and the transition matrix is K(θ) = [[sin(θ)², cos(θ)²], [cos(θ)², sin(θ)²]]. The drift is f(θ, x) = −θ + 2x. The martingale noise M_{n+1} is a sequence of i.i.d. random variables with P(M_{n+1} = 1 | F_n) = P(M_{n+1} = −1 | F_n) = 0.5. One can show that θ* = 1. The initial value of the algorithm is set to θ_0 = 3, X_0 = 0. |
| Dataset Splits | No | The paper uses synthetic examples for numerical illustration and does not describe data splits for training, validation, or testing in the typical machine learning sense. |
| Hardware Specification | Yes | The total computation time to obtain all figures of the paper does not exceed a few minutes (on a 2018 laptop, all implementation being done in Python/Numpy/Matplotlib). |
| Software Dependencies | No | The paper mentions 'Python/Numpy/Matplotlib' but does not specify version numbers for these software components. |
| Experiment Setup | Yes | We run the stochastic approximation algorithm for various values of α and report the results in Figure 1. In all cases, we use the same source of randomness (the values of the Markov chain X_n and of the martingale noise M_{n+1} are the same for all trajectories). We plot a sample trajectory of θ_n together with the averaged iterates θ̄_n and θ̄_{n/2:n}, the latter defined as θ̄_{n/2:n} = (1/(n − ⌊n/2⌋)) Σ_{k=⌊n/2⌋+1}^{n} θ_k, for α ∈ {0.001/4, 0.001/8, 0.001/16} (more values of α are displayed in Appendix A). For Figures 1 and 2, the state space of the Markovian part is X = {0, 1}, and the transition matrix is K(θ) = [[sin(θ)², cos(θ)²], [cos(θ)², sin(θ)²]]. The drift is f(θ, x) = −θ + 2x. The martingale noise M_{n+1} is a sequence of i.i.d. random variables with P(M_{n+1} = 1 | F_n) = P(M_{n+1} = −1 | F_n) = 0.5. One can show that θ* = 1. The initial value of the algorithm is set to θ_0 = 3, X_0 = 0. (A Python/Numpy sketch of this setup follows the table.) |
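
For reference, here is a minimal LaTeX sketch of the constant step-size recursion and the bias/variance bounds quoted above. The exact form of the recursion and of the averaged ODE is an assumption inferred from the drift and martingale-noise description in the table, not copied from the paper's equations (1) and (8).

```latex
% Assumed form of the constant step-size recursion and averaged ODE,
% inferred from the drift/martingale description above.
\begin{align*}
  \theta_{n+1} &= \theta_n + \alpha\bigl(f(\theta_n, X_{n+1}) + M_{n+1}\bigr),
    && \text{constant step-size } \alpha > 0,\\
  \dot{\theta}(t) &= \bar f\bigl(\theta(t)\bigr)
    := \sum_{x \in \mathcal{X}} \pi_{\theta(t)}(x)\, f\bigl(\theta(t), x\bigr),
    && \text{averaged ODE with unique equilibrium } \theta^*.
\end{align*}
% Theorem 1 (as quoted in the table): for some constant C > 0 and all small enough alpha,
\begin{align*}
  \limsup_{n\to\infty} \bigl|\mathbb{E}[\theta_n] - \theta^*\bigr| \le C\alpha,
  \qquad
  \limsup_{n\to\infty} \mathbb{E}\bigl[(\theta_n - \theta^*)^2\bigr] \le C\alpha.
\end{align*}
```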
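A minimal Python/Numpy sketch of the synthetic example described in the table (two-state Markov chain with transition matrix K(θ), drift f(θ, x) = −θ + 2x, i.i.d. ±1 martingale noise, θ_0 = 3, X_0 = 0). The update rule θ_{n+1} = θ_n + α(f(θ_n, X_{n+1}) + M_{n+1}), the sign of the drift, the seed, and the number of iterations are assumptions; this is not the authors' released code, which is available in the linked repository.

```python
import numpy as np

def transition_matrix(theta):
    """K(theta): the 2x2 doubly stochastic matrix from the synthetic example."""
    s, c = np.sin(theta) ** 2, np.cos(theta) ** 2
    return np.array([[s, c], [c, s]])

def drift(theta, x):
    """Drift f(theta, x) = -theta + 2x (sign assumed so that theta* = 1 is stable)."""
    return -theta + 2 * x

def run_sa(alpha, n_steps, theta0=3.0, x0=0, seed=0):
    """Constant step-size stochastic approximation with Markovian noise.

    Assumed update: theta_{n+1} = theta_n + alpha * (f(theta_n, X_{n+1}) + M_{n+1}),
    with M_{n+1} i.i.d. equal to +1 or -1 with probability 1/2 each.
    """
    rng = np.random.default_rng(seed)  # same seed -> comparable randomness across alphas
    theta, x = theta0, x0
    trajectory = np.empty(n_steps)
    for n in range(n_steps):
        # Markovian state transition, driven by the current iterate theta.
        x = rng.choice(2, p=transition_matrix(theta)[x])
        # Martingale noise: +1 or -1 with probability 1/2 each.
        m = rng.choice([-1.0, 1.0])
        theta = theta + alpha * (drift(theta, x) + m)
        trajectory[n] = theta
    return trajectory

if __name__ == "__main__":
    for alpha in (0.001 / 4, 0.001 / 8, 0.001 / 16):
        traj = run_sa(alpha, n_steps=200_000)
        tail = traj[len(traj) // 2:]  # average over the second half of the run
        print(f"alpha={alpha:.2e}  tail-averaged theta ~ {tail.mean():.4f}  (theta* = 1)")
```

As in the quoted setup, the tail average over the second half of the run should be close to θ* = 1, with a bias and variance that shrink roughly linearly in α.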