Statistical inference with implicit SGD: proximal Robbins-Monro vs. Polyak-Ruppert
Authors: Yoonhyung Lee, Sungdong Lee, Joong-Ho Won
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6. Experiments. Following Bach & Moulines (2011), we examined the convergence behavior of prox RM and prox PR using two univariate functions: L(θ) = (1/2)θ^2 (strongly convex) and L(θ) = (1/4)θ^4 (non-strongly convex)... Figs. 1 and 2 plot the squared estimation... Table 2 collects the results. Table 3 summarizes the results. |
| Researcher Affiliation | Collaboration | (1) Kakao Entertainment Corp. (2) Department of Statistics, Seoul National University. |
| Pseudocode | No | The paper describes algorithms using mathematical equations, e.g., 'θ_n = θ_{n-1} - γ_n ∇ℓ(Z_n, θ_n)', but does not include structured 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | No explicit statement about making code open source or providing a link to a code repository was found. |
| Open Datasets | No | We generated Z_n = (y_n, x_n) where y_n = x_n^T θ + ϵ_n, x_n ~ N(0, Σ), and ϵ_n ~ N(0, 1)... we instead used a smoothed version... and let Z ~ N(0, 1). |
| Dataset Splits | No | The paper describes the generation of synthetic data and the number of iterations (e.g., '100 independent runs of n = 10^6 ISGD iterations'), but does not specify explicit training/validation/test dataset splits. |
| Hardware Specification | No | No specific hardware details (like GPU or CPU models, memory, or cloud instance types) used for experiments are mentioned in the paper. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) that were used for the experiments. |
| Experiment Setup | Yes | We fixed the initial point θ_0 = 10 for the quadratic and θ_0 = 2 for the quartic function, and observed 100 independent runs of n = 10^6 ISGD iterations for initial step size γ_1 ∈ {1/5, 1, 5, 20, 100} and exponent γ ∈ {1/5, 1/3, 2/5, 1/2, 2/3, 1}. We fixed θ = (1, ..., 1)^T and ran n = 10^5 iterations of ISGD for γ ∈ {0.6, 1.0}, p ∈ {5, 20, 100, 200} with θ_0 = 0 for each type of Σ. The n = 10^6 iterations were started with θ_0 = 0 for each replication, where γ ∈ {0.6, 1} and µ ∈ {10^-1, 10^-2, 10^-3}; we used γ_1 = 250 when γ = 1 and γ_1 = 30 when γ = 0.6. |
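
The implicit update quoted in the Pseudocode row and the univariate setup in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes additive N(0, 1) noise on the gradient (the table does not state the noise model for the univariate experiments), a step size of the form γ_n = γ_1 n^(-γ), and it treats the last iterate as the prox RM estimate and the running average as the prox PR estimate. The function names `prox_quadratic`, `prox_quartic`, and `implicit_sgd` are hypothetical.

```python
import numpy as np


def prox_quadratic(c, gamma):
    """Solve t + gamma * t = c, the implicit step for L(theta) = theta**2 / 2."""
    return c / (1.0 + gamma)


def prox_quartic(c, gamma, tol=1e-12, max_newton=200):
    """Solve t + gamma * t**3 = c, the implicit step for L(theta) = theta**4 / 4.

    The left-hand side is strictly increasing in t, so the real root is unique.
    Newton's method is started from t = c.
    """
    t = c
    for _ in range(max_newton):
        step = (t + gamma * t**3 - c) / (1.0 + 3.0 * gamma * t**2)
        t -= step
        if abs(step) <= tol * (1.0 + abs(t)):
            break
    return t


def implicit_sgd(prox, theta0, gamma1, exponent, n_iter, noise_sd=1.0, seed=0):
    """Run implicit SGD and return (last iterate, average of the iterates).

    Assumed noise model (not taken from the paper): the stochastic gradient
    is L'(theta) - z with z ~ N(0, noise_sd**2).
    """
    rng = np.random.default_rng(seed)
    theta = float(theta0)
    average = 0.0
    for n in range(1, n_iter + 1):
        gamma_n = gamma1 * n ** (-exponent)  # gamma_n = gamma_1 * n^(-gamma)
        z = noise_sd * rng.standard_normal()
        # Implicit update: theta_n = theta_{n-1} - gamma_n * (L'(theta_n) - z),
        # i.e. theta_n + gamma_n * L'(theta_n) = theta_{n-1} + gamma_n * z.
        theta = prox(theta + gamma_n * z, gamma_n)
        average += (theta - average) / n  # running mean of theta_1..theta_n
    return theta, average
```

For example, `implicit_sgd(prox_quartic, theta0=2.0, gamma1=1.0, exponent=0.5, n_iter=10**6)` mirrors one configuration from the Experiment Setup row (θ_0 = 2 for the quartic, γ_1 = 1, γ = 1/2). Repeating it over 100 seeds would correspond to the 100 independent runs described there.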