Statistical inference with implicit SGD: proximal Robbins-Monro vs. Polyak-Ruppert

Authors: Yoonhyung Lee, Sungdong Lee, Joong-Ho Won

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 6. Experiments Following Bach & Moulines (2011), we examined the convergence behavior of prox RM and prox PR using two univariate functions: L(θ) = (1/2)θ^2 (strongly convex) and L(θ) = (1/4)θ^4 (non-strongly convex)... Figs. 1 and 2 plot the squared estimation... Table 2 collects the results. Table 3 summarizes the results.
Researcher Affiliation | Collaboration | (1) Kakao Entertainment Corp. (2) Department of Statistics, Seoul National University.
Pseudocode | No | The paper describes algorithms using mathematical equations, e.g., 'θ_n = θ_{n-1} - γ_n ∇ℓ(Z_n, θ_n)', but does not include structured 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | No explicit statement about making code open source or providing a link to a code repository was found.
Open Datasets | No | We generated Z_n = (y_n, x_n) where y_n = x_n^T θ + ϵ_n, x_n ~ N(0, Σ), and ϵ_n ~ N(0, 1)... we instead used a smoothed version... and let Z ~ N(0, 1).
Dataset Splits | No | The paper describes the generation of synthetic data and the number of iterations (e.g., '100 independent runs of n = 10^6 ISGD iterations'), but does not specify explicit training/validation/test dataset splits.
Hardware Specification | No | No specific hardware details (like GPU or CPU models, memory, or cloud instance types) used for experiments are mentioned in the paper.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) that were used for the experiments.
Experiment Setup | Yes | We fixed the initial point θ_0 = 10 for the quadratic and θ_0 = 2 for the quartic function, and observed 100 independent runs of n = 10^6 ISGD iterations for initial step size γ_1 ∈ {1/5, 1, 5, 20, 100} and exponent γ ∈ {1/5, 1/3, 2/5, 1/2, 2/3, 1}. We fixed θ = (1, ..., 1)^T and ran n = 10^5 iterations of ISGD for γ ∈ {0.6, 1.0}, p ∈ {5, 20, 100, 200} with θ_0 = 0 for each type of Σ. The n = 10^6 iterations were started with θ_0 = 0 for each replication, where γ ∈ {0.6, 1} and µ ∈ {10^-1, 10^-2, 10^-3}; we used γ_1 = 250 when γ = 1 and γ_1 = 30 when γ = 0.6.
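
Since the paper ships neither pseudocode nor code, the regression setting quoted in the Open Datasets and Experiment Setup rows can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on several assumptions that the rows above do not state: the per-sample loss is taken to be the squared error (1/2)(y - x^T θ)^2, Σ is taken to be the identity, the step size is assumed to follow γ_n = γ_1 n^(-γ), and the iteration count is far smaller than in the paper. The function names `isgd_linear_regression` and `squared_error` are hypothetical. Under the squared-error assumption the implicit update has a closed form, so no inner solver is needed. The last iterate plays the role of prox RM and the running average of iterates the role of prox PR.

```python
import random


def isgd_linear_regression(n_iter=20000, p=5, gamma1=1.0, exponent=0.6, seed=0):
    """Implicit SGD for least squares; returns (last iterate, averaged iterate)."""
    rng = random.Random(seed)
    theta_true = [1.0] * p   # true parameter (1, ..., 1)^T, as in the quoted setup
    theta = [0.0] * p        # theta_0 = 0, as in the quoted setup
    theta_bar = [0.0] * p    # running average of theta_1, ..., theta_n

    for n in range(1, n_iter + 1):
        # Synthetic sample: x_n ~ N(0, I) (ASSUMPTION: Sigma = I), eps_n ~ N(0, 1).
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        y = sum(xi * ti for xi, ti in zip(x, theta_true)) + rng.gauss(0.0, 1.0)

        # ASSUMPTION: polynomially decaying step size gamma_n = gamma_1 * n^(-gamma).
        gamma_n = gamma1 * n ** (-exponent)

        # Implicit update: theta_n = theta_{n-1} + gamma_n * (y - x^T theta_n) * x.
        # For the (assumed) squared loss, solving for theta_n gives
        # theta_n = theta_{n-1} + gamma_n * r / (1 + gamma_n * ||x||^2) * x,
        # where r = y - x^T theta_{n-1}.
        residual = y - sum(xi * ti for xi, ti in zip(x, theta))
        scale = gamma_n * residual / (1.0 + gamma_n * sum(xi * xi for xi in x))
        theta = [ti + scale * xi for ti, xi in zip(theta, x)]

        # Incremental mean of the iterates (Polyak-Ruppert averaging).
        theta_bar = [bi + (ti - bi) / n for bi, ti in zip(theta_bar, theta)]

    return theta, theta_bar


def squared_error(estimate, target=1.0):
    """Squared Euclidean distance to the constant vector (target, ..., target)."""
    return sum((e - target) ** 2 for e in estimate)


if __name__ == "__main__":
    last, averaged = isgd_linear_regression()
    print("last-iterate squared error:", squared_error(last))
    print("averaged-iterate squared error:", squared_error(averaged))
```

Changing `exponent`, `gamma1`, and `p` to the values listed in the Experiment Setup row gives a rough way to explore the same grid, though results from this sketch should not be expected to match the paper's tables.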