Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Analysis of Langevin Monte Carlo via Convex Optimization

Authors: Alain Durmus, Szymon Majewski, Błażej Miasojedow

JMLR 2019 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5. Numerical Experiments. In this section, we experiment with SPGLD and SSGLD on a Bayesian logistic regression problem, see e.g. (Holmes and Held, 2006; Gramacy and Polson, 2012; Park and Hastie, 2007). Consider i.i.d. observations (X_i, Y_i), i ∈ {1, ..., N}, where (Y_i) are binary response variables and (X_i) are d-dimensional covariates. ... We approximate p_1(· | (X, Y)_{i ∈ {1,...,N}}) using SPGLD and SSGLD, since the associated potential is Lipschitz, whereas regarding p_{1,2}(· | (X, Y)_{i ∈ {1,...,N}}) we only apply SPGLD.
Researcher Affiliation | Academia | Alain Durmus (EMAIL), CMLA, École normale supérieure Paris-Saclay, CNRS, Université Paris-Saclay, 94235 Cachan, France; Szymon Majewski (EMAIL), Institute of Mathematics, Polish Academy of Sciences, ul. Śniadeckich 8, 00-656 Warszawa, Poland; Błażej Miasojedow (EMAIL), Institute of Applied Mathematics and Mechanics, University of Warsaw, ul. Banacha 2, 02-097 Warszawa, Poland
Pseudocode | Yes | Algorithm 1 (SSGLD). Data: initial distribution µ_0 ∈ P_2(R^d); non-increasing sequence (γ_k)_{k≥1}; U, Θ, η satisfying A3. Result: (X̄_k)_{k∈N}. Begin: draw X_0 ∼ µ_0; for k ≥ 0: draw G_{k+1} ∼ N(0, I_d) and Z_{k+1} ∼ η; set X_{k+1} = X_k − γ_{k+1} Θ(X_k, Z_{k+1}) + √(2γ_{k+1}) G_{k+1}. ... Algorithm 2 (SPGLD). Data: initial distribution µ_0 ∈ P_2(R^d); non-increasing sequence (γ_k)_{k≥1}; U = U_1 + U_2, Θ_1, η_1 satisfying A5. Result: (X̄_k)_{k∈N}. Begin: draw X_0 ∼ µ_0; for k ≥ 0: draw G_{k+1} ∼ N(0, I_d) and Z_{k+1} ∼ η_1; set X_{k+1} = prox_{γ_{k+1} U_2}(X_k) − γ_{k+1} Θ_1(prox_{γ_{k+1} U_2}(X_k), Z_{k+1}) + √(2γ_{k+1}) G_{k+1}.
Open Source Code | No | The paper does not explicitly provide a link to source code or state that code will be made publicly available. It only describes the methodology and presents numerical experiments.
Open Datasets | Yes | We consider the three data sets from UCI repository (Dua and Efi, 2017): Heart disease dataset (N = 270, d = 14), Australian Credit Approval dataset (N = 690, d = 34) and Musk dataset (N = 476, d = 166).
Dataset Splits | No | The paper states 'Consider i.i.d. observations (X_i, Y_i), i ∈ {1, ..., N}' and mentions estimating 'posterior mean I_1 and I_2 of the test functions'. However, it does not provide specific details on how the datasets (Heart disease, Australian Credit Approval, Musk) were split into training, validation, or test sets, nor does it refer to standard predefined splits for these particular experiments.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory specifications) used to run the numerical experiments.
Software Dependencies | No | The paper does not list specific software dependencies, such as libraries or solvers with version numbers, that would be needed to replicate the experiments.
Experiment Setup | Yes | For our experiments, we use constant stepsizes γ of the form τ(L + m)^{-1} with τ = 0.01, 0.1, 1, and for the stochastic (sub)gradient we use batch sizes N, N/10, N/100. For all datasets and all settings of τ and the batch size, we run 100 independent runs of SPGLD (SSGLD), each of length 10^6. For each set of parameters we estimate I_1, I_2 and compute the absolute errors, where the true values were obtained by prox-MALA (see Pereyra, 2015) with 10^7 iterations and a stepsize corresponding to the optimal acceptance ratio 0.5, see (Roberts and Rosenthal, 1998).
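To make the reported setup concrete, the following is a minimal sketch (not the authors' code) of the SSGLD update from Algorithm 1 on a Bayesian logistic regression potential, with the stepsize γ = τ(L + m)^{-1} and a rescaled mini-batch stochastic gradient. The synthetic data, prior variance `sigma2`, the constants `L` and `m`, the batch size, and the helper names `minibatch_grad` and `prox_l1` are all illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 270, 14                         # same shape as the Heart disease dataset
X = rng.standard_normal((N, d))        # synthetic covariates (assumption)
y = rng.choice([-1.0, 1.0], size=N)    # synthetic labels, y_i in {-1, +1}

sigma2 = 10.0                                          # assumed Gaussian prior variance
L = 0.25 * np.linalg.norm(X, 2) ** 2 + 1.0 / sigma2    # smoothness bound for the potential
m = 1.0 / sigma2                                       # strong convexity from the prior
tau = 0.1                                              # paper sweeps tau in {0.01, 0.1, 1}
gamma = tau / (L + m)                                  # constant stepsize tau * (L + m)^{-1}

def minibatch_grad(theta, batch):
    """Unbiased stochastic gradient of
    U(theta) = sum_i log(1 + exp(-y_i x_i' theta)) + ||theta||^2 / (2 sigma2),
    obtained by rescaling a subsampled gradient by N / batch."""
    idx = rng.choice(N, size=batch, replace=False)
    margins = y[idx] * (X[idx] @ theta)
    s = 1.0 / (1.0 + np.exp(margins))          # sigmoid(-margins)
    return -(N / batch) * (X[idx].T @ (y[idx] * s)) + theta / sigma2

# SSGLD (Algorithm 1): Langevin step with stochastic gradients, plus averaging.
theta = np.zeros(d)                            # X_0 ~ mu_0 (point mass at 0)
theta_bar = np.zeros(d)                        # averaged iterate X-bar_k
for k in range(2000):
    g = rng.standard_normal(d)                 # G_{k+1} ~ N(0, I_d)
    theta = theta - gamma * minibatch_grad(theta, N // 10) \
        + np.sqrt(2.0 * gamma) * g
    theta_bar += (theta - theta_bar) / (k + 1)

# SPGLD (Algorithm 2) additionally applies prox_{gamma U2} before the gradient
# step when U = U1 + U2 has a non-smooth part; e.g. for U2 = lam * ||.||_1 the
# prox is soft-thresholding:
def prox_l1(x, step, lam=1.0):
    return np.sign(x) * np.maximum(np.abs(x) - step * lam, 0.0)
```

With batch size N the estimator reduces to the exact full gradient, which is a quick sanity check on the N / batch rescaling.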