Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

The promises and pitfalls of Stochastic Gradient Langevin Dynamics

Authors: Nicolas Brosse, Alain Durmus, Eric Moulines

NeurIPS 2018

Reproducibility Variable Result LLM Response
Research Type Experimental Our findings are supported by limited numerical experiments.
Researcher Affiliation Academia Nicolas Brosse, Éric Moulines — Centre de Mathématiques Appliquées, UMR 7641, Ecole Polytechnique, Palaiseau, France (EMAIL, EMAIL). Alain Durmus — Ecole Normale Supérieure, CMLA, 61 Av. du Président Wilson, 94235 Cachan Cedex, France (EMAIL).
Pseudocode No The paper provides mathematical formulations of algorithms (e.g., equations 2, 3, 4, 5) but does not include structured pseudocode or algorithm blocks.
Open Source Code No The paper does not provide concrete access to source code for the methodology it describes.
Open Datasets Yes We then illustrate our results on the covertype dataset with a Bayesian logistic regression model. The prior is a standard multivariate Gaussian distribution. Given the size of the dataset and the dimension of the problem, LMC requires high computational resources and is not included in the simulations. We truncate the training dataset at N ∈ {10^3, 10^4, 10^5}. For all algorithms, the step size γ is set equal to 1/N and the trajectories are started at θ̂, an estimator of θ⋆, computed using SGD combined with the BFGS algorithm. (Footnote 1: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/covtype.libsvm.binary.scale.bz2)
Dataset Splits No The paper mentions truncating the 'training dataset' and evaluating on the 'test dataset' for the Covertype dataset, but does not provide specific train/validation/test splits (e.g., percentages or sample counts for a validation set) or cross-validation details.
Hardware Specification No The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or detailed computer specifications) used for running its experiments.
Software Dependencies No The paper mentions software like 'SciPy' [19] and 'Scikit-learn' [28] but does not specify version numbers for these or other key software components used in the experiments.
Experiment Setup Yes For the LMC, SGLDFP, SGLD and SGD algorithms, the step size γ is set equal to (1 + δ/4)^(-1), where δ is the largest eigenvalue of X^T X. We start the algorithms at θ0 = θ̂ and run n = 1/γ iterations, where the first 10% of samples are discarded as a burn-in period.
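The setup quoted above (SGLD for Bayesian logistic regression with a standard Gaussian prior, step size γ, and minibatch gradient estimates) can be sketched in Python as follows. This is an illustrative reconstruction, not the authors' code: the function name `sgld_logistic`, the sampling-without-replacement minibatching, and the default zero initialization are assumptions; the paper instead initializes at θ̂ computed via SGD combined with BFGS.

```python
import numpy as np

def sgld_logistic(X, y, gamma, n_iters, batch_size, theta0=None, seed=None):
    """Hypothetical SGLD sketch for Bayesian logistic regression with a
    standard Gaussian prior; not the authors' implementation."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    theta = np.zeros(d) if theta0 is None else np.asarray(theta0, dtype=float)
    samples = np.empty((n_iters, d))
    for t in range(n_iters):
        idx = rng.choice(N, size=batch_size, replace=False)
        sigma = 1.0 / (1.0 + np.exp(-(X[idx] @ theta)))  # logistic link
        # Unbiased minibatch estimate of the log-posterior gradient:
        # Gaussian-prior term (-theta) plus the rescaled likelihood term.
        grad = -theta + (N / batch_size) * (X[idx].T @ (y[idx] - sigma))
        # SGLD update: gradient step plus Gaussian noise of variance 2*gamma.
        theta = theta + gamma * grad + np.sqrt(2.0 * gamma) * rng.standard_normal(d)
        samples[t] = theta
    return samples
```

Following the setup rows above, one would set γ = 1/N (covertype) or γ = (1 + δ/4)^(-1), run n = 1/γ iterations, and discard the first 10% of `samples` as burn-in.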