On Markov Chain Gradient Descent

Authors: Tao Sun, Yuejiao Sun, Wotao Yin

NeurIPS 2018

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present two kinds of numerical results. The first one is to show that MCGD uses fewer samples to train both a convex model and a nonconvex model. The second one demonstrates the advantage of the faster mixing of a non-reversible Markov chain. Our results on nonconvex objective and non-reversible chains are new. |
| Researcher Affiliation | Academia | Tao Sun, College of Computer, National University of Defense Technology, Changsha, Hunan 410073, China (nudtsuntao@163.com); Yuejiao Sun, Department of Mathematics, University of California, Los Angeles, Los Angeles, CA 90095, USA (sunyj@math.ucla.edu); Wotao Yin, Department of Mathematics, University of California, Los Angeles, Los Angeles, CA 90095, USA (wotaoyin@math.ucla.edu) |
| Pseudocode | No | The paper describes its algorithms mathematically using equations (e.g., (2), (3), (5)), but does not present them in a structured pseudocode or algorithm block (a hedged sketch of the update rule is given after this table). |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for its methodology is openly available. |
| Open Datasets | No | The paper describes generating synthetic data for its experiments (e.g., 'Randomly sample a vector u ∈ R^d, d = 50' and 'construct an undirected connected graph with n = 20 nodes with edges randomly generated') rather than using a publicly available dataset with concrete access information or citations (a sketch of this generation step also follows the table). |
| Dataset Splits | No | The paper does not explicitly provide details about validation dataset splits. It discusses 'training' models but not a separate 'validation' set. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch x.x). |
| Experiment Setup | Yes | We choose γ_k = 1/k^q as our stepsize, where q = 0.501. This choice is consistent with our theory below. |