Parallel Asynchronous Stochastic Variance Reduction for Nonconvex Optimization
Authors: Cong Fang, Zhouchen Lin
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also conduct experiments on a shared memory multi-core system to demonstrate the efficiency of our algorithm. In this section, we conduct experiments on a shared memory multi-core system to validate the efficiency of our algorithm empirically. |
| Researcher Affiliation | Academia | Key Laboratory of Machine Perception (MOE), School of EECS, Peking University, P. R. China Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, P. R. China |
| Pseudocode | Yes | Algorithm 1: Serial SVRG; Algorithm 2: ASVRG (a minimal SVRG sketch appears after this table) |
| Open Source Code | No | The paper does not contain any explicit statement or link to open-source code for the described methodology. The supplementary material link is general and does not specify code availability. |
| Open Datasets | Yes | We experiment on two datasets: the MNIST dataset (http://yann.lecun.com/exdb/mnist/) and the CIFAR10 dataset (Krizhevsky and Hinton 2009). |
| Dataset Splits | No | The paper does not provide specific training/test/validation dataset splits (e.g., percentages or counts). It mentions using MNIST and CIFAR10 but not how they were partitioned for training or validation. |
| Hardware Specification | Yes | All the experiments are performed on an Intel multi-core 4-socket machine with 128 GB memory. Each socket is associated with 8 computation cores. |
| Software Dependencies | No | We implement all methods in C++ using POSIX threads as the parallel programming framework. The paper mentions software components but does not provide specific version numbers for the C++ compiler, the POSIX threads implementation, or any libraries, which are necessary for reproducibility. |
| Experiment Setup | Yes | For SVRG, we choose a fixed step size, and choose γ that gives the best performance on one core. When there is more than one core, the step size does not change. For SGD, the step size is chosen based on (Reddi et al. 2016), which is γ_t = γ_0(1 + γ′⌊t/n⌋)^(−1), where γ_0 and γ′ are chosen to give the best performance. We use the normalized initialization in (Glorot and Bengio 2010), (Reddi et al. 2016). The parameters are chosen uniformly from [−√(6/(ni + no)), √(6/(ni + no))], where ni and no are the numbers of input and output layers of the neural networks, respectively. We choose the mini-batch size to be 100, which is a common setting in training neural networks. For ASGD, we choose the mini-batch size to be 50, and the step size to be 10^(−4), which we find is better than the setting used in (Lian et al. 2015). A short code sketch of this step-size schedule and initialization follows the table. |
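
The table's pseudocode row names a serial SVRG routine (Algorithm 1). Below is a minimal sketch of that update scheme on a quadratic least-squares objective; the loss, function names, and hyperparameters are illustrative assumptions, not the paper's implementation (which was written in C++ with POSIX threads and applied to neural networks).

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Sketch of serial SVRG, assuming a least-squares loss for concreteness:
// per-sample gradient grad_i(w) = a_i * (a_i^T w - b_i). Names are hypothetical.
std::vector<double> svrg(const std::vector<std::vector<double>>& A,
                         const std::vector<double>& b,
                         double step, std::size_t epochs, std::size_t inner_iters) {
    std::size_t n = A.size(), d = A[0].size();
    std::vector<double> x(d, 0.0);
    std::mt19937 rng(0);
    std::uniform_int_distribution<std::size_t> pick(0, n - 1);

    auto grad_i = [&](const std::vector<double>& w, std::size_t i) {
        double dot = 0.0;
        for (std::size_t j = 0; j < d; ++j) dot += A[i][j] * w[j];
        std::vector<double> g(d);
        for (std::size_t j = 0; j < d; ++j) g[j] = A[i][j] * (dot - b[i]);
        return g;
    };

    for (std::size_t s = 0; s < epochs; ++s) {
        // Snapshot point and its full gradient, recomputed once per outer epoch.
        std::vector<double> snapshot = x, full_grad(d, 0.0);
        for (std::size_t i = 0; i < n; ++i) {
            auto g = grad_i(snapshot, i);
            for (std::size_t j = 0; j < d; ++j) full_grad[j] += g[j] / n;
        }
        // Inner loop: variance-reduced stochastic gradient steps.
        for (std::size_t t = 0; t < inner_iters; ++t) {
            std::size_t i = pick(rng);
            auto g_x = grad_i(x, i), g_snap = grad_i(snapshot, i);
            for (std::size_t j = 0; j < d; ++j)
                x[j] -= step * (g_x[j] - g_snap[j] + full_grad[j]);
        }
    }
    return x;
}
```

ASVRG (Algorithm 2) parallelizes the inner loop asynchronously across threads; that lock-free shared-memory machinery is not reproduced here.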
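The experiment-setup row quotes two concrete formulas: the decaying SGD step size from (Reddi et al. 2016) and the normalized initialization of (Glorot and Bengio 2010). The sketch below restates both; γ_0, γ′, and the layer sizes are placeholders to be tuned as the paper describes.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Decaying SGD step size: gamma_t = gamma0 * (1 + gamma' * floor(t/n))^(-1).
// gamma0 and gamma_prime are tuning knobs; the paper picks whatever performs best.
double sgd_step_size(std::size_t t, std::size_t n, double gamma0, double gamma_prime) {
    return gamma0 / (1.0 + gamma_prime * static_cast<double>(t / n));  // integer t/n = floor
}

// Normalized (Glorot) initialization: weights drawn uniformly from
// [-sqrt(6/(n_in + n_out)), +sqrt(6/(n_in + n_out))] for a layer with
// n_in inputs and n_out outputs.
std::vector<double> glorot_init(std::size_t n_in, std::size_t n_out, std::mt19937& rng) {
    double bound = std::sqrt(6.0 / static_cast<double>(n_in + n_out));
    std::uniform_real_distribution<double> dist(-bound, bound);
    std::vector<double> w(n_in * n_out);
    for (auto& v : w) v = dist(rng);
    return w;
}
```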