Adaptive Variance Reducing for Stochastic Gradient Descent
Authors: Zebang Shen, Hui Qian, Tengfei Zhou, Tongzhou Mu
IJCAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present results of several numerical experiments to validate our theoretical analyses of Ada SVRG and Ada SAGA and to show the empirical efficiency of HVRG. Experiments on l2-Logistic Regression, l1l2-Logistic Regression, and Ridge Regression are conducted. We use datasets from LIBSVM [Chang and Lin, 2011] and list their statistics in Table 1. |
| Researcher Affiliation | Academia | Zhejiang University, China {shenzebang, qianhui, zhoutengfei zju, mutongzhou}@zju.edu.cn |
| Pseudocode | Yes | Algorithm 1 Ada SVRG |
| Open Source Code | No | The paper does not contain an explicit statement about the release of its source code or provide a link to a code repository. |
| Open Datasets | Yes | We use datasets from LIBSVM [Chang and Lin, 2011] and list their statistics in Table 1. The rcv1 dataset [Lewis et al., 2004] is used to test the performance of HVRG in l1l2-Logistic Regression |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It discusses "train" and "test" in the context of general machine learning concepts, but doesn't specify how data was partitioned for validation purposes. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU, GPU models, memory, or cloud resources) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9') that were used for implementation or experimentation. |
| Experiment Setup | Yes | The parameter m in SVRG, IProx-SVRG, and Ada SVRG is set to 2n uniformly, as suggested in [Xiao and Zhang, 2014]. We tune the step size (typically from 1/4L to 1/L) for different methods so that they give the best performance. The parameters c and in HVRG are fixed to 5 and 1.5 respectively. As for initialization, w0 is set to zero in all experiments. |
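The reported setup (inner-loop length m = 2n, step size tuned between 1/(4L) and 1/L, w0 = 0) can be sketched with a plain SVRG loop for ℓ2-logistic regression. This is a minimal illustration of the baseline SVRG of Johnson and Zhang under those settings, not the paper's Ada SVRG or HVRG; the function name and the `step_frac` knob are my own, and L is estimated from the data as the standard smoothness bound for logistic loss.

```python
import numpy as np

def svrg_l2_logistic(X, y, lam, n_epochs=20, step_frac=0.25, seed=0):
    """Plain SVRG for l2-regularized logistic regression.

    Mirrors the reported setup: inner-loop length m = 2n, step size
    eta = step_frac / L with step_frac in [0.25, 1.0] (i.e. between
    1/(4L) and 1/L), and w0 = 0.  Baseline SVRG, not Ada SVRG.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Smoothness bound: logistic loss has curvature <= ||x_i||^2 / 4.
    L = 0.25 * np.max(np.sum(X * X, axis=1)) + lam
    eta = step_frac / L
    m = 2 * n                      # inner-loop length, as in the paper
    w = np.zeros(d)                # w0 = 0, as in the paper

    def grad_i(w, i):
        # Gradient of log(1 + exp(-y_i x_i.w)) + (lam/2)||w||^2.
        s = 1.0 / (1.0 + np.exp(-y[i] * (X[i] @ w)))
        return (s - 1.0) * y[i] * X[i] + lam * w

    def full_grad(w):
        s = 1.0 / (1.0 + np.exp(-y * (X @ w)))
        return X.T @ ((s - 1.0) * y) / n + lam * w

    for _ in range(n_epochs):
        w_snap = w.copy()
        mu = full_grad(w_snap)     # full gradient at the snapshot
        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient.
            g = grad_i(w, i) - grad_i(w_snap, i) + mu
            w -= eta * g
        # Take the last inner iterate as the next snapshot.
    return w
```

In practice one would sweep `step_frac` over a grid (as the paper tunes the step size per method) and report the best run; the m = 2n choice follows [Xiao and Zhang, 2014] as quoted above.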