Gradient Descent Averaging and Primal-dual Averaging for Strongly Convex Optimization
Authors: Wei Tao, Wei Li, Zhisong Pan, Qing Tao (pp. 9843-9850)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Several experiments on SVMs and deep learning models validate the correctness of theoretical analysis and effectiveness of algorithms. |
| Researcher Affiliation | Academia | Wei Tao 1,2, Wei Li 2, Zhisong Pan 2, Qing Tao 3,4; 1 Institute of Evaluation and Assessment Research, Academy of Military Science, Beijing 100091, China; 2 Command and Control Engineering College, Army Engineering University, Nanjing 210007, China; 3 Department of Information Engineering, Army Academy of Artillery and Air Defense, Hefei 230031, China; 4 Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; wtao_plaust@163.com, liwei_public@qq.com, hotpzs@hotmail.com, qing.tao@ia.ac.cn |
| Pseudocode | Yes | Algorithm 1 GDA (an illustrative sketch of gradient descent with iterate averaging follows the table) |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of their methodology. |
| Open Datasets | Yes | We choose four benchmark datasets: a9a, w8a, covtype, ijcnn1 with different scale and dimension, which are publicly available at the LibSVM website. CIFAR-10 and CIFAR-100 datasets. |
| Dataset Splits | No | The paper mentions 'training loss' and 'test accuracy' but does not provide specific details on dataset split percentages for training, validation, and testing, nor does it specify any cross-validation setup. |
| Hardware Specification | Yes | we conduct experiments on a server with 2 NVIDIA 2080Ti GPUs. |
| Software Dependencies | No | The paper does not provide specific software dependencies or library version numbers used for the experiments. |
| Experiment Setup | Yes | We first design a simple 4-layer CNN architecture that consists of two convolutional layers (32 filters of size 3×3), one max-pooling layer (2×2 window and 0.25 dropout) and one fully connected layer (128 hidden units and 0.5 dropout). We also use weight decay with a regularization parameter of 5e-3. The loss function is the cross-entropy. To conduct a fair comparison, the constant learning rate is tuned in {0.1, 0.01, 0.001, 0.0001}, and the best results are reported. (A hedged PyTorch sketch of this setup follows the table.) |
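
The paper presents its method only as pseudocode (Algorithm 1, GDA). As a rough illustration of the idea behind gradient descent averaging, the NumPy sketch below runs plain gradient descent on a strongly convex objective and returns an average of the iterates; the 1/(mu·(t+1)) step size and the uniform running average are illustrative assumptions, not the paper's exact Algorithm 1.

```python
import numpy as np

def gda(grad, x0, mu, num_iters=1000):
    """Gradient descent with iterate averaging (illustrative sketch only).

    grad      : function returning the gradient of the objective at x
    x0        : initial point (NumPy array)
    mu        : strong-convexity parameter (used in the assumed step-size rule)
    num_iters : number of gradient steps
    """
    x = x0.astype(float).copy()
    x_bar = x.copy()
    for t in range(num_iters):
        eta = 1.0 / (mu * (t + 1))          # decreasing step size for a mu-strongly convex objective (assumed rule)
        x = x - eta * grad(x)               # plain gradient descent step
        x_bar = (t * x_bar + x) / (t + 1)   # running average of iterates: the "averaging" output
    return x_bar

# Example: minimize f(x) = 0.5 * ||x||^2, which is 1-strongly convex.
x_out = gda(grad=lambda x: x, x0=np.ones(5), mu=1.0, num_iters=500)
```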
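The Experiment Setup row can be translated into a small PyTorch model for concreteness. The sketch below is an assumption-laden reconstruction: the placement of the ReLU activations, the dropout ordering, the `SmallCNN`/`LazyLinear` choices, and the final output layer are not specified in the paper.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Minimal sketch of the 4-layer CNN described in the Experiment Setup row."""
    def __init__(self, num_classes=10, in_channels=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3), nn.ReLU(),  # conv layer 1: 32 filters, 3x3
            nn.Conv2d(32, 32, kernel_size=3), nn.ReLU(),           # conv layer 2: 32 filters, 3x3
            nn.MaxPool2d(2),                                       # 2x2 max pooling
            nn.Dropout(0.25),                                      # 0.25 dropout after pooling
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(128), nn.ReLU(),                         # fully connected layer, 128 hidden units
            nn.Dropout(0.5),                                       # 0.5 dropout
            nn.Linear(128, num_classes),                           # output layer (assumed)
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SmallCNN(num_classes=10)                                   # 10 classes for CIFAR-10
criterion = nn.CrossEntropyLoss()                                  # cross-entropy loss, as in the paper
# Constant learning rate tuned over {0.1, 0.01, 0.001, 0.0001}; weight decay 5e-3.
# SGD is an assumption here; the paper compares several optimizers, including its own.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=5e-3)
```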