SAdam: A Variant of Adam for Strongly Convex Functions
Authors: Guanghui Wang, Shiyin Lu, Quan Cheng, Wei-Wei Tu, Lijun Zhang
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on optimizing strongly convex functions and training deep networks demonstrate the effectiveness of our method. |
| Researcher Affiliation | Collaboration | Guanghui Wang¹, Shiyin Lu¹, Quan Cheng¹, Wei-Wei Tu² and Lijun Zhang¹. ¹National Key Laboratory for Novel Software Technology, Nanjing University, China; ²4Paradigm Inc., Beijing, China |
| Pseudocode | Yes | "Algorithm 1 SAdam" and "Algorithm 2 SAdam with time-variant δ_t (SAdam_D)" |
| Open Source Code | No | Not found. The paper does not provide concrete access to its source code or state that it is open-source. |
| Open Datasets | Yes | In both experiments, we examine the performances of the aforementioned algorithms on three widely used datasets: MNIST (60000 training samples, 10000 test samples), CIFAR10 (50000 training samples, 10000 test samples), and CIFAR100 (50000 training samples, 10000 test samples). We refer to LeCun (1998) and Krizhevsky (2009) for more details of the three datasets. |
| Dataset Splits | No | The paper only provides training and test sample counts (e.g., 'MNIST (60000 training samples, 10000 test samples)') but does not specify a validation dataset split or its details. |
| Hardware Specification | No | Not found. The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | Not found. The paper does not provide specific software dependencies with version numbers (e.g., programming language, libraries, frameworks). |
| Experiment Setup | Yes | For Adam, AdamNC and AMSGrad, we choose δ = 10⁻⁸ according to the recommendations in their papers. For our SAdam... we use a rather large δ = 10⁻²... For each algorithm, we choose α from the set {0.1, 0.01, 0.001, 0.0001}... Our proposed SAdam, with β1 = 0.9, β2t = 1 − 0.9/t..." and "4-layer CNN, which consists of two convolutional layers (each with 32 filters of size 3×3), one max-pooling layer (with a 2×2 window and 0.25 dropout), and one fully connected layer (with 128 hidden units and 0.5 dropout). We employ the ReLU function as the activation function for convolutional layers and the softmax function as the activation function for the fully connected layer. The loss function is the cross-entropy. |
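
The Experiment Setup row can be read as a small hyperparameter grid plus a standard 4-layer CNN. The sketch below is illustrative only: the paper does not state which framework was used, so the Keras choice, the function names, and the ReLU activation on the 128-unit dense layer (before the softmax output) are assumptions; only the layer sizes, dropout rates, the δ values, the α grid, and the β schedule come from the quoted text.

```python
# Hedged sketch of the quoted experiment setup. Framework (Keras), helper names,
# and the ReLU on the 128-unit dense layer are assumptions; layer sizes, dropout
# rates, delta values, the alpha grid, and the beta schedule are from the paper.
from tensorflow import keras
from tensorflow.keras import layers

# Hyperparameters quoted in the Experiment Setup row.
ALPHA_GRID = [0.1, 0.01, 0.001, 0.0001]                  # learning rates tried per algorithm
DELTA = {"Adam": 1e-8, "AdamNC": 1e-8, "AMSGrad": 1e-8,  # delta = 1e-8 for the baselines
         "SAdam": 1e-2}                                   # "a rather large" delta for SAdam
BETA1 = 0.9

def beta2(t):
    """Time-variant second-moment coefficient beta_{2,t} = 1 - 0.9/t, for t >= 1."""
    return 1.0 - 0.9 / t

def build_cnn(input_shape=(28, 28, 1), num_classes=10):
    """4-layer CNN as described: two 3x3 conv layers (32 filters, ReLU),
    2x2 max-pooling with 0.25 dropout, a 128-unit fully connected layer with
    0.5 dropout, and a softmax output trained with cross-entropy loss."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),  # assumption: ReLU before the softmax output
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    # Stand-in optimizer: SAdam is not a stock Keras optimizer, so plain Adam is
    # compiled here with one grid point of alpha and the baseline delta (epsilon).
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=ALPHA_GRID[2],
                                                  epsilon=DELTA["Adam"]),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

In a reproduction, the grid search would train one such model per (algorithm, α) pair and report the best-performing learning rate, with MNIST inputs of shape (28, 28, 1) and CIFAR10/CIFAR100 inputs of shape (32, 32, 3).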