SAdam: A Variant of Adam for Strongly Convex Functions
Authors: Guanghui Wang, Shiyin Lu, Quan Cheng, Wei-Wei Tu, Lijun Zhang
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on optimizing strongly convex functions and training deep networks demonstrate the effectiveness of our method. |
| Researcher Affiliation | Collaboration | Guanghui Wang¹, Shiyin Lu¹, Quan Cheng¹, Wei-Wei Tu² and Lijun Zhang¹. ¹National Key Laboratory for Novel Software Technology, Nanjing University, China; ²4Paradigm Inc., Beijing, China |
| Pseudocode | Yes | "Algorithm 1 SAdam" and "Algorithm 2 SAdam with time-variant δ_t (SAdam_D)" |
| Open Source Code | No | Not found. The paper does not provide concrete access to its source code or state that it is open-source. |
| Open Datasets | Yes | In both experiments, we examine the performances of the aforementioned algorithms on three widely used datasets: MNIST (60000 training samples, 10000 test samples), CIFAR10 (50000 training samples, 10000 test samples), and CIFAR100 (50000 training samples, 10000 test samples). We refer to LeCun (1998) and Krizhevsky (2009) for more details of the three datasets. |
| Dataset Splits | No | The paper only provides training and test sample counts (e.g., 'MNIST (60000 training samples, 10000 test samples)') but does not specify a validation dataset split or its details. |
| Hardware Specification | No | Not found. The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | Not found. The paper does not provide specific software dependencies with version numbers (e.g., programming language, libraries, frameworks). |
| Experiment Setup | Yes | For Adam, AdamNC and AMSGrad, we choose δ = 10⁻⁸ according to the recommendations in their papers. For our SAdam... we use a rather large δ = 10⁻²... For each algorithm, we choose α from the set {0.1, 0.01, 0.001, 0.0001}... Our proposed SAdam, with β1 = 0.9, β2t = 1 − 0.9/t..." and "4-layer CNN, which consists of two convolutional layers (each with 32 filters of size 3×3), one max-pooling layer (with a 2×2 window and 0.25 dropout), and one fully connected layer (with 128 hidden units and 0.5 dropout). We employ the ReLU function as the activation function for convolutional layers and the softmax function as the activation function for the fully connected layer. The loss function is the cross-entropy. |
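
The Experiment Setup row can be read as a small hyperparameter grid plus a standard 4-layer CNN. The sketch below is illustrative only: the paper does not state which framework was used, so the Keras choice, the function names, and the ReLU activation on the 128-unit dense layer (before the softmax output) are assumptions; only the layer sizes, dropout rates, the δ values, the α grid, and the β schedule come from the quoted text.

```python
# Hedged sketch of the quoted experiment setup. Framework (Keras), helper names,
# and the ReLU on the 128-unit dense layer are assumptions; layer sizes, dropout
# rates, delta values, the alpha grid, and the beta schedule are from the paper.
from tensorflow import keras
from tensorflow.keras import layers

# Hyperparameters quoted in the Experiment Setup row.
ALPHA_GRID = [0.1, 0.01, 0.001, 0.0001]                  # learning rates tried per algorithm
DELTA = {"Adam": 1e-8, "AdamNC": 1e-8, "AMSGrad": 1e-8,  # delta = 1e-8 for the baselines
         "SAdam": 1e-2}                                   # "a rather large" delta for SAdam
BETA1 = 0.9

def beta2(t):
    """Time-variant second-moment coefficient beta_{2,t} = 1 - 0.9/t, for t >= 1."""
    return 1.0 - 0.9 / t

def build_cnn(input_shape=(28, 28, 1), num_classes=10):
    """4-layer CNN as described: two 3x3 conv layers (32 filters, ReLU),
    2x2 max-pooling with 0.25 dropout, a 128-unit fully connected layer with
    0.5 dropout, and a softmax output trained with cross-entropy loss."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),  # assumption: ReLU before the softmax output
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    # Stand-in optimizer: SAdam is not a stock Keras optimizer, so plain Adam is
    # compiled here with one grid point of alpha and the baseline delta (epsilon).
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=ALPHA_GRID[2],
                                                  epsilon=DELTA["Adam"]),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

In a reproduction, the grid search would train one such model per (algorithm, α) pair and report the best-performing learning rate, with MNIST inputs of shape (28, 28, 1) and CIFAR10/CIFAR100 inputs of shape (32, 32, 3).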