Follow the Moving Leader in Deep Learning
Authors: Shuai Zheng, James T. Kwok
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, experiments are performed on a number of deep learning models, including convolutional neural networks (Section 4.1), deep residual networks (Section 4.2), memory networks (Section 4.3), neural conversational model (Section 4.4), deep Q-network (Section 4.5), and long short-term memory (LSTM) (Section 4.6). A summary of the empirical performance of the various deep learning optimizers is presented in Section 4.7. |
| Researcher Affiliation | Academia | Shuai Zheng¹, James T. Kwok¹; ¹Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong. |
| Pseudocode | Yes | Algorithm 1 Follow the Moving Leader (FTML). A hedged sketch of this update appears after the table. |
| Open Source Code | No | The paper references the third-party implementations and libraries that were used (Keras, Torch, and specific GitHub repositories for ResNet, the Neural Conversational Model, and the Atari DQN), but does not release source code for the proposed FTML method itself. |
| Open Datasets | Yes | We use the example models on the MNIST and CIFAR-10 data sets from the Keras library... experiments on the single supporting fact task in the bAbI data set (Sukhbaatar et al., 2015; Weston et al., 2016)... its default data set Cornell Movie-Dialogs Corpus (with 50,000 samples) (Danescu-Niculescu-Mizil & Lee, 2011)... Experiments are performed on two computer games on the Atari 2600 platform: Breakout and Asterix. |
| Dataset Splits | No | The paper mentions using various datasets and reports training loss, but it does not provide specific details on how these datasets were split into training, validation, or test sets (e.g., percentages, sample counts, or references to standard splits). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions using the 'Keras library' and 'Torch implementation' but does not specify version numbers for these or any other software dependencies, making it difficult to reproduce the exact software environment. |
| Experiment Setup | Yes | For FTML, we set β1 = 0.6, β2 = 0.999, and a constant ϵt = ϵ = 10⁻⁸ for all t. For FTML, Adam, RMSprop, and NAG, η is selected by monitoring performance on the training set... The learning rate is chosen from {0.5, 0.25, 0.1, ..., 0.00005, 0.000025, 0.00001}. Minibatches of sizes 128 and 32 are used for MNIST and CIFAR-10, respectively. A minibatch size of 32 is used (ResNet). We truncate backpropagation through time (BPTT) to 5 timesteps, and input 5 samples to the LSTM in each iteration. Illustrative sketches of this setup follow the table. |
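For context on the Pseudocode and Experiment Setup rows, the following is a minimal NumPy sketch of an FTML-style parameter update using the hyperparameters reported in the paper (β1 = 0.6, β2 = 0.999, ϵ = 10⁻⁸). The update equations are written from the commonly cited form of Algorithm 1 rather than copied from the paper, so they should be verified against Algorithm 1 before use; the learning rate shown is only a placeholder, since the paper tunes it per task over a grid.

```python
import numpy as np

def ftml_step(theta, grad, state, t, lr=0.002, beta1=0.6, beta2=0.999, eps=1e-8):
    """One FTML-style update for a parameter array `theta` at step t (t >= 1).

    beta1, beta2, and eps follow the settings reported in the paper; lr is a
    placeholder. The update equations are a sketch of Algorithm 1 and should
    be checked against the paper before use.
    """
    v, d_prev, z = state  # second-moment EMA, previous d_t, linear accumulator

    # Exponential moving average of squared gradients (Adam-style second moment).
    v = beta2 * v + (1.0 - beta2) * grad ** 2

    # Per-coordinate weights d_t, with bias corrections for both moments.
    d = (1.0 - beta1 ** t) / lr * (np.sqrt(v / (1.0 - beta2 ** t)) + eps)
    sigma = d - beta1 * d_prev

    # Accumulate the weighted gradients and proximal terms of the moving leader.
    z = beta1 * z + (1.0 - beta1) * grad - sigma * theta

    # Closed-form minimizer of the regularized leader objective.
    theta_new = -z / d
    return theta_new, (v, d, z)
```

The state (v, d, z) starts as zero arrays and the step counter t starts at 1, so the bias-correction factors 1 − β1^t and 1 − β2^t are well defined from the first step.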
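The data-set access and learning-rate selection described in the Experiment Setup row can be approximated with standard tooling. The sketch below is illustrative rather than a reconstruction of the paper's experiments: it loads MNIST through the Keras dataset API, uses a small stand-in MLP instead of the Keras example model, and uses Adam in place of FTML, since no FTML implementation is released with the paper or shipped with Keras. Only the learning rates explicitly listed in the paper appear in the grid; the "..." in the quoted grid elides intermediate values that are not reconstructed here.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist

# MNIST is publicly available through the Keras dataset API.
(x_train, y_train), _ = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

# Learning rates explicitly listed in the paper's grid; the "..." in the
# quoted grid elides intermediate values that are not reconstructed here.
learning_rates = [0.5, 0.25, 0.1, 0.00005, 0.000025, 0.00001]

best_lr, best_loss = None, float("inf")
for lr in learning_rates:
    # Stand-in two-layer MLP; the paper uses the Keras example models instead.
    model = models.Sequential([
        layers.Input(shape=(784,)),
        layers.Dense(128, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ])
    # Adam stands in for FTML here, since no FTML implementation is released
    # with the paper or shipped with Keras.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
        loss="sparse_categorical_crossentropy",
    )
    history = model.fit(x_train, y_train, batch_size=128, epochs=1, verbose=0)
    train_loss = history.history["loss"][-1]
    if train_loss < best_loss:
        best_lr, best_loss = lr, train_loss

print("selected learning rate:", best_lr)
```

Selection is by final training loss, matching the paper's statement that η is chosen by monitoring performance on the training set; the minibatch size of 128 matches the value reported for MNIST.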