Gradually Updated Neural Networks for Large-Scale Image Recognition

Authors: Siyuan Qiao, Zhishuai Zhang, Wei Shen, Bo Wang, Alan Yuille

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that the networks based on our method achieve the state-of-the-art performances on CIFAR and ImageNet datasets.
Researcher Affiliation | Collaboration | 1 Johns Hopkins University, 2 Shanghai University, 3 Hikvision Research.
Pseudocode | Yes | Algorithm 1 Back-propagation for GUNN (a runnable sketch follows the table):
  Input: U(·) = (U_{c_l} ∘ U_{c_{l−1}} ∘ … ∘ U_{c_1})(·), input x, output y = U(x), gradients ∂L/∂y, and parameters Θ for U.
  Output: ∂L/∂Θ, ∂L/∂x
  ∂L/∂x ← ∂L/∂y
  for i ← l to 1 do
      y_c ← x_c, ∀c ∈ c_i
      ∂L/∂y, ∂L/∂Θ_{c_i} ← BP(y, ∂L/∂x, U_{c_i}, Θ_{c_i})
      (∂L/∂x)_c ← (∂L/∂y)_c, ∀c ∈ c_i
      (∂L/∂x)_c ← (∂L/∂x)_c + (∂L/∂y)_c, ∀c ∉ c_i
  end
Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the methodology, nor does it include any links to a code repository.
Open Datasets | Yes | We test our proposed GUNN on highly competitive benchmark datasets, i.e. CIFAR (Krizhevsky & Hinton, 2009) and ImageNet (Russakovsky et al., 2015).
Dataset Splits | Yes | For both of the datasets, the training and test set contain 50,000 and 10,000 images, respectively. ... The ImageNet dataset (Russakovsky et al., 2015) contains about 1.28 million color images for training and 50,000 for validation. (A loading sketch follows the table.)
Hardware Specification | Yes | All the results reported for CIFAR, regardless of the detailed configurations, were trained using 4 NVIDIA Titan X GPUs with the data parallelism. ... We use 8 Tesla V100 GPUs with the data parallelism to get the reported results.
Software Dependencies | No | The paper mentions using 'stochastic gradient descents' and 'data parallelism' but does not specify any software names with version numbers, such as a deep learning framework or specific libraries.
Experiment Setup | Yes | The initial learning rate is set to 0.1, the weight decay is set to 1e-4, and the momentum is set to 0.9 without dampening. We train the models for 300 epochs. The learning rate is divided by 10 at the 150th epoch and the 225th epoch. We set the batch size to 64, following (Huang et al., 2017b). (A configuration sketch follows the table.)
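
The Algorithm 1 quoted in the Pseudocode row can be expressed as a short routine. The following is a minimal sketch, assuming PyTorch (the paper names no framework, per the Software Dependencies row); GroupUpdate, gunn_forward, gunn_backward, and the channel-group sizes are illustrative stand-ins rather than the authors' code, and the paper's BP(·) call is approximated with torch.autograd.grad over a re-run of each per-group module.

import torch
import torch.nn as nn


class GroupUpdate(nn.Module):
    """Toy stand-in for U_{c_i}: computes new values for one channel group from all channels."""

    def __init__(self, in_channels, group_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, group_channels, kernel_size=3, padding=1)

    def forward(self, x):
        return torch.relu(self.conv(x))


def gunn_forward(x, updates, groups):
    """Gradually update channel groups c_1, ..., c_l; later groups see earlier updates."""
    y = x
    for U, (lo, hi) in zip(updates, groups):
        new_part = U(y)  # U_{c_i} applied to the partially updated tensor
        y = torch.cat([y[:, :lo], new_part, y[:, hi:]], dim=1)
    return y


def gunn_backward(x, y, grad_y, updates, groups):
    """Algorithm 1: rebuild each step's input from (x, y) and back-propagate group by group."""
    grad_x = grad_y.clone()  # dL/dx <- dL/dy
    grad_params = []
    for U, (lo, hi) in reversed(list(zip(updates, groups))):
        # Channels c_i still held the original input values right before step i,
        # so that step's input can be recovered from the running y and the input x.
        y = torch.cat([y[:, :lo], x[:, lo:hi], y[:, hi:]], dim=1).detach().requires_grad_(True)
        out = U(y)  # re-run U_{c_i} to build a local graph
        grads = torch.autograd.grad(out, [y] + list(U.parameters()),
                                    grad_outputs=grad_x[:, lo:hi])  # BP(y, dL/dx, U_{c_i}, Theta_{c_i})
        grad_in, grad_theta = grads[0], grads[1:]
        grad_params.append(grad_theta)
        new_grad = grad_x + grad_in             # c not in c_i: identity path plus path through U_{c_i}
        new_grad[:, lo:hi] = grad_in[:, lo:hi]  # c in c_i: only the path through U_{c_i}
        grad_x = new_grad
    return grad_x, grad_params


# Example: 8 channels split into two groups of 4.
groups = [(0, 4), (4, 8)]
updates = [GroupUpdate(8, hi - lo) for lo, hi in groups]
x = torch.randn(2, 8, 16, 16)
y = gunn_forward(x, updates, groups)
grad_x, grad_params = gunn_backward(x, y.detach(), torch.ones_like(y), updates, groups)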
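The CIFAR split quoted in the Dataset Splits row (50,000 training images and 10,000 test images) can be checked with a standard loader; this sketch assumes torchvision, which the paper does not mention.

from torchvision import datasets, transforms

# CIFAR-10 ships with the 50,000 / 10,000 train / test split quoted above.
transform = transforms.ToTensor()
train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)
print(len(train_set), len(test_set))  # 50000 10000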
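The CIFAR schedule quoted in the Experiment Setup row (SGD with learning rate 0.1, weight decay 1e-4, momentum 0.9 without dampening, 300 epochs, learning rate divided by 10 at epochs 150 and 225, batch size 64), together with the data parallelism mentioned in the Hardware Specification row, maps onto a standard training loop. A sketch, again assuming PyTorch and CUDA GPUs; the ResNet-18 stand-in is hypothetical and is not the paper's GUNN model.

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

train_set = datasets.CIFAR10("./data", train=True, download=True, transform=transforms.ToTensor())
train_loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)

model = models.resnet18(num_classes=10).cuda()  # placeholder network, not the paper's GUNN
model = torch.nn.DataParallel(model)            # data parallelism over the visible GPUs
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            dampening=0, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[150, 225], gamma=0.1)

for epoch in range(300):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = F.cross_entropy(model(images.cuda()), labels.cuda())
        loss.backward()
        optimizer.step()
    scheduler.step()  # divides the learning rate by 10 at epochs 150 and 225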