Gradually Updated Neural Networks for Large-Scale Image Recognition
Authors: Siyuan Qiao, Zhishuai Zhang, Wei Shen, Bo Wang, Alan Yuille
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that the networks based on our method achieve the state-of-the-art performances on CIFAR and ImageNet datasets. |
| Researcher Affiliation | Collaboration | Johns Hopkins University, Shanghai University, Hikvision Research. |
| Pseudocode | Yes | Algorithm 1 Back-propagation for GUNN. Input: U(·) = (U_{c_l} ∘ U_{c_{l−1}} ∘ ... ∘ U_{c_1})(·), input x, output y = U(x), gradients ∂L/∂y, and parameters Θ for U. Output: ∂L/∂Θ, ∂L/∂x. Initialize ∂L/∂x ← ∂L/∂y; for i ← l to 1 do: y_c ← x_c, ∀c ∈ c_i; ∂L/∂y, ∂L/∂Θ_{c_i} ← BP(y, ∂L/∂x, U_{c_i}, Θ_{c_i}); (∂L/∂x)_c ← (∂L/∂y)_c, ∀c ∈ c_i; (∂L/∂x)_c ← (∂L/∂x)_c + (∂L/∂y)_c, ∀c ∉ c_i; end. (A code sketch of the gradual update follows the table.) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the methodology, nor does it include any links to a code repository. |
| Open Datasets | Yes | We test our proposed GUNN on highly competitive benchmark datasets, i.e. CIFAR (Krizhevsky & Hinton, 2009) and ImageNet (Russakovsky et al., 2015). |
| Dataset Splits | Yes | For both of the datasets, the training and test set contain 50,000 and 10,000 images, respectively. ... The ImageNet dataset (Russakovsky et al., 2015) contains about 1.28 million color images for training and 50,000 for validation. |
| Hardware Specification | Yes | All the results reported for CIFAR, regardless of the detailed configurations, were trained using 4 NVIDIA Titan X GPUs with the data parallelism. ... We use 8 Tesla V100 GPUs with the data parallelism to get the reported results. |
| Software Dependencies | No | The paper mentions using 'stochastic gradient descents' and 'data parallelism' but does not specify any software names with version numbers, such as a deep learning framework or specific libraries. |
| Experiment Setup | Yes | the initial learning rate is set to 0.1, the weight decay is set to 1e-4, and the momentum is set to 0.9 without dampening. We train the models for 300 epochs. The learning rate is divided by 10 at 150th epoch and 225th epoch. We set the batch size to 64, following (Huang et al., 2017b). |
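To make the gradual update that Algorithm 1 back-propagates through concrete, here is a minimal PyTorch-style sketch (the paper does not name a framework; the group count and the BN-ReLU-Conv segment update are illustrative assumptions, not the paper's exact architecture). Channels are overwritten one segment at a time, so autograd yields exactly the channel-wise gradient replacement and accumulation spelled out in the pseudocode.

```python
import torch
import torch.nn as nn


class GUNNBlock(nn.Module):
    """Sketch of a gradually updated layer: the channels are split into
    `num_groups` segments that are recomputed one after another, each update
    seeing the partially updated feature map, i.e. U = U_{c_l} ∘ ... ∘ U_{c_1}."""

    def __init__(self, channels: int, num_groups: int = 4):
        super().__init__()
        assert channels % num_groups == 0
        self.seg = channels // num_groups
        # Assumed per-segment update: BN -> ReLU -> 3x3 conv producing one segment.
        self.updates = nn.ModuleList([
            nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, self.seg, kernel_size=3, padding=1, bias=False),
            )
            for _ in range(num_groups)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for i, update in enumerate(self.updates):
            new_seg = update(x)  # computed from the current, partly updated map
            lo, hi = i * self.seg, (i + 1) * self.seg
            # Overwrite only the channels in c_i; all other channels pass through.
            x = torch.cat([x[:, :lo], new_seg, x[:, hi:]], dim=1)
        return x


if __name__ == "__main__":
    block = GUNNBlock(channels=64, num_groups=4)
    out = block(torch.randn(2, 64, 32, 32))
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```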
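The quoted CIFAR optimization settings map onto a standard SGD recipe with step decay. Below is a minimal sketch assuming PyTorch (again, the paper names no framework); `GUNNBlock` from the sketch above stands in for the full GUNN model, the training-loop body is omitted, and the `nn.DataParallel` wrapper mirrors the reported multi-GPU data parallelism.

```python
import torch
import torch.nn as nn

# Reported recipe: SGD, lr 0.1, momentum 0.9 without dampening, weight decay
# 1e-4, 300 epochs, lr divided by 10 at epochs 150 and 225, batch size 64.
model = GUNNBlock(channels=64)             # stand-in for the full GUNN network
if torch.cuda.is_available():
    model = nn.DataParallel(model.cuda())  # data parallelism across available GPUs

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            dampening=0, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[150, 225], gamma=0.1)

for epoch in range(300):
    # train_one_epoch(model, optimizer, train_loader)  # batch size 64; loop omitted
    scheduler.step()
```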