Mixed Link Networks
Authors: Wenhai Wang, Xiang Li, Tong Lu, Jian Yang
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To show the efficiency and effectiveness of the proposed MixNets, we conduct extensive experiments on four competitive benchmark datasets, namely, CIFAR-10, CIFAR-100, SVHN and ImageNet. |
| Researcher Affiliation | Academia | Wenhai Wang¹, Xiang Li², Tong Lu¹, Jian Yang²; ¹ National Key Lab for Novel Software Technology, Nanjing University; ² Deep Insight@PCALab, Nanjing University of Science and Technology |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any specific statement or link for the open-source code of the described methodology. |
| Open Datasets | Yes | CIFAR. The two CIFAR datasets [Krizhevsky and Hinton, 2009] consist of colored natural images with 32×32 pixels. SVHN. The Street View House Numbers (SVHN) dataset [Netzer et al., 2011] contains 32×32 colored digit images. ImageNet. The ILSVRC 2012 classification dataset [Deng et al., 2009] contains 1.2 million images for training, and 50K for validation, from 1K classes. |
| Dataset Splits | Yes | The training and test sets contain 50K and 10K images, respectively. (CIFAR) There are 73,257 images in the training set, 26,032 images in the test set, and 531,131 images for extra training data. [...] and a validation set with 6,000 images is split from the training set. (SVHN) [The ILSVRC 2012 classification dataset] contains 1.2 million images for training, and 50K for validation, from 1K classes. (ImageNet) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions general software components like stochastic gradient descent (SGD), batch normalization, ReLU, and convolution, and refers to dropout layers and weight initialization methods, but does not provide specific version numbers for any libraries or frameworks (e.g., PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | All the networks are trained by using stochastic gradient descent (SGD). On CIFAR and SVHN we train using batch size 64 for 300 epochs. The initial learning rate is set to 0.1, and is divided by 10 at 50% and 75% of the total number of training epochs. On ImageNet, we train models with a mini-batch size of 150 (MixNet-121) and 100 (MixNet-141) due to GPU memory constraints. To compensate for the smaller batch size, the models are trained for 100 epochs, and the learning rate is lowered by 10 times at epochs 30, 60 and 90. Following [He et al., 2016a], we use a weight decay of 10⁻⁴ and a Nesterov momentum [Sutskever et al., 2013] of 0.9 without dampening. We adopt the weight initialization introduced by [He et al., 2015]. For the dataset without data augmentation (i.e., SVHN), we follow the DenseNet setting [Huang et al., 2017] and add a dropout layer [Srivastava et al., 2014] after each convolutional layer (except the first one) by setting the dropout rate as 0.2. (A minimal configuration sketch of these settings follows the table.) |
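
The CIFAR/SVHN training recipe quoted in the Experiment Setup row maps onto a standard optimizer and learning-rate schedule. The sketch below is a minimal PyTorch illustration, assuming those settings (SGD, initial learning rate 0.1, Nesterov momentum 0.9 without dampening, weight decay 10⁻⁴, learning rate divided by 10 at 50% and 75% of 300 epochs, He initialization); `model` is a placeholder for any MixNet implementation, since the paper releases no code.

```python
# Hedged sketch of the quoted CIFAR/SVHN training configuration.
# `model` is a placeholder module, not the authors' released implementation.
import torch
import torch.nn as nn


def he_init(module):
    # Weight initialization of [He et al., 2015], as cited in the paper.
    if isinstance(module, nn.Conv2d):
        nn.init.kaiming_normal_(module.weight, mode="fan_out", nonlinearity="relu")


def build_training_config(model, epochs=300):
    model.apply(he_init)
    # SGD with Nesterov momentum 0.9, no dampening, weight decay 1e-4.
    optimizer = torch.optim.SGD(
        model.parameters(), lr=0.1, momentum=0.9,
        weight_decay=1e-4, dampening=0, nesterov=True,
    )
    # Learning rate divided by 10 at 50% and 75% of the total epochs.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[epochs // 2, (3 * epochs) // 4], gamma=0.1,
    )
    return optimizer, scheduler
```

Per the quoted setup, this schedule would be driven by a batch size of 64 for 300 epochs on CIFAR and SVHN; for SVHN (no data augmentation), a dropout layer with rate 0.2 after each convolutional layer except the first would be added to the model itself.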