Large-Margin Softmax Loss for Convolutional Neural Networks

Authors: Weiyang Liu, Yandong Wen, Zhiding Yu, Meng Yang

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Extensive experiments on four benchmark datasets demonstrate that the deeply-learned features with L-Softmax loss become more discriminative, hence significantly boosting the performance on a variety of visual classification and verification tasks." |
| Researcher Affiliation | Academia | Weiyang Liu (WYLIU@PKU.EDU.CN, School of ECE, Peking University); Yandong Wen (WEN.YANDONG@MAIL.SCUT.EDU.CN, School of EIE, South China University of Technology); Zhiding Yu (YZHIDING@ANDREW.CMU.EDU, Dept. of ECE, Carnegie Mellon University); Meng Yang (YANG.MENG@SZU.EDU.CN, College of CS & SE, Shenzhen University) |
| Pseudocode | No | The paper provides mathematical derivations and discusses forward/backward propagation for the L-Softmax loss, but it does not include explicitly labeled pseudocode or algorithm blocks. (A sketch of the loss computation is given after this table.) |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | "We evaluate the generalized softmax loss in two typical vision applications: visual classification and face verification. In visual classification, we use three standard benchmark datasets: MNIST (LeCun et al., 1998), CIFAR10 (Krizhevsky, 2009), and CIFAR100 (Krizhevsky, 2009). In face verification, we evaluate our method on the widely used LFW dataset (Huang et al., 2007). ... we train on the publicly available CASIA-WebFace (Yi et al., 2014) outside dataset" |
| Dataset Splits | Yes | "We start with a learning rate of 0.1, divide it by 10 at 12k and 15k iterations, and eventually terminate training at 18k iterations, which is determined on a 45k/5k train/val split." |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | "We implement the CNNs using the Caffe library (Jia et al., 2014) with our modifications." (No specific version number for Caffe or any other software dependency is provided.) |
| Experiment Setup | Yes | "Our CNN architectures are described in Table 1. In convolution layers, the stride is set to 1 if not specified. We implement the CNNs using the Caffe library (Jia et al., 2014) with our modifications. For all experiments, we adopt the PReLU (He et al., 2015b) as the activation functions, and the batch size is 256. We use a weight decay of 0.0005 and momentum of 0.9. The weight initialization in (He et al., 2015b) and batch normalization (Ioffe & Szegedy, 2015) are used in our networks but without dropout. Note that we only perform the mean subtraction preprocessing for training and testing data. For optimization, normally the stochastic gradient descent will work well. ... We start with a learning rate of 0.1, divide it by 10 at 12k and 15k iterations, and eventually terminate training at 18k iterations, which is determined on a 45k/5k train/val split. ... The learning rate is set to 0.1, 0.01, 0.001 and is switched when the training loss plateaus. The total number of epochs is about 30 for our models." (A training-setup sketch based on these settings follows the table.) |
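
Since the paper states the L-Softmax loss only through equations and prose rather than a pseudocode block (see the Pseudocode row), the following is a minimal NumPy sketch of the forward computation under the paper's bias-free formulation, in which the target-class logit ||W_y|| ||x|| cos(theta_y) is replaced by ||W_y|| ||x|| psi(theta_y), with psi(theta) = (-1)^k cos(m*theta) - 2k on the interval [k*pi/m, (k+1)*pi/m]. The function names, the epsilon, and the arccos/cos route (the paper instead expands cos(m*theta) in powers of cos(theta)) are my own simplifications, and the backward pass is left to autodiff or to the paper's derivations.

```python
import numpy as np

def l_softmax_logits(X, W, y, m=4):
    """Sketch of the large-margin softmax forward pass.

    X: (N, D) features, W: (D, C) last-layer weights (no bias),
    y: (N,) integer labels, m: integer angular margin.
    """
    w_norm = np.linalg.norm(W, axis=0)            # ||W_j|| per class, shape (C,)
    x_norm = np.linalg.norm(X, axis=1)            # ||x_i|| per sample, shape (N,)
    logits = X @ W                                # ||W_j|| ||x_i|| cos(theta_ij), shape (N, C)

    n = np.arange(len(y))
    cos_t = logits[n, y] / (w_norm[y] * x_norm + 1e-12)
    theta = np.arccos(np.clip(cos_t, -1.0, 1.0))  # angle to the target-class weight
    k = np.floor(m * theta / np.pi)               # index of the [k*pi/m, (k+1)*pi/m] interval
    psi = (1.0 - 2.0 * (k % 2)) * np.cos(m * theta) - 2.0 * k   # (-1)^k cos(m*theta) - 2k
    logits[n, y] = w_norm[y] * x_norm * psi       # margin applied to the target class only
    return logits

def l_softmax_loss(X, W, y, m=4):
    """Cross-entropy over the margin-adjusted logits."""
    logits = l_softmax_logits(X, W, y, m)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()
```

For m = 1, psi(theta) reduces to cos(theta) and the expression falls back to the standard softmax cross-entropy, matching the paper's observation that the original softmax loss is the special case m = 1.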
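
The Experiment Setup row pins down the optimizer settings for the classification experiments: SGD with momentum 0.9, weight decay 0.0005, batch size 256, and a learning rate of 0.1 divided by 10 at 12k and 15k iterations, with training stopped at 18k iterations. The paper implements this in a modified Caffe; the snippet below is only a hypothetical PyTorch restatement of those numbers around a placeholder model, with random tensors standing in for a CIFAR loader, not the authors' configuration.

```python
import torch
from torch import nn, optim

# Placeholder network; the paper's actual CNN architectures are given in its Table 1.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.PReLU(),   # PReLU activations, as quoted
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10),
)

# Hyperparameters quoted in the Experiment Setup row above.
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=0.0005)
# Divide the learning rate by 10 at 12k and 15k iterations; stop at 18k.
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[12000, 15000], gamma=0.1)
criterion = nn.CrossEntropyLoss()   # stand-in; the paper replaces this with the L-Softmax loss

batch_size, max_iters = 256, 18000
for it in range(max_iters):
    # Random batch standing in for a CIFAR loader (3x32x32 images, 10 classes).
    images = torch.randn(batch_size, 3, 32, 32)
    labels = torch.randint(0, 10, (batch_size,))
    loss = criterion(model(images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()   # scheduler stepped per iteration to match the 12k/15k milestones
```

The second schedule quoted in that row (learning rate switched through 0.1, 0.01, 0.001 at plateaus, for about 30 epochs) appears to describe the face-verification training on CASIA-WebFace rather than the 18k-iteration classification runs sketched here.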