Deep Hyperspherical Learning

Authors: Weiyang Liu, Yan-Ming Zhang, Xingguo Li, Zhiding Yu, Bo Dai, Tuo Zhao, Le Song

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The conclusions are verified experimentally: "We perform experiments on CIFAR10 (only with random left-right flipping), CIFAR10+ (with full data augmentation), CIFAR100 and large-scale Imagenet 2012 datasets [17]."
Researcher Affiliation | Academia | Georgia Institute of Technology; Institute of Automation, Chinese Academy of Sciences; University of Minnesota; Carnegie Mellon University
Pseudocode | No | The paper describes mathematical formulations and derivations of its operators and optimization, such as the SphereConv operator F_s(w, x) = g(θ_(w,x)) + b and gradient terms like ∂g(θ_(w,x))/∂w, but it does not include any structured pseudocode or algorithm blocks. (A minimal sketch of the operator is given after the table.)
Open Source Code | No | The paper does not contain an explicit statement about the release of source code or a link to a code repository for the described methodology.
Open Datasets | Yes | "We perform experiments on CIFAR10 (only with random left-right flipping), CIFAR10+ (with full data augmentation), CIFAR100 and large-scale Imagenet 2012 datasets [17]." (An illustrative augmentation setup for the two CIFAR10 variants appears after the table.)
Dataset Splits | Yes | "For CIFAR10, CIFAR10+ and CIFAR100, we follow the same settings from [7, 12]. For Imagenet 2012 dataset, we mostly follow the settings in [9]."
Hardware Specification | No | The paper mentions running experiments but does not provide specific details about the hardware used, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper names optimizers ("For CIFAR-10 and CIFAR-100, we use the ADAM, starting with the learning rate 0.001. For Imagenet-2012, we use the SGD with momentum 0.9.") but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | Yes | "For CIFAR-10 and CIFAR-100, we use the ADAM, starting with the learning rate 0.001. The batch size is 128 if not specified. The learning rate is divided by 10 at 34K, 54K iterations and the training stops at 64K. For both A-Softmax and GA-Softmax loss, we use m=4. For Imagenet-2012, we use the SGD with momentum 0.9. The learning rate starts with 0.1, and is divided by 10 at 200K and 375K iterations. The training stops at 550K iteration." (A hedged transcription of this schedule appears after the table.)
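
Since the paper gives no algorithm block, the following minimal NumPy sketch illustrates a single SphereConv response F_s(w, x) = g(θ_(w,x)) + b as quoted in the Pseudocode row. The linear and cosine choices of g follow the paper's description; the function name, shapes, and the eps guard are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sphereconv(w, x, g="cosine", b=0.0, eps=1e-8):
    """Sketch of one SphereConv response F_s(w, x) = g(theta_(w,x)) + b.

    w, x : 1-D arrays holding a flattened kernel and the matching input patch.
    g    : angular activation; "cosine" gives g(theta) = cos(theta) and
           "linear" gives g(theta) = -2*theta/pi + 1, both mapping [0, pi] to [-1, 1].
    b    : optional bias term.
    """
    cos_theta = np.dot(w, x) / (np.linalg.norm(w) * np.linalg.norm(x) + eps)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    if g == "cosine":
        out = np.cos(theta)
    elif g == "linear":
        out = -2.0 * theta / np.pi + 1.0
    else:
        raise ValueError("unknown angular activation: %s" % g)
    return out + b

# Response of a random flattened 3x3 kernel to a random patch.
rng = np.random.default_rng(0)
w, x = rng.standard_normal(9), rng.standard_normal(9)
print(sphereconv(w, x, g="linear"))
```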
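
The two CIFAR10 settings quoted under Open Datasets differ only in augmentation. The torchvision sketch below shows one way to set them up; the paper only states that CIFAR10 uses random left-right flipping and that CIFAR10+ uses "full data augmentation", so the pad-4 random-crop recipe is the standard convention of [7, 12] and is assumed rather than confirmed.

```python
import torchvision.transforms as T
from torchvision.datasets import CIFAR10

# CIFAR10 setting: only random left-right flipping, as quoted above.
flip_only = T.Compose([
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])

# CIFAR10+ setting: "full data augmentation". The pad-4 / random 32x32 crop /
# horizontal flip recipe is an assumption based on the conventions of [7, 12].
full_aug = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])

train_set = CIFAR10(root="./data", train=True, download=True, transform=full_aug)
```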
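
The Experiment Setup row fully specifies the optimization schedules, so they can be transcribed into a training-loop configuration. The PyTorch sketch below mirrors those numbers (Adam, lr 0.001, batch size 128, decay at 34K/54K iterations, stop at 64K; the ImageNet-2012 variant is commented out). The model, data, and loss are placeholders rather than the paper's SphereNet, and the margin setting m=4 for A-Softmax/GA-Softmax is not covered here.

```python
import torch

# Placeholders for the paper's SphereNet model and the CIFAR data pipeline.
model = torch.nn.Linear(32 * 32 * 3, 10)
criterion = torch.nn.CrossEntropyLoss()

# CIFAR-10 / CIFAR-100 schedule quoted above: Adam, lr 0.001, batch size 128,
# lr divided by 10 at 34K and 54K iterations, training stopped at 64K.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[34_000, 54_000], gamma=0.1)

# ImageNet-2012 variant: SGD with momentum 0.9, lr 0.1, divided by 10 at
# 200K and 375K iterations, stopped at 550K.
# optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# scheduler = torch.optim.lr_scheduler.MultiStepLR(
#     optimizer, milestones=[200_000, 375_000], gamma=0.1)

for it in range(64_000):
    x = torch.randn(128, 32 * 32 * 3)   # stand-in for a batch of 128 CIFAR images
    y = torch.randint(0, 10, (128,))
    loss = criterion(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # the schedule is stepped per iteration, not per epoch
```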