Tunable Sensitivity to Large Errors in Neural Network Training

Authors: Gil Keren, Sivan Sabato, Björn Schuller

AAAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We tested our method on several benchmark datasets. We propose, and corroborate in our experiments, that the optimal level of sensitivity to hard examples is positively correlated with the depth of the network. Moreover, the test prediction error obtained by our method is generally lower than that of the vanilla cross-entropy gradient learner.
Researcher Affiliation | Academia | Gil Keren, Chair of Complex and Intelligent Systems, University of Passau, Passau, Germany; Sivan Sabato, Department of Computer Science, Ben-Gurion University of the Negev, Beer Sheva, Israel; Björn Schuller, Chair of Complex and Intelligent Systems, University of Passau, Passau, Germany, and Machine Learning Group, Imperial College London, U.K.
Pseudocode | No | The paper describes its methods mathematically and textually but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about an open-source code release or links to code repositories for the described methodology.
Open Datasets | Yes | For our experiments, we used four classification benchmark datasets from the field of computer vision: the MNIST dataset (LeCun et al. 1998), the Street View House Numbers dataset (SVHN) (Netzer et al. 2011), and the CIFAR-10 and CIFAR-100 datasets (Krizhevsky and Hinton 2009).
Dataset Splits | Yes | We generated 30,000 examples for each of the training, validation and test datasets.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions general algorithms such as stochastic gradient descent with momentum, but does not specify any software names with version numbers (e.g., libraries, frameworks, or programming languages) used for the implementation.
Experiment Setup | Yes | We used batch gradient descent with a learning rate of 0.01 for optimization of the four parameters, where the gradient is replaced with the pseudo-gradient.
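
The quoted setup does not spell out the pseudo-gradient itself, so the following is only a minimal sketch of the surrounding procedure: full-batch gradient descent over four parameters with a learning rate of 0.01, in which the ordinary gradient is replaced by a caller-supplied pseudo-gradient. The names batch_gradient_descent and example_pseudo_grad, the least-squares placeholder gradient, and the synthetic data are assumptions made for illustration and are not taken from the paper.

import numpy as np


def batch_gradient_descent(params, pseudo_grad_fn, data, lr=0.01, n_steps=1000):
    # Full-batch update loop: at each step the usual gradient is replaced by
    # whatever pseudo-gradient the caller supplies.
    params = np.asarray(params, dtype=float)
    for _ in range(n_steps):
        params = params - lr * pseudo_grad_fn(params, data)
    return params


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 4))  # four parameters, as in the quoted setup
    y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=100)

    def example_pseudo_grad(w, data):
        # Placeholder: the plain least-squares gradient. The paper's
        # pseudo-gradient is defined differently (it tunes sensitivity to
        # large errors); its exact form is not quoted in this report.
        X_, y_ = data
        return X_.T @ (X_ @ w - y_) / len(y_)

    w_hat = batch_gradient_descent(np.zeros(4), example_pseudo_grad, (X, y), lr=0.01)
    print("estimated parameters:", w_hat)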