Tunable Sensitivity to Large Errors in Neural Network Training
Authors: Gil Keren, Sivan Sabato, Björn Schuller
AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We tested our method on several benchmark datasets. We propose, and corroborate in our experiments, that the optimal level of sensitivity to hard examples is positively correlated with the depth of the network. Moreover, the test prediction error obtained by our method is generally lower than that of the vanilla cross-entropy gradient learner. |
| Researcher Affiliation | Academia | Gil Keren, Chair of Complex and Intelligent Systems, University of Passau, Passau, Germany; Sivan Sabato, Department of Computer Science, Ben-Gurion University of the Negev, Beer Sheva, Israel; Björn Schuller, Chair of Complex and Intelligent Systems, University of Passau, Passau, Germany, and Machine Learning Group, Imperial College London, U.K. |
| Pseudocode | No | The paper describes its methods mathematically and textually but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about open-source code release or links to code repositories for the described methodology. |
| Open Datasets | Yes | For our experiments, we used four classification benchmark datasets from the field of computer vision: the MNIST dataset (LeCun et al. 1998), the Street View House Numbers (SVHN) dataset (Netzer et al. 2011), and the CIFAR-10 and CIFAR-100 datasets (Krizhevsky and Hinton 2009). |
| Dataset Splits | Yes | We generated 30,000 examples for each of the training, validation and test datasets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions general algorithms like 'stochastic gradient descent with momentum' but does not specify any software names with version numbers (e.g., libraries, frameworks, or programming languages with versions) used for implementation. |
| Experiment Setup | Yes | We used batch gradient descent with a learning rate of 0.01 for optimization of the four parameters, where the gradient is replaced with the pseudo-gradient [...] (a minimal illustrative sketch of this update pattern follows the table). |
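
The Experiment Setup row quotes a full-batch gradient descent update with a learning rate of 0.01 in which the true gradient is replaced by a pseudo-gradient. Since the paper releases no code, the sketch below is only a minimal illustration of that update pattern under stated assumptions: the `pseudo_gradient` function here is a hypothetical stand-in (the ordinary squared-error gradient of a toy linear model with four parameters), not the pseudo-gradient defined mathematically in the paper.

```python
import numpy as np

LEARNING_RATE = 0.01  # value quoted in the Experiment Setup row


def pseudo_gradient(params, x, y):
    """Hypothetical stand-in for the paper's pseudo-gradient.

    Here it is simply the full-batch squared-error gradient of a linear
    model; the paper's actual pseudo-gradient (which changes how strongly
    large errors influence the update) is defined in the paper itself and
    is not reproduced here.
    """
    errors = x @ params - y
    return x.T @ errors / len(y)


def batch_gradient_descent(params, x, y, n_steps=1000):
    """Full-batch descent where the gradient is swapped for the pseudo-gradient."""
    for _ in range(n_steps):
        params = params - LEARNING_RATE * pseudo_gradient(params, x, y)
    return params


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 4))            # four parameters, matching the quoted setup
    true_w = np.array([1.0, -2.0, 0.5, 3.0])  # illustrative ground truth only
    y = x @ true_w + 0.1 * rng.normal(size=200)
    w = batch_gradient_descent(np.zeros(4), x, y)
    print("recovered parameters:", np.round(w, 2))
```

The only detail taken from the paper is the optimizer shape (batch descent, learning rate 0.01, gradient replaced by a pseudo-gradient); every function name, shape, and value in the stand-in is an assumption made for the sake of a runnable example.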