$f$-Divergence Based Classification: Beyond the Use of Cross-Entropy
Authors: Nicola Novello, Andrea M Tonello
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We theoretically analyze the objective functions proposed and numerically test them in three application scenarios: toy examples, image datasets, and signal detection/decoding problems. The analyzed scenarios demonstrate the effectiveness of the proposed approach and that the SL divergence achieves the highest classification accuracy in almost all the considered cases. |
| Researcher Affiliation | Academia | Nicola Novello¹, Andrea M Tonello¹; ¹Department of Networked and Embedded Systems, University of Klagenfurt, Klagenfurt, Austria. Correspondence to: Nicola Novello <nicola.novello@aau.at>. |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Our implementation can be found at https://github.com/tonellolab/discriminative-classification-fDiv |
| Open Datasets | Yes | The objective functions' performance is tested for the MNIST (LeCun et al., 1998), Fashion MNIST (Xiao et al., 2017), CIFAR10, and CIFAR100 (Krizhevsky et al., 2009) datasets. |
| Dataset Splits | Yes | MNIST (LeCun et al., 1998) comprises 10 classes with 60,000 images for training and 10,000 images for testing. Fashion MNIST has 10 classes with 60,000 images for training and 10,000 images for testing. CIFAR10 (Krizhevsky et al., 2009) has 10 classes with 50,000 images for training and 10,000 images for testing. CIFAR100 (Krizhevsky et al., 2009) has 100 classes with 50,000 images for training and 10,000 images for testing. |
| Hardware Specification | No | The paper mentions types of neural networks used (e.g., convolutional neural networks, VGG, ResNet, MobileNet V2) and general training parameters (e.g., SGD with momentum, Adam optimizer), but it does not specify any particular hardware components such as CPU or GPU models. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer' and 'Leaky ReLU activation function' by name, but it does not specify any version numbers for these or other key software libraries or programming languages. |
| Experiment Setup | Yes | The learning rate is initially set to 0.1 and then we use a cosine annealing scheduler (Loshchilov & Hutter, 2017) to modify its value during the 200 epochs of training. The network parameters are updated by using SGD with momentum. The architecture used for the decoding scenario comprises two hidden layers with 100 neurons each. The network weights are updated by using the Adam optimizer (Kingma & Ba, 2015). The Leaky ReLU activation function is utilized in all the layers except the last one. |
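
For readers reconstructing the experiment setup quoted in the last row, the following is a minimal sketch assuming PyTorch. The momentum value, input/output dimensions, batch handling, and the f-divergence objective itself (e.g., the SL divergence loss) are not specified in the excerpt above and are marked as assumptions; the authors' actual implementation is in the linked repository.

```python
# Hedged sketch of the reported training configuration (not the authors' code).
import torch.nn as nn
import torch.optim as optim

# Image-classification setup: SGD with momentum, initial lr 0.1,
# cosine annealing over the 200 training epochs (as quoted above).
def make_image_training(model: nn.Module, epochs: int = 200):
    optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # momentum value assumed
    scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    return optimizer, scheduler

# Decoding-scenario network: two hidden layers of 100 neurons each,
# Leaky ReLU in all layers except the last, trained with Adam.
class DecoderNet(nn.Module):
    def __init__(self, in_dim: int, num_classes: int):  # dimensions assumed
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 100), nn.LeakyReLU(),
            nn.Linear(100, 100), nn.LeakyReLU(),
            nn.Linear(100, num_classes),  # no activation on the output layer
        )

    def forward(self, x):
        return self.net(x)

# Usage sketch (loss_fn stands in for the paper's f-divergence objective):
# model = DecoderNet(in_dim=8, num_classes=16)
# optimizer = optim.Adam(model.parameters())
# for epoch in range(num_epochs):
#     logits = model(batch_x)
#     loss = loss_fn(logits, batch_y)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```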