$f$-Divergence Based Classification: Beyond the Use of Cross-Entropy
Authors: Nicola Novello, Andrea M Tonello
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We theoretically analyze the objective functions proposed and numerically test them in three application scenarios: toy examples, image datasets, and signal detection/decoding problems. The analyzed scenarios demonstrate the effectiveness of the proposed approach and that the SL divergence achieves the highest classification accuracy in almost all the considered cases. |
| Researcher Affiliation | Academia | Nicola Novello¹, Andrea M Tonello¹; ¹Department of Networked and Embedded Systems, University of Klagenfurt, Klagenfurt, Austria. Correspondence to: Nicola Novello <nicola.novello@aau.at>. |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Our implementation can be found at https://github.com/tonellolab/discriminative-classification-fDiv |
| Open Datasets | Yes | The objective functions' performance is tested for the MNIST (LeCun et al., 1998), Fashion MNIST (Xiao et al., 2017), CIFAR10, and CIFAR100 (Krizhevsky et al., 2009) datasets. |
| Dataset Splits | Yes | MNIST (LeCun et al., 1998) comprises 10 classes with 60,000 images for training and 10,000 images for testing. Fashion MNIST has 10 classes with 60,000 images for training and 10,000 images for testing. CIFAR10 (Krizhevsky et al., 2009) has 10 classes with 50,000 images for training and 10,000 images for testing. CIFAR100 (Krizhevsky et al., 2009) has 100 classes with 50,000 images for training and 10,000 images for testing. |
| Hardware Specification | No | The paper mentions types of neural networks used (e.g., convolutional neural networks, VGG, ResNet, MobileNet V2) and general training parameters (e.g., SGD with momentum, Adam optimizer), but it does not specify any particular hardware components such as CPU or GPU models. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer' and 'Leaky ReLU activation function' by name, but it does not specify any version numbers for these or other key software libraries or programming languages. |
| Experiment Setup | Yes | The learning rate is initially set to 0.1 and then we use a cosine annealing scheduler (Loshchilov & Hutter, 2017) to modify its value during the 200 epochs of training. The network parameters are updated by using SGD with momentum. The architecture used for the decoding scenario comprises two hidden layers with 100 neurons each. The network weights are updated by using the Adam optimizer (Kingma & Ba, 2015). The Leaky ReLU activation function is utilized in all the layers except the last one. |
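
For readers reconstructing the experiment setup quoted in the last row, the following is a minimal sketch assuming PyTorch. The momentum value, input/output dimensions, batch handling, and the f-divergence objective itself (e.g., the SL divergence loss) are not specified in the excerpt above and are marked as assumptions; the authors' actual implementation is in the linked repository.

```python
# Hedged sketch of the reported training configuration (not the authors' code).
import torch.nn as nn
import torch.optim as optim

# Image-classification setup: SGD with momentum, initial lr 0.1,
# cosine annealing over the 200 training epochs (as quoted above).
def make_image_training(model: nn.Module, epochs: int = 200):
    optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # momentum value assumed
    scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    return optimizer, scheduler

# Decoding-scenario network: two hidden layers of 100 neurons each,
# Leaky ReLU in all layers except the last, trained with Adam.
class DecoderNet(nn.Module):
    def __init__(self, in_dim: int, num_classes: int):  # dimensions assumed
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 100), nn.LeakyReLU(),
            nn.Linear(100, 100), nn.LeakyReLU(),
            nn.Linear(100, num_classes),  # no activation on the output layer
        )

    def forward(self, x):
        return self.net(x)

# Usage sketch (loss_fn stands in for the paper's f-divergence objective):
# model = DecoderNet(in_dim=8, num_classes=16)
# optimizer = optim.Adam(model.parameters())
# for epoch in range(num_epochs):
#     logits = model(batch_x)
#     loss = loss_fn(logits, batch_y)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```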