Differentiable Top-k Classification Learning

Authors: Felix Petersen, Hilde Kuehne, Christian Borgelt, Oliver Deussen

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed losses for fine-tuning on state-of-the-art architectures, as well as for training from scratch. We find that relaxing k not only produces better top-5 accuracies, but also leads to top-1 accuracy improvements. When fine-tuning publicly available ImageNet models, we achieve a new state-of-the-art for these models.
Researcher Affiliation | Collaboration | University of Konstanz, University of Frankfurt, MIT-IBM Watson AI Lab, University of Salzburg.
Pseudocode | No | The paper describes algorithms and network constructions, but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code will be available at github.com/Felix-Petersen/difftopk
Open Datasets | Yes | We empirically evaluate our method using four differentiable sorting and ranking methods on the CIFAR-100 (Krizhevsky et al., 2009), the ImageNet-1K (Deng et al., 2009), and the ImageNet-21K-P (Ridnik et al., 2021) data sets.
Dataset Splits | No | The paper describes the datasets used and general training settings (epochs, batch size, learning rate) but does not provide specific details on how the data was split into training, validation, and test sets, nor does it cite standard splits.
Hardware Specification | No | The paper discusses the computational effort required for large numbers of classes and general training strategies but does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using the "Adam optimizer (Kingma & Ba, 2015)" but does not specify any software dependencies or libraries with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | For training, we use the Adam optimizer (Kingma & Ba, 2015). For training on CIFAR-100 from scratch, we train for up to 200 epochs with a batch size of 100 at a learning rate of 10^-3. For ImageNet-1K, we train for up to 100 epochs at a batch size of 500 and a learning rate of 10^-4.5. For ImageNet-21K-P, we train for up to 40 epochs at a batch size of 500 and a learning rate of 10^-4. We use early stopping and found that these settings lead to convergence in all settings.
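
The quoted setup fixes the optimizer and per-dataset hyperparameters. Below is a minimal PyTorch sketch of the CIFAR-100 from-scratch configuration, assuming only what the excerpt states (Adam, batch size 100, learning rate 10^-3, up to 200 epochs with early stopping); the small backbone and the standard cross-entropy criterion are placeholders, since the paper trains larger architectures with its differentiable top-k losses (see the difftopk repository), whose exact API is not reproduced here.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# CIFAR-100 with the batch size of 100 stated in the quoted setup.
train_set = datasets.CIFAR100(
    root="./data", train=True, download=True, transform=transforms.ToTensor()
)
train_loader = DataLoader(train_set, batch_size=100, shuffle=True)

# Stand-in backbone; the paper uses much larger architectures.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 512),
    nn.ReLU(),
    nn.Linear(512, 100),
).to(device)

# Adam at a learning rate of 10^-3, as quoted for CIFAR-100 from scratch.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Placeholder objective; the paper replaces this with its differentiable
# top-k losses built on differentiable sorting/ranking operators.
criterion = nn.CrossEntropyLoss()

for epoch in range(200):  # up to 200 epochs; the paper additionally uses early stopping
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

For the ImageNet-1K and ImageNet-21K-P settings, only the batch size (500), learning rate (10^-4.5 or 10^-4), and epoch budget (100 or 40) in this sketch would change, per the quoted setup.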