Differentiable Top-k Classification Learning
Authors: Felix Petersen, Hilde Kuehne, Christian Borgelt, Oliver Deussen
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed losses for fine-tuning on state-of-the-art architectures, as well as for training from scratch. We find that relaxing k not only produces better top-5 accuracies, but also leads to top-1 accuracy improvements. When fine-tuning publicly available ImageNet models, we achieve a new state-of-the-art for these models. |
| Researcher Affiliation | Collaboration | ¹University of Konstanz, ²University of Frankfurt, ³MIT-IBM Watson AI Lab, ⁴University of Salzburg. |
| Pseudocode | No | The paper describes algorithms and network constructions, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code will be available at github.com/Felix-Petersen/difftopk |
| Open Datasets | Yes | We empirically evaluate our method using four differentiable sorting and ranking methods on the CIFAR-100 (Krizhevsky et al., 2009), the ImageNet-1K (Deng et al., 2009), and the ImageNet-21K-P (Ridnik et al., 2021) data sets. |
| Dataset Splits | No | The paper describes the datasets used and general training settings (epochs, batch size, learning rate) but does not provide specific details on how the data was split into training, validation, and test sets, nor does it cite standard splits. |
| Hardware Specification | No | The paper discusses the computational effort required for large numbers of classes and general training strategies but does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using the "Adam optimizer (Kingma & Ba, 2015)" but does not specify any software dependencies or libraries with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | For training, we use the Adam optimizer (Kingma & Ba, 2015). For training on CIFAR-100 from scratch, we train for up to 200 epochs with a batch size of 100 at a learning rate of 10^−3. For ImageNet-1K, we train for up to 100 epochs at a batch size of 500 and a learning rate of 10^−4.5. For ImageNet-21K-P, we train for up to 40 epochs at a batch size of 500 and a learning rate of 10^−4. We use early stopping and found that these settings lead to convergence in all settings. (A training sketch follows the table.) |
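
For concreteness, below is a minimal PyTorch sketch of the CIFAR-100 setup described in the Experiment Setup row (Adam, batch size 100, learning rate 10^−3, up to 200 epochs with early stopping). The ResNet-18 backbone and the plain cross-entropy criterion are stand-ins only, not the paper's method; the actual differentiable top-k losses are provided in the difftopk repository linked above.

```python
# Sketch of the reported CIFAR-100 training configuration:
# Adam optimizer, batch size 100, learning rate 10^-3, up to 200 epochs.
# The backbone and loss below are placeholders, not the paper's top-k losses.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"

train_set = torchvision.datasets.CIFAR100(
    root="./data", train=True, download=True, transform=T.ToTensor()
)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=100, shuffle=True)

model = torchvision.models.resnet18(num_classes=100).to(device)  # stand-in backbone
criterion = nn.CrossEntropyLoss()  # placeholder for a differentiable top-k loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # 10^-3, as reported

for epoch in range(200):  # "up to 200 epochs"
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    # Evaluate on a held-out split here and stop early once accuracy
    # no longer improves, mirroring the early stopping described above.
```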