SuperLoss: A Generic Loss for Robust Curriculum Learning

Authors: Thibault Castells, Philippe Weinzaepfel, Jerome Revaud

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "3 Experimental results"
Researcher Affiliation | Industry | Thibault Castells, Naver Labs Europe (thibault.castells@wanadoo.fr); Philippe Weinzaepfel, Naver Labs Europe (philippe.weinzaepfel@naverlabs.com); Jerome Revaud, Naver Labs Europe (jerome.revaud@naverlabs.com)
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper refers to third-party open-source projects used in its experiments (detectron2, cnnimageretrieval-pytorch), but does not provide access to the authors' own implementation of the SuperLoss.
Open Datasets | Yes | "We perform a toy regression experiment on MNIST [26]... We experiment on the larger UTKFace dataset [70]... CIFAR-10 and CIFAR-100 [24]... WebVision [31] is a large-scale dataset... We perform experiments for the object detection task on Pascal VOC [7]... We evaluate the SuperLoss on the image retrieval task using the Revisited Oxford and Paris benchmark [42]. To train our method, we use the large-scale Landmarks dataset [2]... We also experiment with the cleaned dataset [12]..."
Dataset Splits | Yes | "Landmarks dataset [2] that is composed of about 200K images (divided into 160K/40K for training/validation)"
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions software frameworks such as detectron2 and cnnimageretrieval-pytorch, but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | "Our protocol is to first train the baseline and tune its hyper-parameters (e.g., learning rate, weight decay, etc.)... We set the regularization parameter to λ = 1 for CIFAR-10 and to λ = 0.25 for CIFAR-100... train a ResNet-18 model using SGD for 120 epochs with a weight decay of 10^-4, an initial learning rate of 0.1, divided by 10 at 30, 60 and 90 epochs."
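For reference, below is a minimal PyTorch sketch of the CIFAR training schedule quoted in the Experiment Setup row (ResNet-18 trained with SGD for 120 epochs, weight decay 10^-4, initial learning rate 0.1 divided by 10 at epochs 30, 60 and 90, with λ = 1 for CIFAR-10 and λ = 0.25 for CIFAR-100). This is not the authors' code, which is not released: the momentum value, the torchvision ResNet-18 variant, and the data-loading loop are assumptions, and the SuperLoss term itself is omitted since the paper provides no pseudocode for it.

```python
# Minimal sketch (not the authors' code) of the CIFAR baseline schedule quoted above.
# Assumed/hypothetical: SGD momentum of 0.9 and the torchvision ResNet-18 variant;
# the SuperLoss itself is not implemented here.
import torch
import torch.nn as nn
import torchvision

num_classes = 10   # 10 for CIFAR-10, 100 for CIFAR-100
lam = 1.0          # SuperLoss regularization parameter: 1 for CIFAR-10, 0.25 for CIFAR-100

model = torchvision.models.resnet18(num_classes=num_classes)
criterion = nn.CrossEntropyLoss(reduction="none")  # per-sample losses, which the SuperLoss would reweight

optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9,           # momentum not stated in the paper; 0.9 assumed
                            weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[30, 60, 90],
                                                 gamma=0.1)

for epoch in range(120):
    # ... one pass over the CIFAR training set would go here ...
    scheduler.step()
```

Per the quoted protocol, these baseline hyper-parameters are tuned first on the baseline and then reused when the SuperLoss is added.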