Information Maximization for Few-Shot Learning
Authors: Malik Boudiaf, Imtiaz Ziko, Jérôme Rony, Jose Dolz, Pablo Piantanida, Ismail Ben Ayed
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Following standard transductive few-shot settings, our comprehensive experiments demonstrate that TIM outperforms state-of-the-art methods significantly across various datasets and networks |
| Researcher Affiliation | Academia | Malik Boudiaf (ÉTS Montreal); Ziko Imtiaz Masud (ÉTS Montreal); Jérôme Rony (ÉTS Montreal); Jose Dolz (ÉTS Montreal); Pablo Piantanida (CentraleSupélec-CNRS, Université Paris-Saclay); Ismail Ben Ayed (ÉTS Montreal) |
| Pseudocode | No | The paper describes the mathematical formulations and propositions for optimization but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code publicly available at https://github.com/mboudiaf/TIM |
| Open Datasets | Yes | Datasets: We resort to 3 few-shot learning datasets to benchmark the proposed models. As standard few-shot benchmarks, we use the mini-Imagenet [45] dataset, with 100 classes split as in [35], the Caltech-UCSD Birds 200 [47] (CUB) dataset, with 200 classes, split following [5], and finally the larger tiered-Imagenet dataset, with 608 classes split as in [36]. |
| Dataset Splits | Yes | The few-shot scenario assumes that we are given a test dataset X_test := {x_i, y_i}, i = 1, …, N_test, with a completely new set of classes Y_test such that Y_base ∩ Y_test = ∅, from which we create randomly sampled few-shot tasks, each with a few labeled examples. Specifically, each K-way N_S-shot task involves sampling N_S labeled examples from each of K different classes, also chosen at random. Let S denote the set of these labeled examples, referred to as the support set with size \|S\| = N_S · K. Furthermore, each task has a query set denoted by Q composed of \|Q\| = N_Q · K unlabeled (unseen) examples from each of the K classes. |
| Hardware Specification | Yes | Our methods were run on the same GTX 1080 Ti GPU, while the run-time of [7] is directly reported from the paper. |
| Software Dependencies | No | The paper mentions using the ADAM optimizer and standard networks like ResNet-18 and WRN28-10, but it does not provide specific version numbers for software dependencies (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | Hyperparameters: To keep our experiments as simple as possible, our hyperparameters are kept fixed across all the experiments and methods (TIM-GD and TIM-ADM). The conditional entropy weight α and the cross-entropy weight λ in Objective (3) are both set to 0.1. The temperature parameter τ in the classifier is set to 15. In our TIM-GD method, we use the ADAM optimizer with the recommended parameters [20], and run 1000 iterations for each task. For TIM-ADM, we run 150 iterations. Base-training procedure: The feature extractors are trained following the same simple base-training procedure as in [51] and using standard networks (ResNet-18 and WRN28-10), for all the experiments. Specifically, they are trained using the standard cross-entropy loss on the base classes, with label smoothing. The label-smoothing parameter is set to 0.1. We emphasize that base training does not involve any meta-learning or episodic training strategy. The models are trained for 90 epochs, with the learning rate initialized to 0.1, and divided by 10 at epochs 45 and 66. Batch size is set to 256 for ResNet-18, and to 128 for WRN28-10. |
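The K-way N_S-shot episode construction quoted in the Dataset Splits row can be sketched as below. This is a minimal illustration of the sampling protocol described in the paper, not the authors' code; `sample_task` and its parameter names are hypothetical.

```python
import numpy as np

def sample_task(labels, K=5, n_support=1, n_query=15, rng=None):
    """Sample one K-way n_support-shot episode from a test set.

    `labels` is an array of class ids for the test examples. Returns
    index arrays for the support set S (|S| = n_support * K) and the
    query set Q (|Q| = n_query * K), sampled from K random classes.
    """
    rng = rng or np.random.default_rng()
    # Choose K classes at random, then n_support + n_query examples per class.
    classes = rng.choice(np.unique(labels), size=K, replace=False)
    support, query = [], []
    for c in classes:
        idx = rng.permutation(np.flatnonzero(labels == c))
        support.extend(idx[:n_support])
        query.extend(idx[n_support:n_support + n_query])
    return np.array(support), np.array(query)
```

For the standard 5-way 1-shot setting with 15 query examples per class, this yields |S| = 5 and |Q| = 75, matching the sizes implied by the quoted protocol.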
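The hyperparameters quoted in the Experiment Setup row (α = 0.1, λ = 0.1) weight the two parts of Objective (3): a support-set cross-entropy and a query-set mutual-information term. A minimal numpy sketch, assuming the loss takes the form λ·CE − (Ĥ(Y_Q) − α·Ĥ(Y_Q|X_Q)) as described in the TIM paper; function and variable names are illustrative and not taken from the authors' repository.

```python
import numpy as np

def tim_loss(support_logits, support_labels, query_logits,
             alpha=0.1, lam=0.1):
    """Sketch of the TIM objective: weighted support cross-entropy
    minus the query mutual-information surrogate H(Y) - alpha * H(Y|X)."""
    def softmax(z):
        z = z - z.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    # Cross-entropy on the labeled support set.
    p_s = softmax(support_logits)
    ce = -np.mean(np.log(p_s[np.arange(len(support_labels)),
                             support_labels] + 1e-12))

    # Mutual-information surrogate on the unlabeled query set:
    # marginal entropy H(Y) encourages balanced class usage,
    # conditional entropy H(Y|X) encourages confident predictions.
    p_q = softmax(query_logits)
    marginal = p_q.mean(axis=0)
    h_y = -np.sum(marginal * np.log(marginal + 1e-12))
    h_y_x = -np.mean(np.sum(p_q * np.log(p_q + 1e-12), axis=1))

    return lam * ce - (h_y - alpha * h_y_x)
```

With this form, confident and class-balanced query predictions lower the loss relative to uniform ones, which is the behavior TIM-GD optimizes by gradient descent over each task; the logits would come from the paper's temperature-scaled classifier (τ = 15).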