Delta-encoder: an effective sample synthesis method for few-shot object recognition
Authors: Eli Schwartz, Leonid Karlinsky, Joseph Shtok, Sivan Harary, Mattias Marder, Abhishek Kumar, Rogerio Feris, Raja Giryes, Alex Bronstein
NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our proposed solution is a simple, yet effective method (in the light of the obtained empirical results) for learning to sample from the class distribution after being provided with one or a few examples of that class. It exhibits improved performance compared to the state-of-the-art methods for few-shot classification on a variety of standard few-shot classification benchmarks. |
| Researcher Affiliation | Collaboration | ¹ IBM Research AI; ² School of Electrical Engineering, Tel-Aviv University, Tel-Aviv, Israel; ³ Department of Computer Science, Technion, Haifa, Israel |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available here. |
| Open Datasets | Yes | We have evaluated the few-shot classification performance of the proposed method on multiple datasets, which are the benchmarks of choice for the majority of few-shot learning literature, namely: miniImageNet, CIFAR-100, Caltech-256, CUB, aPY, SUN and AWA2. miniImageNet [43], CIFAR-100 [22], Caltech-256 Object Category [15], Caltech-UCSD Birds 200 (CUB) [48], Attribute Pascal & Yahoo (aPY) [9], Scene UNderstanding (SUN) [50], Animals with Attributes 2 (AWA2) [49] |
| Dataset Splits | Yes | We followed the standard splits used for few-shot learning for the first four datasets; for the other datasets that are not commonly used for few-shot, we used the split suggested in [49] for zero-shot learning. The experimental protocol in terms of splitting of the dataset into disjoint sets of training and testing classes is the same as in all the other works evaluated on the same datasets. |
| Hardware Specification | Yes | each epoch takes about 20 seconds running on an Nvidia Tesla K40m GPU (48K training samples, batch size 128). |
| Software Dependencies | No | The paper mentions models like VGG16 and ResNet18 and the Adam optimizer but does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | In all the experiments, images are represented by pre-computed feature vectors. In all our experiments we are using the VGG16 [38] or ResNet18 [18] models for feature extraction. For both models the head, i.e., the layers after the last convolution, is replaced by two fully-connected layers with 2048 units with ReLU activations. The features used are the 2048-dimensional outputs of the last fully-connected layer. Following the ideas of [28], we augment the L1 reconstruction loss $\|X - \hat{X}\|_1$ to include adaptive weights: $\sum_i w_i \|X_i - \hat{X}_i\|$, where $w_i = \|X_i - \hat{X}_i\|^2 / \|X - \hat{X}\|^2$, encouraging larger gradients for feature dimensions with higher residual error. The encoder and decoder sub-networks are implemented as multi-layer perceptrons with a single hidden layer of 8192 units, where each layer is followed by a leaky ReLU activation (max(x, 0.2·x)). The encoder output Z is 16-dimensional. All models are trained with the Adam optimizer with the learning rate set to 10⁻⁵. Dropout with 50% rate is applied to all layers. In all experiments 1024 samples are synthesized for each unseen class. The Δ-encoder training takes about 10 epochs to reach convergence; each epoch takes about 20 seconds running on an Nvidia Tesla K40m GPU (48K training samples, batch size 128). |
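The quoted setup pins down most of the hyperparameters (2048-dim features, single 8192-unit hidden layer, 16-dim Z, leaky ReLU with slope 0.2, 50% dropout, Adam at 10⁻⁵, batch size 128, adaptive-weighted L1 loss). The sketch below illustrates how these pieces could fit together in PyTorch. It is not the authors' released implementation: the pairing of the target feature with a same-class anchor as encoder/decoder input follows the Δ-encoder idea but is not stated in the quoted cell, the exact placement of dropout and activations is simplified, detaching the adaptive weights is a choice of this sketch, and the names `DeltaEncoder` and `adaptive_l1_loss` are ours.

```python
import torch
import torch.nn as nn

FEAT_DIM = 2048   # output of the replaced VGG16/ResNet18 head (per the quoted setup)
HIDDEN = 8192     # single hidden layer in both sub-networks
Z_DIM = 16        # dimensionality of the encoder output Z


def mlp(in_dim, out_dim):
    # One hidden layer with leaky ReLU max(x, 0.2*x); dropout placement is
    # simplified here (the paper applies 50% dropout to all layers).
    return nn.Sequential(
        nn.Linear(in_dim, HIDDEN),
        nn.LeakyReLU(0.2),
        nn.Dropout(0.5),
        nn.Linear(HIDDEN, out_dim),
    )


class DeltaEncoder(nn.Module):
    """Sketch of the encoder/decoder pair; the input pairing is an assumption."""

    def __init__(self):
        super().__init__()
        # Encoder sees the target feature plus a same-class anchor and emits Z;
        # the decoder reconstructs the target from Z and the anchor.
        self.encoder = mlp(2 * FEAT_DIM, Z_DIM)
        self.decoder = mlp(Z_DIM + FEAT_DIM, FEAT_DIM)

    def forward(self, x, anchor):
        z = self.encoder(torch.cat([x, anchor], dim=-1))
        x_hat = self.decoder(torch.cat([z, anchor], dim=-1))
        return x_hat


def adaptive_l1_loss(x, x_hat, eps=1e-8):
    # Weighted L1 reconstruction loss: per sample, w_i is the squared residual
    # of dimension i normalized by the squared residual norm, so dimensions
    # with larger residuals get larger gradients. Treating the weights as
    # constants (detach) is a choice of this sketch.
    resid = (x - x_hat).abs()
    w = resid.pow(2) / (resid.pow(2).sum(dim=-1, keepdim=True) + eps)
    return (w.detach() * resid).sum(dim=-1).mean()


if __name__ == "__main__":
    model = DeltaEncoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-5)        # learning rate from the paper
    x = torch.randn(128, FEAT_DIM)                             # batch size 128, as quoted
    anchor = torch.randn(128, FEAT_DIM)                        # same-class anchor features
    loss = adaptive_l1_loss(x, model(x, anchor))
    loss.backward()
    opt.step()
```

At synthesis time the same decoder would be applied with Z values encoded from seen-class pairs and the anchor replaced by a feature of the unseen class, producing the 1024 synthetic samples per class mentioned above; that step is omitted from the sketch.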