Delta-encoder: an effective sample synthesis method for few-shot object recognition
Authors: Eli Schwartz, Leonid Karlinsky, Joseph Shtok, Sivan Harary, Mattias Marder, Abhishek Kumar, Rogerio Feris, Raja Giryes, Alex Bronstein
NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our proposed solution is a simple, yet effective method (in the light of the obtained empirical results) for learning to sample from the class distribution after being provided with one or a few examples of that class. It exhibits improved performance compared to the state-of-the-art methods for few-shot classification on a variety of standard few-shot classification benchmarks. |
| Researcher Affiliation | Collaboration | ¹ IBM Research AI; ² School of Electrical Engineering, Tel-Aviv University, Tel-Aviv, Israel; ³ Department of Computer Science, Technion, Haifa, Israel |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available here. |
| Open Datasets | Yes | We have evaluated the few-shot classification performance of the proposed method on multiple datasets, which are the benchmarks of choice for the majority of few-shot learning literature, namely: miniImageNet, CIFAR-100, Caltech-256, CUB, aPY, SUN and AWA2. miniImageNet [43], CIFAR-100 [22], Caltech-256 Object Category [15], Caltech-UCSD Birds 200 (CUB) [48], Attribute Pascal & Yahoo (aPY) [9], Scene UNderstanding (SUN) [50], Animals with Attributes 2 (AWA2) [49] |
| Dataset Splits | Yes | We followed the standard splits used for few-shot learning for the first four datasets; for the other datasets that are not commonly used for few-shot, we used the split suggested in [49] for zero-shot learning. The experimental protocol in terms of splitting of the dataset into disjoint sets of training and testing classes is the same as in all the other works evaluated on the same datasets. |
| Hardware Specification | Yes | each epoch takes about 20 seconds running on an Nvidia Tesla K40m GPU (48K training samples, batch size 128). |
| Software Dependencies | No | The paper mentions models like VGG16 and ResNet18 and the Adam optimizer but does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | In all the experiments, images are represented by pre-computed feature vectors. In all our experiments we are using the VGG16 [38] or ResNet18 [18] models for feature extraction. For both models the head, i.e., the layers after the last convolution, is replaced by two fully-connected layers with 2048 units with ReLU activations. The features used are the 2048-dimensional outputs of the last fully-connected layer. Following the ideas of [28], we augment the L1 reconstruction loss $\|X - \hat{X}\|_1$ to include adaptive weights: $\sum_i w_i \|X_i - \hat{X}_i\|$, where $w_i = \|X_i - \hat{X}_i\|^2 / \|X - \hat{X}\|^2$, encouraging larger gradients for feature dimensions with higher residual error. The encoder and decoder sub-networks are implemented as multi-layer perceptrons with a single hidden layer of 8192 units, where each layer is followed by a leaky ReLU activation (max(x, 0.2·x)). The encoder output Z is 16-dimensional. All models are trained with the Adam optimizer with the learning rate set to 10⁻⁵. Dropout with 50% rate is applied to all layers. In all experiments 1024 samples are synthesized for each unseen class. The Δ-encoder training takes about 10 epochs to reach convergence; each epoch takes about 20 seconds running on an Nvidia Tesla K40m GPU (48K training samples, batch size 128). |
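The quoted setup pins down most of the hyperparameters (2048-dim features, single 8192-unit hidden layer, 16-dim Z, leaky ReLU with slope 0.2, 50% dropout, Adam at 10⁻⁵, batch size 128, adaptive-weighted L1 loss). The sketch below illustrates how these pieces could fit together in PyTorch. It is not the authors' released implementation: the pairing of the target feature with a same-class anchor as encoder/decoder input follows the Δ-encoder idea but is not stated in the quoted cell, the exact placement of dropout and activations is simplified, detaching the adaptive weights is a choice of this sketch, and the names `DeltaEncoder` and `adaptive_l1_loss` are ours.

```python
import torch
import torch.nn as nn

FEAT_DIM = 2048   # output of the replaced VGG16/ResNet18 head (per the quoted setup)
HIDDEN = 8192     # single hidden layer in both sub-networks
Z_DIM = 16        # dimensionality of the encoder output Z


def mlp(in_dim, out_dim):
    # One hidden layer with leaky ReLU max(x, 0.2*x); dropout placement is
    # simplified here (the paper applies 50% dropout to all layers).
    return nn.Sequential(
        nn.Linear(in_dim, HIDDEN),
        nn.LeakyReLU(0.2),
        nn.Dropout(0.5),
        nn.Linear(HIDDEN, out_dim),
    )


class DeltaEncoder(nn.Module):
    """Sketch of the encoder/decoder pair; the input pairing is an assumption."""

    def __init__(self):
        super().__init__()
        # Encoder sees the target feature plus a same-class anchor and emits Z;
        # the decoder reconstructs the target from Z and the anchor.
        self.encoder = mlp(2 * FEAT_DIM, Z_DIM)
        self.decoder = mlp(Z_DIM + FEAT_DIM, FEAT_DIM)

    def forward(self, x, anchor):
        z = self.encoder(torch.cat([x, anchor], dim=-1))
        x_hat = self.decoder(torch.cat([z, anchor], dim=-1))
        return x_hat


def adaptive_l1_loss(x, x_hat, eps=1e-8):
    # Weighted L1 reconstruction loss: per sample, w_i is the squared residual
    # of dimension i normalized by the squared residual norm, so dimensions
    # with larger residuals get larger gradients. Treating the weights as
    # constants (detach) is a choice of this sketch.
    resid = (x - x_hat).abs()
    w = resid.pow(2) / (resid.pow(2).sum(dim=-1, keepdim=True) + eps)
    return (w.detach() * resid).sum(dim=-1).mean()


if __name__ == "__main__":
    model = DeltaEncoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-5)        # learning rate from the paper
    x = torch.randn(128, FEAT_DIM)                             # batch size 128, as quoted
    anchor = torch.randn(128, FEAT_DIM)                        # same-class anchor features
    loss = adaptive_l1_loss(x, model(x, anchor))
    loss.backward()
    opt.step()
```

At synthesis time the same decoder would be applied with Z values encoded from seen-class pairs and the anchor replaced by a feature of the unseen class, producing the 1024 synthetic samples per class mentioned above; that step is omitted from the sketch.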