TransBoost: Improving the Best ImageNet Performance using Deep Transduction

Authors: Omer Belhasin, Guy Bar-Shalom, Ran El-Yaniv

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method significantly improves the ImageNet classification performance on a wide range of architectures, such as ResNets, MobileNetV3-L, EfficientNet-B0, ViT-S, and ConvNeXt-T, leading to state-of-the-art transductive performance. Additionally, we show that TransBoost is effective on a wide variety of image classification datasets.
Researcher Affiliation | Collaboration | Omer Belhasin, Department of Computer Science, Technion - Israel Institute of Technology, omer.be@cs.technion.ac.il; Guy Bar-Shalom, Department of Computer Science, Technion - Israel Institute of Technology, guy.b@cs.technion.ac.il; Ran El-Yaniv, Department of Computer Science, Technion - Israel Institute of Technology and Deci.AI, rani@cs.technion.ac.il
Pseudocode | Yes | Algorithm 1: TransBoost Procedure
Open Source Code | Yes | The implementation of TransBoost is provided at: https://github.com/omerb01/TransBoost.
Open Datasets | Yes | Most of our study of TransBoost is done using the well-known ImageNet-1k ILSVRC-2012 dataset [19], which contains 1,281,167 training instances and 50,000 test instances in 1,000 categories. Additionally, we investigated the effectiveness of TransBoost on the Food-101 [21], CIFAR-10 and CIFAR-100 [22], SUN-397 [23], Stanford Cars [24], FGVC Aircraft [25], the Describable Textures Dataset (DTD) [26] and Oxford 102 Flowers [27].
Dataset Splits | Yes | Our hyperparameters were fine-tuned based on a validation set that was sampled from the training set over ImageNet.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or specific computing infrastructure) used for running experiments are provided in the paper.
Software Dependencies | No | The paper mentions software like PyTorch and the timm repository but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | We used the SGD optimizer (Nesterov) with a momentum of 0.9, weight decay of 10^-4, and a batch size of 1024 (consisting of 512 labeled training instances and 512 unlabeled test instances). Unless otherwise specified, the learning rate was fixed to 10^-3 with no warmup for 120 epochs. The regularization hyperparameter of our loss in all our experiments was fixed to λ = 2; see Equation (7).
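
The Experiment Setup row above maps directly onto a standard PyTorch configuration. The sketch below is a minimal, non-authoritative illustration of the reported hyperparameters, assuming a recent torchvision; the ResNet-50 backbone is only a placeholder (the paper evaluates several architectures), and the TransBoost loss itself (Equation (7)) is not reproduced here.

    import torch
    import torchvision

    # Placeholder backbone; the paper reports results for ResNets, MobileNetV3-L,
    # EfficientNet-B0, ViT-S and ConvNeXt-T, among others.
    model = torchvision.models.resnet50(weights=None)

    # SGD with Nesterov momentum, as quoted in the Experiment Setup row.
    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=1e-3,            # fixed learning rate, no warmup
        momentum=0.9,
        nesterov=True,
        weight_decay=1e-4,
    )

    EPOCHS = 120        # number of training epochs
    BATCH_SIZE = 1024   # 512 labeled training instances + 512 unlabeled test instances
    LAMBDA = 2.0        # regularization hyperparameter λ of the loss in Equation (7)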
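
Similarly, the Dataset Splits row states only that the validation set was sampled from the ImageNet training set. The following sketch shows one plausible way to produce such a split with torch.utils.data.random_split; the held-out size (50,000), the random seed, and the dataset path are placeholders, not values taken from the paper.

    import torch
    from torch.utils.data import random_split
    from torchvision import datasets, transforms

    VAL_SIZE = 50_000  # placeholder held-out size; the paper does not report it

    train_full = datasets.ImageFolder(
        "/path/to/imagenet/train",        # placeholder path to the ImageNet training images
        transform=transforms.ToTensor(),
    )
    train_subset, val_subset = random_split(
        train_full,
        [len(train_full) - VAL_SIZE, VAL_SIZE],
        generator=torch.Generator().manual_seed(0),  # fixed seed for a reproducible split
    )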