Learn More for Food Recognition via Progressive Self-Distillation

Authors: Yaohui Zhu, Linhu Liu, Jiang Tian

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on three datasets demonstrate the effectiveness of our proposed method and state-of-the-art performance.
Researcher Affiliation | Collaboration | 1 School of Artificial Intelligence, Beijing Normal University, Beijing 10875, China; 2 AI Lab, Lenovo Research, Beijing, China; yaohui.zhu@bnu.edu.cn, {liulh7, tianjiang1}@lenovo.com
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code (no specific repository link, explicit code release statement, or mention of code in supplementary materials).
Open Datasets | Yes | We validate our method on three commonly used food datasets. ETHZ Food-101 (Bossard, Guillaumin, and Van Gool 2014) contains 101,000 images with 101 food categories. Vireo Food-172 (Chen and Ngo 2016) contains 110,241 food images from 172 categories. ISIA Food-500 (Min et al. 2020) consists of 399,726 images with 500 categories.
Dataset Splits | Yes | On Vireo Food-172 and ISIA Food-500, the model with the highest performance on the validation set is used for testing. Following commonly used splits, 60%, 10%, and 30% of the images of each food category are randomly selected for training, validation, and testing, respectively. (A split sketch follows the table.)
Hardware Specification | Yes | All experiments are implemented on the Pytorch platform with one Nvidia A100 GPU.
Software Dependencies | No | The paper mentions "Pytorch platform" but does not provide specific version numbers for Pytorch or any other software dependencies.
Experiment Setup | Yes | The input image size is set to 224 × 224 in all experiments. We set a percentile η = 5% in M_c as a threshold, ω_l = 1, the ramp-up epochs β = 5 in Eq. 7, and the number of self-distillation m = 2 in all experiments. When employing Swin-B (Liu et al. 2021) as an embedding network, the model is optimized by the AdamW (Kingma and Ba 2014) algorithm with an initial learning rate of 5 × 10^-5 and a weight decay of 10^-8. The total number of training epochs is 50, and a batch size of 42 and gradient clipping with a max norm of 5 are used. In Eq. 8, α = 2.0. When employing DenseNet-161 (Huang et al. 2017) as an embedding network, the model is optimized using stochastic gradient descent with a momentum of 0.9 and a weight decay of 10^-4. The learning rate is initially set to 10^-3 and divided by 10 after 10 epochs. The total number of training epochs is 30, and the batch size is 42. In Eq. 8, α = 1.0. (A configuration sketch of the Swin-B setting follows the table.)
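
For the per-category 60%/10%/30% split quoted in the Dataset Splits row, the snippet below is a minimal Python sketch of one way such a split can be produced. The function name, the (path, label) input format, and the fixed random seed are illustrative assumptions; the paper does not describe its splitting code.

import random
from collections import defaultdict

def split_per_category(samples, seed=0):
    """Split (path, label) pairs into 60% train / 10% val / 30% test within each category."""
    random.seed(seed)  # assumed seed; the paper does not report one
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)
    train, val, test = [], [], []
    for label, paths in by_label.items():
        random.shuffle(paths)
        n_train = int(0.6 * len(paths))
        n_val = int(0.1 * len(paths))
        train += [(p, label) for p in paths[:n_train]]
        val += [(p, label) for p in paths[n_train:n_train + n_val]]
        test += [(p, label) for p in paths[n_train + n_val:]]
    return train, val, test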
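
The Swin-B hyperparameters quoted in the Experiment Setup row can be collected into a single training configuration. The sketch below assembles them in PyTorch under stated assumptions: the timm model name, the dataset directory path, the data augmentation, and the plain cross-entropy loss are placeholders, with the loss standing in for the paper's progressive self-distillation objective (Eqs. 7-8).

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import timm  # assumption: the Swin-B backbone is taken from timm

# 224 x 224 inputs and a batch size of 42, as reported.
transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("food101/train", transform=transform)  # hypothetical path
train_loader = DataLoader(train_set, batch_size=42, shuffle=True, num_workers=4)

# Swin-B embedding network with a 101-way classifier for Food-101.
model = timm.create_model("swin_base_patch4_window7_224", pretrained=True, num_classes=101).cuda()
criterion = nn.CrossEntropyLoss()  # stand-in for the paper's distillation loss
# AdamW with an initial learning rate of 5e-5 and a weight decay of 1e-8, as reported.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=1e-8)

for epoch in range(50):  # 50 training epochs
    for images, labels in train_loader:
        images, labels = images.cuda(), labels.cuda()
        loss = criterion(model(images), labels)
        optimizer.zero_grad()
        loss.backward()
        # Gradient clipping with a max norm of 5, as reported.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5)
        optimizer.step()

The reported DenseNet-161 setting would analogously swap in SGD with momentum 0.9, weight decay 10^-4, an initial learning rate of 10^-3 divided by 10 after 10 epochs, and 30 training epochs.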