Stepping Forward on the Last Mile

Authors: Chen Feng, Jay Zhuo, Parker Zhang, Ramchalam Kinattinkara Ramakrishnan, Zhaocong Yuan, Andrew Zou Li

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we investigate the feasibility of on-device training using fixed-point forward gradients, by conducting comprehensive experiments across a variety of deep learning benchmark tasks in both vision and audio domains.
Researcher Affiliation | Collaboration | Chen Feng, Qualcomm AI Research, Qualcomm Canada ULC, chenf@qti.qualcomm.com; Shaojie Zhuo, Qualcomm AI Research, Qualcomm Canada ULC, shaojiez@qti.qualcomm.com; Xiaopeng Zhang, Qualcomm AI Research, Qualcomm Canada ULC, xiaopeng@qti.qualcomm.com; Ramchalam Kinattinkara Ramakrishnan, Qualcomm AI Research, Qualcomm Canada ULC, rkinatti@qti.qualcomm.com; Zhaocong Yuan, Qualcomm AI Research, Qualcomm Canada ULC, zhaocong@qti.qualcomm.com; Andrew Zou Li, University of Toronto, andrewzou.li@mail.utoronto.ca
Pseudocode | Yes | Algorithm 1 QZO-FF: Quantized Zero-order Forward Gradient Learning (quantized, fp16). A minimal sketch of this style of update appears at the end of this section.
Open Source Code | No | However, we cannot open source the code.
Open Datasets | Yes | Vision Benchmark. Image classification models are compared across 5 commonly used few-shot learning benchmark datasets (Table 1). Training methods are evaluated on 3 network backbones (modified ResNet12 Ye et al. [2020], ResNet18 He et al. [2015], and ViT-tiny Dosovitskiy et al. [2020]), with ProtoNets Snell et al. [2017] as the few-shot classifier; see the ProtoNet episode sketch at the end of this section.

Table 1: Vision datasets used for few-shot learning
Name | Setting | No. Classes (train/val/test) | No. Samples | Resolution
CUB | Bird species | 200 (140/30/30) | 11,788 | 84 × 84
Omniglot | Handwritten characters | 1623 (1000/200/423) | 32,460 | 28 × 28
Cifar100_fs | Color | 100 (64/16/20) | 60,000 | 32 × 32
miniImageNet | Natural images | 100 (64/16/20) | 60,000 | 84 × 84
tieredImageNet | Natural images | 608 (351/97/160) | 779,165 | 84 × 84
Dataset Splits | Yes | Table 1: Vision datasets used for few-shot learning
Name | Setting | No. Classes (train/val/test) | No. Samples | Resolution
CUB | Bird species | 200 (140/30/30) | 11,788 | 84 × 84
Omniglot | Handwritten characters | 1623 (1000/200/423) | 32,460 | 28 × 28
Cifar100_fs | Color | 100 (64/16/20) | 60,000 | 32 × 32
miniImageNet | Natural images | 100 (64/16/20) | 60,000 | 84 × 84
tieredImageNet | Natural images | 608 (351/97/160) | 779,165 | 84 × 84
Hardware Specification | Yes | All our experiments are run on a single Nvidia Tesla V100 GPU.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | Table 6: The hyper-parameters used in our few-shot learning experiments for vision tasks. For fair comparisons, FF and BP use the same hyper-parameters. Model architectures of ResNet18, modified ResNet12, and ViT-tiny are based on [14], [43], and [39]. Pre-trained models used for zero-shot evaluation can be found at [33], [34], and [38]. Different learning-rate grids are explored, and the best accuracy is reported.

Hyper-parameter | Value
n_way | 5
n_shot | 5
ϵ | 1e-3
Epochs | 40
Optimizer | SGD
Learning rate | {1e-3, 1e-4, 1e-5}
Val/test tasks | 100/100
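Since the code is not open-sourced, the following is a minimal PyTorch sketch of the kind of quantized zeroth-order forward-gradient update that Algorithm 1 (QZO-FF) describes. It is a reconstruction, not the authors' implementation: the central-difference (SPSA-style) estimator, the sign-based perturbation, the fp16 round-trip, and all names (qzo_ff_step, eps, lr) are assumptions.

```python
# Minimal sketch of a zeroth-order (SPSA-style) forward-gradient step in the
# spirit of Algorithm 1 (QZO-FF). All names and the fp16 cast are assumptions;
# the paper's exact quantization scheme is not reproduced here.
import torch

def qzo_ff_step(model, loss_fn, batch, eps=1e-3, lr=1e-4):
    """One weight update from a central-difference directional derivative."""
    params = [p for p in model.parameters() if p.requires_grad]
    # Random +/-1 (Rademacher) direction per weight tensor, round-tripped
    # through fp16 to mimic low-precision arithmetic.
    dirs = [torch.randn_like(p).sign().half().float() for p in params]

    with torch.no_grad():
        # Evaluate the loss at theta + eps * v ...
        for p, v in zip(params, dirs):
            p.add_(eps * v)
        loss_plus = loss_fn(model, batch)

        # ... and at theta - eps * v (shift by -2 * eps * v).
        for p, v in zip(params, dirs):
            p.sub_(2 * eps * v)
        loss_minus = loss_fn(model, batch)

        # Restore the weights, then step along v scaled by the estimated
        # directional derivative (loss_plus - loss_minus) / (2 * eps).
        g = (loss_plus - loss_minus) / (2 * eps)
        for p, v in zip(params, dirs):
            p.add_(eps * v)
            p.sub_(lr * g * v)
    return loss_plus.item()
```

The eps here plays the role of the perturbation size ϵ = 1e-3 in Table 6, and lr would be swept over the {1e-3, 1e-4, 1e-5} grid reported there.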
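For the few-shot protocol, the paper pairs each backbone with ProtoNets (Snell et al. [2017]) under the n_way = 5, n_shot = 5 settings from Table 6. Below is a generic prototypical-network episode under those settings; the encoder, tensor shapes, and function name are illustrative assumptions, not the authors' code.

```python
# Hypothetical 5-way 5-shot ProtoNet episode matching the n_way/n_shot values
# in Table 6. The encoder and all tensor shapes are illustrative assumptions.
import torch
import torch.nn.functional as F

def protonet_episode(encoder, support, query, query_y, n_way=5):
    """Classify queries by distance to class prototypes (Snell et al. [2017]).

    Assumes `support` is grouped by class: the first n_shot rows belong to
    class 0, the next n_shot to class 1, and so on.
    """
    z_s = encoder(support)                       # (n_way * n_shot, d)
    z_q = encoder(query)                         # (n_query, d)
    d = z_s.size(-1)
    # Prototype = mean embedding of each class's support examples.
    protos = z_s.view(n_way, -1, d).mean(dim=1)  # (n_way, d)
    # Negative squared Euclidean distance serves as the class logits.
    logits = -torch.cdist(z_q, protos) ** 2      # (n_query, n_way)
    loss = F.cross_entropy(logits, query_y)
    acc = (logits.argmax(dim=1) == query_y).float().mean().item()
    return loss, acc
```

In a forward-gradient setting, this episode loss is what a step like qzo_ff_step above would query twice per update, rather than backpropagating through the encoder.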