Boundary Matters: A Bi-Level Active Finetuning Method

Authors: Han Lu, Yichen Xie, Xiaokang Yang, Junchi Yan

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments provide qualitative and quantitative evidence of our method's superior efficacy, consistently outperforming the existing baselines.
Researcher Affiliation | Academia | Han Lu (1), Yichen Xie (2), Xiaokang Yang (1), Junchi Yan (1); (1) Dept. of CSE & School of AI & MoE Key Lab of AI, Shanghai Jiao Tong University; (2) University of California, Berkeley
Pseudocode | Yes | Algorithm 1: Pseudo-code for BiLAF
Open Source Code | Yes | https://github.com/Thinklab-SJTU/BiLAF
Open Datasets | Yes | Firstly, we evaluate our method using three widely recognized classification datasets: CIFAR10, CIFAR100 [22], and ImageNet-1k [32].
Dataset Splits | Yes | Both CIFAR10 and CIFAR100 contain 60,000 images with resolutions of 32x32... Each comprises 50,000 images for training and 10,000 for testing. The large-scale dataset ImageNet-1k includes 1,000 categories and a total of 1,281,167 training images along with 50,000 validation images.
Hardware Specification | Yes | All experiments were conducted using GeForce RTX 3090 (24G) GPUs and Intel(R) Core(TM) i9-10920X CPUs.
Software Dependencies | No | In the core samples selection stage, we utilize ActiveFT and optimize the parameters θS using the Adam [21] optimizer (learning rate 1e-3) until convergence. Our experiments are implemented using the mmclassification, mmdetection, and mmsegmentation frameworks. However, specific version numbers for these software dependencies are not provided.
Experiment Setup | Yes | In the core samples selection stage, we utilize ActiveFT and optimize the parameters θS using the Adam [21] optimizer (learning rate 1e-3) until convergence. We set the core number K as 50 (0.1%), 250 (0.5%), and 6405 (0.5%) for CIFAR10, CIFAR100, and ImageNet, respectively. In the boundary samples selection stage, we consistently set the nearest-neighbor number k as 10, both the removal ratio P_rm and the clustering fraction P_in as 10%, and the opponent penalty coefficient δ as 1.1. In the supervised finetuning phase, we finetune the models using the SGD optimizer with learning rate 3e-3, weight decay 1e-4, and momentum 0.9. We employ cosine learning rate decay with a batch size of 256 distributed across two GPUs. The models are finetuned for 1000 epochs on all datasets with different sampling ratios, except for ImageNet with sampling ratio 5%, where we finetune for 300 epochs.
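
The selection-stage settings quoted in the Software Dependencies and Experiment Setup rows can be collected into a small configuration sketch. This is not the authors' code: only the numeric values (core budgets K, Adam with learning rate 1e-3, k = 10, P_rm = P_in = 10%, δ = 1.1) come from the paper, while the `SelectionConfig` dataclass, the feature dimension, and the placeholder tensor `theta_S` are illustrative assumptions.

```python
# Hedged sketch of the reported selection-stage hyperparameters.
# Only the numeric values are taken from the paper; the layout and names are illustrative.
from dataclasses import dataclass, field

import torch


@dataclass
class SelectionConfig:
    # Core-sample budget K per dataset (0.1% of CIFAR10, 0.5% of CIFAR100 / ImageNet-1k).
    core_budget: dict = field(default_factory=lambda: {
        "cifar10": 50,       # 0.1% of 50,000 training images
        "cifar100": 250,     # 0.5% of 50,000 training images
        "imagenet1k": 6405,  # 0.5% of 1,281,167 training images
    })
    # Boundary-stage constants reported in the paper.
    num_neighbors: int = 10         # nearest-neighbor number k
    removal_ratio: float = 0.10     # P_rm
    cluster_fraction: float = 0.10  # P_in
    opponent_penalty: float = 1.1   # delta


cfg = SelectionConfig()

# Core-sample selection: parameters theta_S are optimized with Adam (lr 1e-3) until
# convergence, as in ActiveFT. theta_S is a stand-in tensor here; its shape
# (K x feature_dim, with feature_dim = 768) is an assumption for illustration only.
feature_dim = 768
theta_S = torch.randn(cfg.core_budget["cifar10"], feature_dim, requires_grad=True)
core_optimizer = torch.optim.Adam([theta_S], lr=1e-3)
```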
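
The supervised finetuning recipe from the Experiment Setup row (SGD with learning rate 3e-3, weight decay 1e-4, momentum 0.9, cosine learning rate decay, batch size 256, 1000 epochs) could look roughly as follows in plain PyTorch. The backbone, the labeled subset, and the loop structure are placeholders, not the authors' mmclassification configuration.

```python
# Hedged sketch of the supervised finetuning phase; only the optimizer and
# scheduler hyperparameters come from the paper.
import torch
from torch.utils.data import DataLoader, TensorDataset

num_epochs = 1000  # 300 for ImageNet-1k at the 5% sampling ratio
batch_size = 256   # distributed across two GPUs in the paper

# Placeholder model and labeled subset (the subset would be chosen by BiLAF).
model = torch.nn.Linear(768, 10)
subset = TensorDataset(torch.randn(512, 768), torch.randint(0, 10, (512,)))
loader = DataLoader(subset, batch_size=batch_size, shuffle=True)

optimizer = torch.optim.SGD(
    model.parameters(), lr=3e-3, weight_decay=1e-4, momentum=0.9
)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(num_epochs):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # cosine learning rate decay over the full schedule
```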