Feature Fusion from Head to Tail for Long-Tailed Visual Recognition
Authors: Mengke Li, Zhikai Hu, Yang Lu, Weichao Lan, Yiu-ming Cheung, Hui Huang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on various long-tailed benchmarks demonstrate the effectiveness of the proposed H2T. The source code is available at https://github.com/Keke921/H2T. |
| Researcher Affiliation | Academia | Mengke Li (1,2), Zhikai Hu (3), Yang Lu (4), Weichao Lan (3), Yiu-ming Cheung (3), Hui Huang (2)*. 1: Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen, China; 2: College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China; 3: Department of Computer Science, Hong Kong Baptist University, Hong Kong, China; 4: Fujian Key Laboratory of Sensing and Computing for Smart City, School of Informatics, Xiamen University, Xiamen, China |
| Pseudocode | Yes | Algorithm 1: H2T. Input: training set, fusion ratio p; Output: trained model. 1: Initialize the model ϕ randomly. 2: for iter = 1 to E0 do: 3: sample batches of data (x, y) ∼ T_I with instance-wise sampling; 4: obtain the feature map F = ϕ_θ(x); 5: calculate the logits z = W^T f and the loss L1(x, y); 6: ϕ ← ϕ - α∇_ϕ L1((x, y); ϕ); 7: end for. 8: for iter = E0 + 1 to E2 do: 9: sample batches of data (x^B, y^B) ∼ T_B and (x, y) ∼ T_I; 10: obtain the feature maps F^B = ϕ_θ(x^B) and F^I = ϕ_θ(x); 11: fuse the feature maps by Eq. 1 to obtain F̃; 12: input F̃ to the pooling layer to obtain f, then calculate the logits z = W^T f and the loss L2(x^B, y^B); 13: end for. 14: Freeze the representation-learning parameters ϕ_r and fine-tune the classifier parameters ϕ_c: ϕ_c ← ϕ_c - α∇_{ϕ_c} L2((x^B, y^B); ϕ_c). (An illustrative PyTorch sketch of the Stage-II step follows the table.) |
| Open Source Code | Yes | The source code is available at https://github.com/Keke921/H2T. |
| Open Datasets | Yes | We evaluate H2T on four widely-used benchmarks: CIFAR100-LT (Cao et al. 2019), ImageNet-LT, iNaturalist 2018 (Van Horn et al. 2018), and Places-LT. CIFAR100-LT is a small-scale dataset that is sub-sampled from the balanced version CIFAR100 (Krizhevsky, Hinton et al. 2009). The original versions of ImageNet-2012 (Russakovsky et al. 2015) and Places365 (Zhou et al. 2017) are also balanced datasets. We use the same settings as Liu et al. (Liu et al. 2019) to obtain the long-tailed versions. iNaturalist is collected from all over the world and is naturally heavily imbalanced. The 2018 version (Van Horn et al. 2018) is utilized in our experiment. We also report the comparison results on CIFAR10-LT (Cao et al. 2019) in the Appendix (Li et al. 2023b). |
| Dataset Splits | Yes | We evaluate H2T on four widely-used benchmarks: CIFAR100-LT (Cao et al. 2019), ImageNet-LT, iNaturalist 2018 (Van Horn et al. 2018), and Places-LT. ... The training set includes N samples. ... As for the metrics, besides top-1 classification accuracy, following Liu et al. (Liu et al. 2019), the accuracies on three partitions, head (n_i > 100), medium (20 < n_i ≤ 100), and tail (n_i ≤ 20), are also compared. (An illustrative computation of these partition accuracies follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions 'SGD with a momentum of 0.9' as an optimization algorithm, but does not provide specific software dependencies or library versions (e.g., PyTorch, TensorFlow, or specific Python libraries with version numbers) required to reproduce the experiments. |
| Experiment Setup | Yes | Basic Settings: SGD with a momentum of 0.9 is adopted for all datasets. Stage II is trained for 10 epochs. For CIFAR100-LT, we refer to the settings in Cao et al. (Cao et al. 2019) and Zhong et al. (Zhong et al. 2021). The backbone network is ResNet-32 (He et al. 2016). Stage I trains for 200 epochs. The initial learning rate is 0.1 and is decayed at the 160th and 180th epochs by 0.1. The batch size is 128. For ImageNet-LT and iNaturalist 2018, we use the commonly used ResNet-50. For Places-LT, we utilize the ResNet-152 pre-trained on ImageNet. (A configuration sketch for the CIFAR100-LT schedule follows the table.) |
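The Pseudocode row describes a two-stage procedure: Stage I trains the full network with instance-wise sampling, and Stage II fuses the feature maps produced by a class-balanced branch and an instance-wise branch before fine-tuning only the classifier. Below is a minimal PyTorch sketch of the Stage-II step. Eq. 1 is not quoted in the excerpt, so the fusion is assumed here to be a channel-wise mask that swaps a fraction p of the balanced-branch channels for instance-branch channels; every name (`fuse_feature_maps`, `stage2_step`, `backbone`, `classifier`, `pool`) is illustrative rather than taken from the released code.

```python
import torch


def fuse_feature_maps(f_balanced: torch.Tensor,
                      f_instance: torch.Tensor,
                      p: float) -> torch.Tensor:
    """Assumed form of Eq. 1: replace a fraction p of the channels of the
    class-balanced feature map with the corresponding channels of the
    instance-wise (head-rich) feature map."""
    _, c, _, _ = f_balanced.shape
    num_fused = int(p * c)
    idx = torch.randperm(c, device=f_balanced.device)[:num_fused]
    mask = torch.zeros(c, device=f_balanced.device)
    mask[idx] = 1.0                              # 1 -> take the instance-branch channel
    mask = mask.view(1, c, 1, 1)
    return mask * f_instance + (1.0 - mask) * f_balanced


def stage2_step(backbone, pool, classifier, optimizer,
                x_balanced, y_balanced, x_instance, p):
    """One Stage-II update (Algorithm 1, steps 9-14): the representation is
    frozen and only the classifier parameters receive gradients."""
    with torch.no_grad():
        f_b = backbone(x_balanced)               # feature map from class-balanced sampling
        f_i = backbone(x_instance)               # feature map from instance-wise sampling
    fused = fuse_feature_maps(f_b, f_i, p)       # Eq. 1 (assumed form)
    logits = classifier(pool(fused).flatten(1))  # z = W^T f
    loss = torch.nn.functional.cross_entropy(logits, y_balanced)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

To match step 14 of the algorithm, the optimizer passed to `stage2_step` would be built over `classifier.parameters()` only.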
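The Experiment Setup row fixes the CIFAR100-LT Stage-I optimization: SGD with momentum 0.9, initial learning rate 0.1 decayed by a factor of 0.1 at epochs 160 and 180, batch size 128, 200 epochs, ResNet-32 backbone. A hedged sketch of that schedule is shown below; the backbone is replaced by a trivial stand-in module, and weight decay is omitted because it is not stated in the excerpt.

```python
import torch
from torch import nn

# Stand-in for the ResNet-32 backbone; purely illustrative.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 100))

# Stage I schedule as quoted: SGD, momentum 0.9, lr 0.1, decayed x0.1 at epochs 160 and 180.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[160, 180], gamma=0.1)

for epoch in range(200):
    # one epoch of instance-wise sampling with batch size 128 would run here
    scheduler.step()
```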
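The Dataset Splits row also explains the evaluation protocol: besides overall top-1 accuracy, accuracy is reported on head (n_i > 100), medium (20 < n_i ≤ 100), and tail (n_i ≤ 20) partitions, where n_i is the number of training samples of class i. The sketch below shows one plausible way to compute those partition accuracies; `y_true`, `y_pred`, and `train_counts` are placeholder names, not taken from the released code.

```python
import numpy as np


def partition_accuracy(y_true, y_pred, train_counts):
    """Top-1 accuracy overall and on the head/medium/tail partitions defined
    by per-class training-set counts."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    correct = y_true == y_pred
    n_i = np.asarray(train_counts)[y_true]       # training count of each sample's class
    masks = {
        "all": np.ones_like(correct, dtype=bool),
        "head": n_i > 100,
        "medium": (n_i > 20) & (n_i <= 100),
        "tail": n_i <= 20,
    }
    return {name: float(correct[m].mean()) if m.any() else float("nan")
            for name, m in masks.items()}
```

For example, `partition_accuracy(y_true, y_pred, train_counts)["tail"]` returns top-1 accuracy on the tail classes.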