Label-Focused Inductive Bias over Latent Object Features in Visual Classification
Authors: Ilmin Kang, HyounYoung Bae, Kangil Kim
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on an image classification task show that LLB improves performance in both quantitative and qualitative analyses |
| Researcher Affiliation | Academia | AI Graduate School, GIST, Republic of Korea {kangilmin0325, bonheur606060, kangilkim}@gmail.com |
| Pseudocode | No | The paper describes the steps of the LLB method in narrative text and with diagrams (Figure 3), but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | The codes are available at https://github.com/GIST-IRR/LLB |
| Open Datasets | Yes | We use standard ImageNet (IN1K) (Deng et al., 2009), which consists of 1.28M training images with 1000 classes. We also use additional benchmarks including ImageNet with reassessed labels, ImageNet Real (IN-Real) (Beyer et al., 2020), the scene recognition dataset Places365-Standard (Places) (López-Cifuentes et al., 2020), and the fine-grained, long-tailed iNaturalist2018 (iNat18) (Van Horn et al., 2018) dataset. |
| Dataset Splits | Yes | We use standard ImageNet (IN1K) (Deng et al., 2009), which consists of 1.28M training images with 1000 classes. We also use additional benchmarks including ImageNet with reassessed labels, ImageNet Real (IN-Real) (Beyer et al., 2020), the scene recognition dataset Places365-Standard (Places) (López-Cifuentes et al., 2020), and the fine-grained, long-tailed iNaturalist2018 (iNat18) (Van Horn et al., 2018) dataset. For baselines, we first followed (Dosovitskiy et al., 2020; Steiner et al., 2021) to get a vanilla ViT pre-trained on ImageNet21K (IN21K) (Ridnik et al., 2021). |
| Hardware Specification | Yes | Our experiments are on 8 A100 GPUs with an additional 4 A6000 GPUs, for both reproducing baselines and training LLB. |
| Software Dependencies | No | The paper mentions 'Optimizer Adam (Kingma & Ba, 2014)' and 'RandAugment (Cubuk et al., 2020)' but does not provide specific version numbers for these or any other software libraries or dependencies. |
| Experiment Setup | Yes | Our LLB is built upon pre-trained classical visual feature backbones. We extract hidden vectors from the backbone while keeping the backbone parameters frozen. For the backbone, we use ViT (Dosovitskiy et al., 2020) networks. We use the l_V-th layer and consider its hidden vectors as visual features V_{l_V} = [v_{l_V}^0; v_{l_V}^1; ...; v_{l_V}^i]. We found that extraction from the l_V = L_V − 1 layer showed the best performance (Figure 7a in Appendix). LLB takes V_{l_V} and clusters them into O latent objects. Based on our experiments (Figure 7b in Appendix), we use O = 2048. We report the results of different α in Figure 7d in Appendix and selected the best one among them. Additional model settings are summarized in Table 3. We train our model with cross-entropy loss plus the object diversity regularization term in Equation (2). See Table 4 in Appendix for the detailed hyper-parameters we used. |
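
For orientation, the sketch below mirrors the pipeline quoted in the 'Experiment Setup' row: token features from a frozen ViT layer (l_V = L_V − 1) are softly assigned to O = 2048 latent-object slots, pooled, and classified with cross-entropy plus a diversity term weighted by α. This is a minimal, hypothetical reconstruction, not the authors' implementation: the slot-assignment step (`LLBHead`, `object_codes`) and the exact form of the diversity regularizer are assumptions; the official code at https://github.com/GIST-IRR/LLB is authoritative.

```python
# Minimal sketch (not the authors' code) of the setup described above:
# hidden vectors from a frozen ViT layer are clustered into O latent-object
# slots, pooled, and classified. LLBHead, object_codes, and the diversity
# term's form are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LLBHead(nn.Module):
    """Hypothetical head: cluster frozen backbone tokens into O latent objects."""

    def __init__(self, dim: int, num_objects: int = 2048, num_classes: int = 1000):
        super().__init__()
        self.object_codes = nn.Parameter(torch.randn(num_objects, dim) * 0.02)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, tokens: torch.Tensor):
        # tokens: (B, N, D) hidden vectors taken from layer l_V = L_V - 1 of the frozen ViT
        assign = torch.softmax(tokens @ self.object_codes.t(), dim=-1)   # (B, N, O) soft assignment
        objects = assign.transpose(1, 2) @ tokens                        # (B, O, D) latent-object features
        pooled = objects.mean(dim=1)                                     # (B, D)
        logits = self.classifier(pooled)
        # Assumed diversity regularizer: penalize collinear object codes.
        codes = F.normalize(self.object_codes, dim=-1)
        off_diag = codes @ codes.t() - torch.eye(codes.size(0), device=codes.device)
        diversity = off_diag.pow(2).mean()
        return logits, diversity


# Toy usage with random stand-in features; a real run would extract `tokens`
# from the frozen ViT with gradients disabled and train only this head.
head = LLBHead(dim=768, num_objects=2048, num_classes=1000)
tokens = torch.randn(4, 197, 768)           # placeholder for frozen backbone output
labels = torch.randint(0, 1000, (4,))
alpha = 0.1                                  # regularization weight, value is illustrative
logits, diversity = head(tokens)
loss = F.cross_entropy(logits, labels) + alpha * diversity
loss.backward()
```

The frozen-backbone setup is reflected in the sketch by treating `tokens` as fixed inputs: only the head's parameters receive gradients, which matches the paper's statement that backbone parameters are kept frozen while LLB is trained on top.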