Label-Focused Inductive Bias over Latent Object Features in Visual Classification

Authors: Ilmin Kang, HyounYoung Bae, Kangil Kim

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on an image classification task show that LLB improves performance in both quantitative and qualitative analyses.
Researcher Affiliation | Academia | AI Graduate School, GIST, Republic of Korea; {kangilmin0325, bonheur606060, kangilkim}@gmail.com
Pseudocode | No | The paper describes the steps of the LLB method in narrative text and with diagrams (Figure 3), but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | The codes are available at https://github.com/GIST-IRR/LLB
Open Datasets | Yes | We use standard ImageNet (IN1K) (Deng et al., 2009), which consists of 1.28M training images with 1000 classes. We also use additional benchmarks including the reassessed-labels dataset ImageNet-ReaL (IN-Real) (Beyer et al., 2020), the scene recognition dataset Places365-Standard (Places) (López-Cifuentes et al., 2020), and the fine-grained, long-tailed iNaturalist2018 (iNat18) (Van Horn et al., 2018) dataset.
Dataset Splits | Yes | We use standard ImageNet (IN1K) (Deng et al., 2009), which consists of 1.28M training images with 1000 classes. We also use additional benchmarks including the reassessed-labels dataset ImageNet-ReaL (IN-Real) (Beyer et al., 2020), the scene recognition dataset Places365-Standard (Places) (López-Cifuentes et al., 2020), and the fine-grained, long-tailed iNaturalist2018 (iNat18) (Van Horn et al., 2018) dataset. For baselines, we first followed (Dosovitskiy et al., 2020; Steiner et al., 2021) to get a vanilla ViT pre-trained on ImageNet-21K (IN21K) (Ridnik et al., 2021).
Hardware Specification | Yes | Our experiments run on 8 A100 GPUs with an additional 4 A6000 GPUs, both for reproducing baselines and for training LLB.
Software Dependencies | No | The paper mentions 'Optimizer Adam (Kingma & Ba, 2014)' and 'RandAugment (Cubuk et al., 2020)' but does not provide specific version numbers for these or any other software libraries or dependencies.
Experiment Setup | Yes | Our LLB is built upon pre-trained classical visual feature backbones. We extract hidden vectors from the backbone while keeping the backbone parameters frozen. For the backbone, we use ViT (Dosovitskiy et al., 2020) networks. We use the l_V-th layer and consider its outputs as visual features V_{l_V} = [v^0_{l_V}; v^1_{l_V}; ...; v^i_{l_V}]. We found that extraction from the l_V = L_V - 1 layer showed the best performance (Figure 7a in Appendix). LLB takes V_{l_V} and clusters them into O latent objects. Based on our experiments (Figure 7b in Appendix), we use O = 2048. We report the results for different α in Figure 7d in Appendix and selected the best one among them. Additional model settings are summarized in Table 3. We train our model with cross-entropy loss plus the object diversity regularization term in Equation (2). See Table 4 in Appendix for the detailed hyper-parameters we used.
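
For context, the sketch below is a minimal, hypothetical PyTorch rendering of the pipeline the quoted setup describes: a frozen ViT backbone, patch tokens taken from layer L_V - 1, soft clustering onto O = 2048 learnable latent-object prototypes, and training with cross-entropy plus an α-weighted diversity term (the Adam optimizer mentioned above). The names LatentObjectHead and diversity_loss, the entropy-style regularizer, and the timm get_intermediate_layers extraction call are assumptions for illustration only and are not taken from the authors' implementation at https://github.com/GIST-IRR/LLB.

```python
# Hypothetical sketch of an LLB-style head on a frozen ViT backbone.
# This is NOT the official implementation; see https://github.com/GIST-IRR/LLB.
import torch
import torch.nn as nn
import torch.nn.functional as F
import timm

class LatentObjectHead(nn.Module):
    def __init__(self, dim, num_objects=2048, num_classes=1000):
        super().__init__()
        # Learnable latent-object prototypes that patch tokens are clustered onto.
        self.prototypes = nn.Parameter(torch.randn(num_objects, dim) * 0.02)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, tokens):                      # tokens: (B, N, D) from layer L_V - 1
        assign = torch.einsum("bnd,od->bno", tokens, self.prototypes)
        assign = assign.softmax(dim=-1)             # soft clustering of tokens to objects
        objects = torch.einsum("bno,bnd->bod", assign, tokens)  # per-object pooled features
        pooled = objects.mean(dim=1)                # aggregate latent objects
        return self.classifier(pooled), assign

backbone = timm.create_model("vit_base_patch16_224", pretrained=True)
for p in backbone.parameters():                     # backbone stays frozen
    p.requires_grad_(False)

head = LatentObjectHead(dim=768)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)
alpha = 0.1                                         # placeholder; the paper tunes alpha (Fig. 7d)

def diversity_loss(assign):
    # Encourage tokens to spread over many latent objects; one plausible reading
    # of the "object diversity regularization" in the paper's Equation (2).
    usage = assign.mean(dim=(0, 1))                  # (O,) average assignment mass per object
    return (usage * usage.clamp_min(1e-8).log()).sum()   # negative entropy of object usage

def training_step(images, labels):
    with torch.no_grad():
        # get_intermediate_layers(n=2)[0] approximates features from layer L_V - 1;
        # the exact extraction API depends on the backbone library and version.
        tokens = backbone.get_intermediate_layers(images, n=2)[0]
    logits, assign = head(tokens)
    loss = F.cross_entropy(logits, labels) + alpha * diversity_loss(assign)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss
```

In this sketch the soft-assignment matrix serves double duty: it pools tokens into latent-object features and is the quantity the α-weighted diversity term regularizes, so many latent objects stay in use rather than collapsing to a few.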