Superclass-Conditional Gaussian Mixture Model For Learning Fine-Grained Embeddings

Authors: Jingchao Ni, Wei Cheng, Zhengzhang Chen, Takayoshi Asakura, Tomoya Soma, Sho Kato, Haifeng Chen

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on benchmark datasets and a real-life medical dataset indicate the effectiveness of our method.
Researcher Affiliation | Industry | Jingchao Ni (1), Wei Cheng (1), Zhengzhang Chen (1), Takayoshi Asakura (2), Tomoya Soma (2), Sho Kato (3), Haifeng Chen (1); (1) NEC Laboratories America, (2) NEC Corporation, (3) Renascience, Inc.
Pseudocode | Yes | Algorithm 1: Superclass-conditional Gaussian mixture model (SCGM)
Open Source Code | Yes | The code of SCGM is available at https://github.com/nijingchao/SCGM for reproducibility study.
Open Datasets | Yes | The table below summarizes the benchmark datasets: (1) BREEDS (Santurkar et al., 2020) includes four datasets {Living17, Nonliving26, Entity13, Entity30} derived from ImageNet with class hierarchy calibrated...; (2) CIFAR-100 (Krizhevsky, 2009); and (3) tieredImageNet (Ren et al., 2018)
Dataset Splits | Yes | For BREEDS and CIFAR-100, the val set is 10% of the train set. ...tieredImageNet... was divided into 20/6/8 splits for (disjoint) train/val/test sets.
Hardware Specification | Yes | Table 4: Computational costs on 4 Quadro RTX 6000 24G GPUs.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA versions).
Experiment Setup | Yes | For training, we used a cosine annealing with warm restarts schedule (Loshchilov & Hutter, 2017) with 20 epochs per cycle. The batch size was 256 for BREEDS, 1024 for CIFAR-100, and 512 for tieredImageNet. The learning rate was 0.03 for BREEDS, and 0.12 for CIFAR-100 and tieredImageNet. The weight decay was 1e-4. All models were trained for 200 epochs. ... For SCGM, we set γ = 0.5, σ2 = 0.1, and λ = 25 (λ follows (Asano et al., 2020)).
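The quoted schedule can be sketched with the SGDR formula (Loshchilov & Hutter, 2017) applied per epoch with fixed-length cycles. The 20-epoch cycle and lr_max = 0.03 follow the BREEDS setting above; lr_min = 0, the per-epoch (rather than per-iteration) update, and the function name are illustrative assumptions, not the paper's implementation.

```python
import math

def cosine_annealing_warm_restarts(epoch, lr_max=0.03, lr_min=0.0, cycle=20):
    """Sketch of the SGDR schedule with fixed-length warm-restart cycles.

    lr_max=0.03 and cycle=20 mirror the quoted BREEDS setting; lr_min=0
    is an assumption for illustration.
    """
    t = epoch % cycle  # position within the current cycle; resets at each restart
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / cycle))
```

For example, the rate starts at 0.03, decays along a cosine to near zero by epoch 19, and restarts at 0.03 at epoch 20; a framework scheduler such as PyTorch's `CosineAnnealingWarmRestarts` implements the same idea per iteration.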