Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Scale Equalization for Multi-Level Feature Fusion
Authors: Bum Jun Kim, Sang Woo Kim
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments showed that adopting scale equalizers consistently improved the mIoU index across various target datasets, including ADE20K, PASCAL VOC 2012, and Cityscapes, as well as various decoder choices, including UPerHead, PSPHead, ASPPHead, SepASPPHead, and FCNHead. We observed that injecting scale equalizers into multi-stage feature fusion improved the mIoU index compared with the same models without scale equalization (Table 2). |
| Researcher Affiliation | Academia | Bum Jun Kim (EMAIL), Department of Electrical Engineering, Pohang University of Science and Technology; Sang Woo Kim (EMAIL), Department of Electrical Engineering, Pohang University of Science and Technology |
| Pseudocode | Yes | Algorithm 1 Efficient Implementation via Initialization |
| Open Source Code | No | The paper discusses modifications to existing architectures and their efficient implementation, but it does not provide a direct link or explicit statement about the release of their own source code for the described methodology. It mentions using 'MMSegmentation: Open MMLab Semantic Segmentation Toolbox and Benchmark. https://github.com/open-mmlab/mmsegmentation, 2020.' which is a third-party tool. |
| Open Datasets | Yes | Experiments showed that adopting scale equalizers consistently improved the mIoU index across various target datasets, including ADE20K, PASCAL VOC 2012, and Cityscapes, as well as various decoder choices, including UPerHead, PSPHead, ASPPHead, SepASPPHead, and FCNHead. The ADE20K dataset contains scene-centric images along with the corresponding segmentation labels. The same goes for the PASCAL VOC 2012 dataset with 21 categories, and we followed the augmented PASCAL VOC 2012 dataset. The Cityscapes dataset contains images of urban street scenes along with the corresponding segmentation labels. Using the KITTI dataset (Geiger et al., 2013), we trained the model with and without scale equalizers in the feature fusion module (Table 5). |
| Dataset Splits | No | The paper describes data augmentation and preprocessing steps (e.g., crop size, random resize, random flipping, photometric distortions) for the datasets used (ADE20K, PASCAL VOC 2012, Cityscapes, KITTI), but it does not explicitly state the training, validation, and test dataset splits by percentages or sample counts in the main text. |
| Hardware Specification | No | The training was conducted on a 4 GPU machine, and SyncBN (Zhang et al., 2018) was used for distributed training. |
| Software Dependencies | No | The paper mentions 'MMSegmentation (Contributors, 2020)' and various optimizers like 'AdamW' and 'stochastic gradient descent with momentum', along with 'SyncBN', but does not provide specific version numbers for any key software libraries, frameworks, or programming languages used (e.g., Python, PyTorch, CUDA, etc.). |
| Experiment Setup | Yes | For training with Swin and Twins encoders, the AdamW optimizer (Loshchilov & Hutter, 2019) with weight decay 10^-2, betas β1 = 0.9, β2 = 0.999, and learning rate 6e-5 with polynomial decay on the 160K schedule after linear warmup was used. For training with ConvNeXt encoders, the AdamW optimizer with weight decay 5e-2, betas β1 = 0.9, β2 = 0.999, learning rate 10^-4 with polynomial decay on the 160K schedule after linear warmup, and mixed precision training (Micikevicius et al., 2018) were used. The training was conducted on a 4 GPU machine, and SyncBN (Zhang et al., 2018) was used for distributed training. We measured the mean intersection over union (mIoU) and reported the average of five runs with different random seeds. For training on the Cityscapes dataset, stochastic gradient descent with momentum 0.9, weight decay 5e-4, and learning rate 10^-2 with polynomial decay on the 80K schedule were used. |
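The learning-rate schedule quoted above (linear warmup followed by polynomial decay over a 160K-iteration budget, as commonly configured in MMSegmentation) can be sketched as a plain function. This is an illustrative sketch, not code from the paper: the warmup length (1500 iterations) and decay power (1.0) are assumptions taken from typical MMSegmentation defaults, while the base learning rate and total steps mirror the Swin/Twins setting reported in the table.

```python
def lr_at_step(step, base_lr=6e-5, warmup_steps=1500,
               total_steps=160_000, power=1.0, min_lr=0.0):
    """Learning rate at a given iteration: linear warmup, then polynomial decay.

    Assumed values (not stated in the paper): warmup_steps=1500, power=1.0.
    base_lr=6e-5 and total_steps=160_000 follow the quoted Swin/Twins setup.
    """
    if step < warmup_steps:
        # Linear warmup from ~0 up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Polynomial decay from base_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + (base_lr - min_lr) * (1.0 - progress) ** power
```

With power=1.0 this reduces to a linear ramp-down after warmup; the ConvNeXt and Cityscapes settings in the table would use the same shape with base_lr=10^-4 / 160K and base_lr=10^-2 / 80K respectively.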