ClusterFormer: Clustering As A Universal Visual Learner
Authors: James Liang, Yiming Cui, Qifan Wang, Tong Geng, Wenguan Wang, Dongfang Liu
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate that CLUSTERFORMER outperforms various well-known specialized architectures, achieving 83.41% top-1 acc. over ImageNet-1K for image classification, 54.2% and 47.0% mAP over MS COCO for object detection and instance segmentation, 52.4% mIoU over ADE20K for semantic segmentation, and 55.8% PQ over COCO Panoptic for panoptic segmentation. |
| Researcher Affiliation | Collaboration | James C. Liang Rochester Institute of Technology Yiming Cui University of Florida Qifan Wang Meta AI Tong Geng University of Rochester Wenguan Wang Zhejiang University Dongfang Liu Rochester Institute of Technology |
| Pseudocode | No | The paper describes algorithmic steps and concepts through text and figures (e.g., Figure 2), but it does not contain structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The code will be available at here. |
| Open Datasets | Yes | ImageNet-1K [72] includes high-resolution images spanning distinct categories (e.g., animals, plants, and vehicles). MS COCO for Object Detection and Instance Segmentation. The COCO [49] dataset features dense annotations for 80 common objects in daily contexts. ADE20K for Semantic Segmentation. The ADE20K [101] dataset offers an extensive collection of images with pixel-level annotations, containing 150 diverse object categories in both indoor and outdoor scenes. COCO Panoptic for Panoptic Segmentation. The COCO Panoptic dataset [42] includes 80 thing categories and a carefully annotated set of 53 stuff categories. |
| Dataset Splits | Yes | Following conventional procedures, the dataset is split into 1.2M/50K/100K images for train/validation/test splits. (ImageNet-1K); Following standard practices [49], the dataset is split into 115K/5K/20K images for train2017/val2017/test-dev splits. (MS COCO); The dataset comprises 20K/2K/3K images for train/val/test splits. (ADE20K); the COCO Panoptic dataset is split into 115K/5K/20K images for the train/val/test splits as well. (COCO Panoptic) |
| Hardware Specification | Yes | Models are trained from scratch on sixteen A100 GPUs. |
| Software Dependencies | No | The paper states using 'mmclassification2', 'mmdetection3', and 'mmsegmentation4' as codebases, providing general GitHub links. However, it does not specify any version numbers for these frameworks, Python, PyTorch, or CUDA, which are necessary for a reproducible description of software dependencies. |
| Experiment Setup | Yes | To optimize the model's performance, we employ cross-entropy as the default loss function, which is widely used in classification tasks and helps in minimizing the difference between predicted probabilities and ground truth. For the training details, we run the model for 300 epochs, allowing sufficient time for the model to learn and converge. To manage the learning rate, we initialize it at 0.001 as default. The learning rate is then scheduled using a cosine annealing policy, which gradually decreases the learning rate over time. Due to limitations in our GPU capacity, we are constrained to set the total batch size at 1024. Models are trained from scratch on sixteen A100 GPUs. The number of instance centers is set to 100; a linear combination of the L1 loss and the GIoU loss is used as the optimization objective for bounding box regression. Their coefficients are set to 5 and 2, respectively. Moreover, we set the initial learning rate to 1e-5, the training epoch to 50, and the batch size to 16. We use random scale jittering with a factor in [0.1, 2.0] and a crop size of 1024 x 1024. |
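The Experiment Setup row above can be summarized concretely. The sketch below is a minimal, framework-free illustration of two details quoted from the paper: the cosine-annealed learning-rate schedule (base LR 0.001, 300 epochs, for classification) and the detection box-regression objective combining L1 and GIoU losses with coefficients 5 and 2. The `min_lr` floor and the function names are assumptions for illustration; the paper does not specify them.

```python
import math

# Hyperparameters quoted in the report (classification setting).
BASE_LR = 1e-3   # initial learning rate
EPOCHS = 300     # total training epochs
MIN_LR = 0.0     # assumed floor; not stated in the paper


def cosine_annealing_lr(epoch: int, base_lr: float = BASE_LR,
                        total_epochs: int = EPOCHS,
                        min_lr: float = MIN_LR) -> float:
    """Cosine-annealed learning rate at a given epoch."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def box_regression_loss(l1_loss: float, giou_loss: float,
                        w_l1: float = 5.0, w_giou: float = 2.0) -> float:
    """Detection objective: 5 * L1 + 2 * GIoU, per the quoted coefficients."""
    return w_l1 * l1_loss + w_giou * giou_loss


print(f"lr @ epoch 0:   {cosine_annealing_lr(0):.6f}")    # 0.001000
print(f"lr @ epoch 150: {cosine_annealing_lr(150):.6f}")  # 0.000500
print(f"lr @ epoch 300: {cosine_annealing_lr(300):.6f}")  # 0.000000
```

In practice the same schedule would be handled by the training framework (e.g., a cosine LR scheduler in the mmclassification codebase the authors cite), but the closed form above makes the decay behavior explicit.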