ClusterFormer: Clustering As A Universal Visual Learner
Authors: James Liang, Yiming Cui, Qifan Wang, Tong Geng, Wenguan Wang, Dongfang Liu
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate that CLUSTERFORMER outperforms various well-known specialized architectures, achieving 83.41% top-1 acc. over ImageNet-1K for image classification, 54.2% and 47.0% mAP over MS COCO for object detection and instance segmentation, 52.4% mIoU over ADE20K for semantic segmentation, and 55.8% PQ over COCO Panoptic for panoptic segmentation. |
| Researcher Affiliation | Collaboration | James C. Liang Rochester Institute of Technology Yiming Cui University of Florida Qifan Wang Meta AI Tong Geng University of Rochester Wenguan Wang Zhejiang University Dongfang Liu Rochester Institute of Technology |
| Pseudocode | No | The paper describes algorithmic steps and concepts through text and figures (e.g., Figure 2), but it does not contain structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The code will be available at here. |
| Open Datasets | Yes | ImageNet-1K [72] includes high-resolution images spanning distinct categories (e.g., animals, plants, and vehicles). MS COCO for Object Detection and Instance Segmentation. The COCO [49] dataset features dense annotations for 80 common objects in daily contexts. ADE20K for Semantic Segmentation. The ADE20K [101] dataset offers an extensive collection of images with pixel-level annotations, containing 150 diverse object categories in both indoor and outdoor scenes. COCO Panoptic for Panoptic Segmentation. The COCO Panoptic dataset [42] includes 80 thing categories and a carefully annotated set of 53 stuff categories. |
| Dataset Splits | Yes | Following conventional procedures, the dataset is split into 1.2M/50K/100K images for train/validation/test splits. (ImageNet-1K); Following standard practices [49], the dataset is split into 115K/5K/20K images for train2017/val2017/test-dev splits. (MS COCO); The dataset comprises 20K/2K/3K images for train/val/test splits. (ADE20K); the COCO Panoptic dataset is split into 115K/5K/20K images for the train/val/test splits as well. (COCO Panoptic) |
| Hardware Specification | Yes | Models are trained from scratch on sixteen A100 GPUs. |
| Software Dependencies | No | The paper states using 'mmclassification2', 'mmdetection3', and 'mmsegmentation4' as codebases, providing general GitHub links. However, it does not specify any version numbers for these frameworks, Python, PyTorch, or CUDA, which are necessary for a reproducible description of software dependencies. |
| Experiment Setup | Yes | To optimize the model's performance, we employ cross-entropy as the default loss function, which is widely used in classification tasks and helps in minimizing the difference between predicted probabilities and ground truth. For the training details, we run the model for 300 epochs, allowing sufficient time for the model to learn and converge. To manage the learning rate, we initialize it at 0.001 as default. The learning rate is then scheduled using a cosine annealing policy, which gradually decreases the learning rate over time. Due to limitations in our GPU capacity, we are constrained to set the total batch size at 1024. Models are trained from scratch on sixteen A100 GPUs. The number of instance centers is set to 100; a linear combination of the L1 loss and the GIoU loss is used as the optimization objective for bounding box regression. Their coefficients are set to 5 and 2, respectively. Moreover, we set the initial learning rate to 1e-5, the training epoch to 50, and the batch size to 16. We use random scale jittering with a factor in [0.1, 2.0] and a crop size of 1024 x 1024. |
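The Experiment Setup row above can be summarized concretely. The sketch below is a minimal, framework-free illustration of two details quoted from the paper: the cosine-annealed learning-rate schedule (base LR 0.001, 300 epochs, for classification) and the detection box-regression objective combining L1 and GIoU losses with coefficients 5 and 2. The `min_lr` floor and the function names are assumptions for illustration; the paper does not specify them.

```python
import math

# Hyperparameters quoted in the report (classification setting).
BASE_LR = 1e-3   # initial learning rate
EPOCHS = 300     # total training epochs
MIN_LR = 0.0     # assumed floor; not stated in the paper


def cosine_annealing_lr(epoch: int, base_lr: float = BASE_LR,
                        total_epochs: int = EPOCHS,
                        min_lr: float = MIN_LR) -> float:
    """Cosine-annealed learning rate at a given epoch."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def box_regression_loss(l1_loss: float, giou_loss: float,
                        w_l1: float = 5.0, w_giou: float = 2.0) -> float:
    """Detection objective: 5 * L1 + 2 * GIoU, per the quoted coefficients."""
    return w_l1 * l1_loss + w_giou * giou_loss


print(f"lr @ epoch 0:   {cosine_annealing_lr(0):.6f}")    # 0.001000
print(f"lr @ epoch 150: {cosine_annealing_lr(150):.6f}")  # 0.000500
print(f"lr @ epoch 300: {cosine_annealing_lr(300):.6f}")  # 0.000000
```

In practice the same schedule would be handled by the training framework (e.g., a cosine LR scheduler in the mmclassification codebase the authors cite), but the closed form above makes the decay behavior explicit.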