CountGD: Multi-Modal Open-World Counting
Authors: Niki Amini-Naieni, Tengda Han, Andrew Zisserman
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | COUNTGD is trained on the FSC-147 [42] object counting dataset training set, and then evaluated on the FSC-147 test set, and two other benchmark datasets (without any fine-tuning). We first describe the datasets, and then discuss the performance. |
| Researcher Affiliation | Academia | Niki Amini-Naieni, Tengda Han, Andrew Zisserman; Visual Geometry Group (VGG), University of Oxford; {nikian,htd,az}@robots.ox.ac.uk |
| Pseudocode | No | The paper includes an architectural diagram (Figure 2) and describes the model components in detail, but it does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and an app to test the model are available at https://www.robots.ox.ac.uk/vgg/research/countgd/. |
| Open Datasets | Yes | COUNTGD is trained on the FSC-147 [42] object counting dataset training set, and then evaluated on the FSC-147 test set, and two other benchmark datasets (without any fine-tuning). |
| Dataset Splits | Yes | FSC-147 [42]. FSC-147 contains 6135 images with 89 classes in the training set, 29 classes in the validation set, and 29 classes in the test set. |
| Hardware Specification | Yes | Our model is trained on 1 Nvidia A6000 GPU with 48 GB of graphics memory. A full training takes about 1 day. |
| Software Dependencies | No | The paper implies standard deep-learning software through its architecture and training choices (e.g., BERT-base, Swin Transformer, the Adam optimizer) but does not list explicit software dependencies with version numbers (e.g., 'PyTorch 1.9', 'Python 3.8'). |
| Experiment Setup | Yes | The model is trained for 30 epochs on the FSC-147 training dataset using the Adam optimizer and standard augmentations. The image and text encoders, f_{θ_SwinT} and f_{θ_T}, are frozen during training. ... The model is optimized with the Adam optimizer with a weight decay set to 10⁻⁴ and an initial learning rate set to 1 × 10⁻⁴ that reduces by a factor of ten every ten epochs. λ_loc is set to 1 and λ_cls is set to 5 in Equation 3. These scale factors are also used in the Hungarian matching cost for matching ground-truth points to predicted points. The confidence threshold σ is set to 0.23. (See the configuration sketch below the table.) |
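
For readers re-implementing this setup, the sketch below translates the quoted hyperparameters into a minimal PyTorch configuration. It is an assumption-laden reading, not the authors' code: `model` is a hypothetical placeholder for the trainable COUNTGD sub-network (the Swin and BERT encoders are frozen per the paper), the per-batch training step is elided, and the mapping of "reduces by a factor of ten every ten epochs" to `StepLR` is an interpretation.

```python
import torch

# Hypothetical placeholder for the trainable part of COUNTGD: the paper
# freezes the Swin image encoder and the text encoder, so only the
# remaining fusion/decoder parameters would be optimized.
model = torch.nn.Linear(256, 256)

# Adam with the quoted hyperparameters: lr = 1e-4, weight decay = 1e-4.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)

# "Reduces by a factor of ten every ten epochs" read as a step schedule.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

# Loss weights from Equation 3, also reused in the Hungarian matching cost.
LAMBDA_LOC, LAMBDA_CLS = 1.0, 5.0

for epoch in range(30):  # "trained for 30 epochs"
    # One pass over the FSC-147 training set would go here, minimizing
    # total_loss = LAMBDA_LOC * loc_loss + LAMBDA_CLS * cls_loss.
    optimizer.step()  # stands in for the elided per-batch updates
    scheduler.step()

# At inference, predictions above the confidence threshold sigma = 0.23
# are kept, and the count is the number of surviving predictions.
sigma = 0.23
scores = torch.rand(900)             # hypothetical per-query confidences
count = int((scores > sigma).sum())  # predicted object count
```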