CountGD: Multi-Modal Open-World Counting

Authors: Niki Amini-Naieni, Tengda Han, Andrew Zisserman

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "COUNTGD is trained on the FSC-147 [42] object counting dataset training set, and then evaluated on the FSC-147 test set, and two other benchmark datasets (without any fine-tuning). We first describe the datasets, and then discuss the performance."
Researcher Affiliation | Academia | Niki Amini-Naieni, Tengda Han, Andrew Zisserman; Visual Geometry Group (VGG), University of Oxford; {nikian,htd,az}@robots.ox.ac.uk
Pseudocode | No | The paper includes an architectural diagram (Figure 2) and describes the model components in detail, but it does not present any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "The code and an app to test the model are available at https://www.robots.ox.ac.uk/~vgg/research/countgd/."
Open Datasets | Yes | "COUNTGD is trained on the FSC-147 [42] object counting dataset training set, and then evaluated on the FSC-147 test set, and two other benchmark datasets (without any fine-tuning)."
Dataset Splits | Yes | "FSC-147 [42]. FSC-147 contains 6135 images with 89 classes in the training set, 29 classes in the validation set, and 29 classes in the test set."
Hardware Specification | Yes | "Our model is trained on 1 Nvidia A6000 GPU with 48GB of graphic memory. A full training takes about 1 day."
Software Dependencies | No | The paper implies its software stack through the architecture components it names (e.g., BERT-base, Swin Transformer, the Adam optimizer) but does not state explicit versions for its software dependencies (e.g., 'PyTorch 1.9', 'Python 3.8').
Experiment Setup | Yes | "The model is trained for 30 epochs on the FSC-147 training dataset using the Adam optimizer and standard augmentations. The image and text encoders, f_{θ_SwinT} and f_{θ_T}, are frozen during training. ... The model is optimized with the Adam optimizer with a weight decay set to 10^-4 and an initial learning rate set to 1 × 10^-4 that reduces by a factor of ten every ten epochs. λ_loc is set to 1 and λ_cls is set to 5 in Equation 3. These scale factors are also used in the Hungarian matching cost for matching ground-truth points to predicted points. The confidence threshold σ is set to 0.23."
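
To make the quoted setup concrete, below is a minimal PyTorch-style sketch of the optimization schedule and loss weighting described in the Experiment Setup row. The model and per-term losses are placeholders, and the count_objects helper is a hypothetical illustration of applying the confidence threshold σ = 0.23; this is a sketch of the quoted hyperparameters, not the authors' released implementation.

```python
import torch

# Hyperparameters quoted in the paper's experiment setup.
NUM_EPOCHS = 30
INITIAL_LR = 1e-4       # initial learning rate, 1 x 10^-4
WEIGHT_DECAY = 1e-4     # Adam weight decay, 10^-4
LAMBDA_LOC = 1.0        # localization loss weight in Equation 3
LAMBDA_CLS = 5.0        # classification loss weight in Equation 3
SIGMA = 0.23            # confidence threshold

# Placeholder for the trainable modules; the paper freezes the Swin image
# encoder and the text encoder, so only the remaining parameters are optimized.
trainable_params = [torch.nn.Parameter(torch.randn(8, 8))]

optimizer = torch.optim.Adam(
    trainable_params, lr=INITIAL_LR, weight_decay=WEIGHT_DECAY
)
# Learning rate "reduces by a factor of ten every ten epochs".
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)


def weighted_loss(loss_cls, loss_loc):
    # Weighted sum with the quoted scale factors; per the paper, the same
    # weights appear in the Hungarian matching cost between ground-truth
    # and predicted points.
    return LAMBDA_CLS * loss_cls + LAMBDA_LOC * loss_loc


def count_objects(confidences):
    # Hypothetical inference step: count predicted points whose confidence
    # exceeds sigma. The paper fixes sigma = 0.23, but this exact counting
    # rule is an assumption, not a quote.
    return int((confidences > SIGMA).sum())


for epoch in range(NUM_EPOCHS):
    # Forward pass, Hungarian matching, and per-term losses would go here;
    # zero-valued scalars stand in so the sketch runs end to end.
    loss_cls = torch.zeros((), requires_grad=True)
    loss_loc = torch.zeros((), requires_grad=True)

    optimizer.zero_grad()
    weighted_loss(loss_cls, loss_loc).backward()
    optimizer.step()
    scheduler.step()
```

With StepLR(step_size=10, gamma=0.1), the learning rate is 1e-4 for epochs 0-9, 1e-5 for epochs 10-19, and 1e-6 for epochs 20-29, which matches the quoted schedule over the 30 training epochs.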