Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

CountGD: Multi-Modal Open-World Counting

Authors: Niki Amini-Naieni, Tengda Han, Andrew Zisserman

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | COUNTGD is trained on the FSC-147 [42] object counting dataset training set, and then evaluated on the FSC-147 test set, and two other benchmark datasets (without any fine-tuning). We first describe the datasets, and then discuss the performance.
Researcher Affiliation | Academia | Niki Amini-Naieni, Tengda Han, Andrew Zisserman. Visual Geometry Group (VGG), University of Oxford.
Pseudocode | No | The paper includes an architectural diagram (Figure 2) and describes the model components in detail, but it does not present any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code and an app to test the model are available at https://www.robots.ox.ac.uk/vgg/research/countgd/.
Open Datasets | Yes | COUNTGD is trained on the FSC-147 [42] object counting dataset training set, and then evaluated on the FSC-147 test set, and two other benchmark datasets (without any fine-tuning).
Dataset Splits | Yes | FSC-147 [42]. FSC-147 contains 6135 images with 89 classes in the training set, 29 classes in the validation set, and 29 classes in the test set.
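The split statistics quoted above can be sanity-checked: the per-split class counts (89 + 29 + 29) sum to 147, matching the dataset's name. A minimal sketch (the dictionary structure is illustrative; only the counts come from the quote):

```python
# FSC-147 split statistics, as quoted from the paper.
fsc147_splits = {
    "train": 89,  # classes in the training set
    "val": 29,    # classes in the validation set
    "test": 29,   # classes in the test set
}

total_classes = sum(fsc147_splits.values())
print(f"FSC-147: {total_classes} classes across {len(fsc147_splits)} splits")
# The class sets are disjoint across splits, so the totals simply add up.
```

Note that the 6135 figure refers to images, not classes; the excerpt does not break the image count down per split.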
Hardware Specification | Yes | Our model is trained on 1 Nvidia A6000 GPU with 48GB of graphic memory. A full training takes about 1 day.
Software Dependencies | No | The paper names components such as BERT-base, the Swin Transformer, and the Adam optimizer, implying standard deep learning libraries, but it does not provide explicit version numbers for any software dependency (e.g., 'PyTorch 1.9', 'Python 3.8').
Experiment Setup | Yes | The model is trained for 30 epochs on the FSC-147 training dataset using the Adam optimizer and standard augmentations. The image and text encoders, f_{θ_SwinT} and f_{θ_T}, are frozen during training. ... The model is optimized with the Adam optimizer with a weight decay set to 10^-4 and an initial learning rate set to 1×10^-4 that reduces by a factor of ten every ten epochs. λ_loc is set to 1 and λ_cls is set to 5 in Equation 3. These scale factors are also used in the Hungarian Matching cost for matching ground truth points to predicted points. The confidence threshold σ is set to 0.23.
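The optimization schedule described above (initial learning rate 1×10^-4, reduced by a factor of ten every ten epochs, for 30 epochs) can be sketched framework-free. The function name and structure are illustrative, not taken from the paper's released code; only the hyperparameter values are quoted:

```python
def learning_rate(epoch: int, base_lr: float = 1e-4,
                  decay: float = 0.1, step: int = 10) -> float:
    """Step-decay schedule: LR drops by a factor of ten every `step` epochs."""
    return base_lr * decay ** (epoch // step)

# Remaining hyperparameters quoted from the paper's experiment setup.
WEIGHT_DECAY = 1e-4        # Adam weight decay
LAMBDA_LOC, LAMBDA_CLS = 1, 5  # loss weights in Equation 3 and the matching cost
CONF_THRESHOLD = 0.23      # confidence threshold sigma

for epoch in (0, 10, 20, 29):
    print(f"epoch {epoch:2d}: lr = {learning_rate(epoch):.0e}")
```

In a PyTorch setup, the same schedule would typically be expressed with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)`.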