VLCounter: Text-Aware Visual Representation for Zero-Shot Object Counting

Authors: Seunggu Kang, WonJun Moon, Euiyeon Kim, Jae-Pil Heo

AAAI 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Through extensive experiments on FSC147, CARPK, and PUCPR+, we demonstrate the benefits of our end-to-end framework, VLCounter. Code is available at https://github.com/seunggu0305/VLCounter" |
| Researcher Affiliation | Academia | "Seunggu Kang, WonJun Moon, Euiyeon Kim, Jae-Pil Heo* Sungkyunkwan University {seunggu35, wjun0830, keywi9811, jaepilheo}@g.skku.edu" |
| Pseudocode | No | The paper includes architectural diagrams (e.g., Fig. 2, Fig. 3, Fig. 4) to illustrate its components and their interactions, but it does not provide any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present any procedure in a formal code-like format. |
| Open Source Code | Yes | "Code is available at https://github.com/seunggu0305/VLCounter" |
| Open Datasets | Yes | "To explore the counting capability of models, we use FSC147 (Ranjan et al. 2021), the first large-scale dataset for class-agnostic counting. It includes 6135 images from 147 categories mainly composed of foods, animals, kitchen utensils, and vehicles. We also utilize CARPK and PUCPR+ (Hsieh, Lin, and Hsu 2017) datasets." |
| Dataset Splits | Yes | "Through extensive experiments on FSC147, CARPK, and PUCPR+, we demonstrate the benefits of our end-to-end framework, VLCounter. ... Table 1: Quantitative comparison to state-of-the-art approaches on the FSC147 dataset. ... Val set MAE RMSE" |
| Hardware Specification | Yes | "We trained the model using AdamW (Loshchilov and Hutter 2017) optimizer with a learning rate of 1e-4 and weight decay of 1e-2 for 200 epochs with a batch size of 16 on a single NVIDIA RTX A6000." |
| Software Dependencies | No | The paper mentions using 'CLIP ViT-B/16' as encoders and the 'AdamW' optimizer, but it does not specify version numbers for general software dependencies or libraries such as Python, PyTorch/TensorFlow, or CUDA. |
| Experiment Setup | Yes | "For all experiments, we employed CLIP ViT-B/16 as our encoders followed by a decoder consisting of 4 repeated units. ... We trained the model using AdamW (Loshchilov and Hutter 2017) optimizer with a learning rate of 1e-4 and weight decay of 1e-2 for 200 epochs with a batch size of 16 on a single NVIDIA RTX A6000. For temperature scaling and loss-balancing hyperparameter λ and τ, we used 1e-6 and 1." |
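The quoted experiment setup can be summarized in a small configuration sketch. This is not the authors' code: the dictionary key names are assumptions (the released repository may name them differently), while the values are taken verbatim from the paper's text as quoted above.

```python
# Hypothetical training-config sketch for reproducing the quoted VLCounter setup.
# Key names are assumptions; values come from the paper's reported hyperparameters.
train_config = {
    "backbone": "CLIP ViT-B/16",   # encoder, followed by a decoder of 4 repeated units
    "decoder_units": 4,
    "optimizer": "AdamW",          # Loshchilov and Hutter 2017
    "learning_rate": 1e-4,
    "weight_decay": 1e-2,
    "epochs": 200,
    "batch_size": 16,
    "gpu": "NVIDIA RTX A6000",     # single GPU
    # The paper pairs "λ and τ" with "1e-6 and 1"; the exact mapping of the
    # temperature-scaling vs. loss-balancing roles is as stated in the text.
    "lambda": 1e-6,
    "tau": 1.0,
}

if __name__ == "__main__":
    for key, value in train_config.items():
        print(f"{key}: {value}")
```

Pinning these values in a single config makes the missing pieces noted above (library versions, CUDA version) the only remaining gap for a faithful rerun.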