VLCounter: Text-Aware Visual Representation for Zero-Shot Object Counting

Authors: Seunggu Kang, WonJun Moon, Euiyeon Kim, Jae-Pil Heo

AAAI 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Through extensive experiments on FSC147, CARPK, and PUCPR+, we demonstrate the benefits of our end-to-end framework, VLCounter. Code is available at https://github.com/seunggu0305/VLCounter" |
| Researcher Affiliation | Academia | "Seunggu Kang, WonJun Moon, Euiyeon Kim, Jae-Pil Heo* Sungkyunkwan University {seunggu35, wjun0830, keywi9811, jaepilheo}@g.skku.edu" |
| Pseudocode | No | The paper includes architectural diagrams (e.g., Fig. 2, Fig. 3, Fig. 4) to illustrate its components and their interactions, but it does not provide any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present any procedure in a formal code-like format. |
| Open Source Code | Yes | "Code is available at https://github.com/seunggu0305/VLCounter" |
| Open Datasets | Yes | "To explore the counting capability of models, we use FSC147 (Ranjan et al. 2021), the first large-scale dataset for class-agnostic counting. It includes 6135 images from 147 categories mainly composed of foods, animals, kitchen utensils, and vehicles. We also utilize CARPK and PUCPR+ (Hsieh, Lin, and Hsu 2017) datasets." |
| Dataset Splits | Yes | "Through extensive experiments on FSC147, CARPK, and PUCPR+, we demonstrate the benefits of our end-to-end framework, VLCounter. ... Table 1: Quantitative comparison to state-of-the-art approaches on the FSC147 dataset. ... Val set MAE RMSE" |
| Hardware Specification | Yes | "We trained the model using AdamW (Loshchilov and Hutter 2017) optimizer with a learning rate of 1e-4 and weight decay of 1e-2 for 200 epochs with a batch size of 16 on a single NVIDIA RTX A6000." |
| Software Dependencies | No | The paper mentions using 'CLIP ViT-B/16' as encoders and the 'AdamW' optimizer, but it does not specify version numbers for general software dependencies or libraries such as Python, PyTorch/TensorFlow, or CUDA. |
| Experiment Setup | Yes | "For all experiments, we employed CLIP ViT-B/16 as our encoders followed by a decoder consisting of 4 repeated units. ... We trained the model using AdamW (Loshchilov and Hutter 2017) optimizer with a learning rate of 1e-4 and weight decay of 1e-2 for 200 epochs with a batch size of 16 on a single NVIDIA RTX A6000. For temperature scaling and loss-balancing hyperparameter λ and τ, we used 1e-6 and 1." |
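The quoted experiment setup can be summarized in a small configuration sketch. This is not the authors' code: the dictionary key names are assumptions (the released repository may name them differently), while the values are taken verbatim from the paper's text as quoted above.

```python
# Hypothetical training-config sketch for reproducing the quoted VLCounter setup.
# Key names are assumptions; values come from the paper's reported hyperparameters.
train_config = {
    "backbone": "CLIP ViT-B/16",   # encoder, followed by a decoder of 4 repeated units
    "decoder_units": 4,
    "optimizer": "AdamW",          # Loshchilov and Hutter 2017
    "learning_rate": 1e-4,
    "weight_decay": 1e-2,
    "epochs": 200,
    "batch_size": 16,
    "gpu": "NVIDIA RTX A6000",     # single GPU
    # The paper pairs "λ and τ" with "1e-6 and 1"; the exact mapping of the
    # temperature-scaling vs. loss-balancing roles is as stated in the text.
    "lambda": 1e-6,
    "tau": 1.0,
}

if __name__ == "__main__":
    for key, value in train_config.items():
        print(f"{key}: {value}")
```

Pinning these values in a single config makes the missing pieces noted above (library versions, CUDA version) the only remaining gap for a faithful rerun.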