VLCounter: Text-Aware Visual Representation for Zero-Shot Object Counting
Authors: Seunggu Kang, WonJun Moon, Euiyeon Kim, Jae-Pil Heo
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on FSC147, CARPK, and PUCPR+, we demonstrate the benefits of our end-to-end framework, VLCounter. Code is available at https://github.com/seunggu0305/VLCounter |
| Researcher Affiliation | Academia | Seunggu Kang, WonJun Moon, Euiyeon Kim, Jae-Pil Heo* Sungkyunkwan University {seunggu35, wjun0830, keywi9811, jaepilheo}@g.skku.edu |
| Pseudocode | No | The paper includes architectural diagrams (e.g., Fig. 2, Fig. 3, Fig. 4) to illustrate its components and their interactions, but it does not provide any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present any procedure in a formal code-like format. |
| Open Source Code | Yes | Code is available at https://github.com/seunggu0305/VLCounter |
| Open Datasets | Yes | To explore the counting capability of models, we use FSC147 (Ranjan et al. 2021), the first large-scale dataset for class-agnostic counting. It includes 6135 images from 147 categories mainly composed of foods, animals, kitchen utensils, and vehicles. We also utilize CARPK and PUCPR+ (Hsieh, Lin, and Hsu 2017) datasets. |
| Dataset Splits | Yes | Through extensive experiments on FSC147, CARPK, and PUCPR+, we demonstrate the benefits of our end-to-end framework, VLCounter. ... Table 1: Quantitative comparison to state-of-the-art approaches on the FSC147 dataset (Val set: MAE, RMSE). |
| Hardware Specification | Yes | We trained the model using AdamW (Loshchilov and Hutter 2017) optimizer with a learning rate of 1e-4 and weight decay of 1e-2 for 200 epochs with a batch size of 16 on a single NVIDIA RTX A6000. |
| Software Dependencies | No | The paper mentions using 'CLIP ViT-B/16' as encoders and the 'AdamW' optimizer, but it does not specify version numbers for general software dependencies or libraries such as Python, PyTorch/TensorFlow, or CUDA. |
| Experiment Setup | Yes | For all experiments, we employed CLIP ViT-B/16 as our encoders followed by a decoder consisting of 4 repeated units. ... We trained the model using AdamW (Loshchilov and Hutter 2017) optimizer with a learning rate of 1e-4 and weight decay of 1e-2 for 200 epochs with a batch size of 16 on a single NVIDIA RTX A6000. For temperature scaling and loss-balancing hyperparameter λ and τ, we used 1e-6 and 1. |
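The reported experiment setup can be collected into a single configuration sketch. This is a hypothetical summary assembled from the quoted hyperparameters above, not the authors' actual configuration file (all key names are illustrative):

```python
# Hypothetical training configuration mirroring the hyperparameters quoted
# from the paper's experiment setup. Key names are illustrative only.
VLCOUNTER_CONFIG = {
    "encoder": "CLIP ViT-B/16",   # vision-language encoders
    "decoder_units": 4,           # decoder of 4 repeated units
    "optimizer": "AdamW",         # Loshchilov and Hutter 2017
    "learning_rate": 1e-4,
    "weight_decay": 1e-2,
    "epochs": 200,
    "batch_size": 16,
    "gpu": "NVIDIA RTX A6000",    # single GPU
    "lambda_balance": 1e-6,       # loss-balancing hyperparameter λ
    "tau": 1.0,                   # temperature scaling τ
}

# Quick sanity check of the numeric values.
assert VLCOUNTER_CONFIG["learning_rate"] < VLCOUNTER_CONFIG["weight_decay"]
```

Collecting the values this way makes it easy to spot what the paper does *not* pin down, such as framework and CUDA versions noted in the Software Dependencies row.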