Zero-shot Generalizable Incremental Learning for Vision-Language Object Detection

Authors: Jieren Deng, Haojian Zhang, Kun Ding, Jianhua Hu, Xingxuan Zhang, Yunkuan Wang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments on COCO and ODinW-13 datasets demonstrate that ZiRa effectively safeguards the zero-shot generalization ability of VLODMs while continuously adapting to new tasks.
Researcher Affiliation | Academia | Jieren Deng1,2, Haojian Zhang1, Kun Ding1, Jianhua Hu1, Xingxuan Zhang3, and Yunkuan Wang1. 1Institute of Automation, Chinese Academy of Sciences (CAS), {dengjieren2019, jianhua.hu, zhanghaojian2014, yunkuan.wang}@ia.ac.cn; 2School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS); 3Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, zhangxingxuan@sjtu.edu.cn
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It describes the method and architecture using text and diagrams, but provides no formal pseudocode.
Open Source Code | Yes | Our code is available at https://github.com/JarintotionDin/ZiRaGroundingDINO.
Open Datasets | Yes | Datasets. We conduct our experiments on the COCO [21] dataset and the Object Detection in the Wild (ODinW) [18] benchmark. ODinW is a more challenging benchmark designed to test model performance under real-world scenarios. It comprises numerous sub-datasets from various domains for evaluation, such as Thermal (to detect objects in heat-map images) and Aquarium (to detect different marine animals). Following GLIP [19], we use the ODinW-13 datasets, labeled as Ae (Aerial Maritime Drone), Aq (Aquarium), Co (Cottontail Rabbits), Eg (Egohands), Mu (Mushrooms), Pa (Packages), Pv (Pascal VOC), Pi (Pistols), Po (Pothole), Ra (Raccoon), Sh (Shellfish), Th (Thermal Dogs and People), Ve (Vehicles). The 13 sub-datasets of ODinW-13 are trained sequentially, one by one, and are tested after all sub-datasets have been trained. (A sketch of this sequential protocol appears after the table.)
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for training, validation, and testing within each dataset used. It states that the ODinW-13 sub-datasets are trained sequentially and tested after all sub-datasets have been trained, but not the internal splits.
Hardware Specification | Yes | Our proposed method is implemented with PyTorch and trained on two Nvidia RTX 3090 GPUs.
Software Dependencies | No | The paper mentions 'implemented with PyTorch' and 'AdamW is used as the optimizer,' but it does not specify version numbers for PyTorch or any other software dependencies needed to replicate the experiment.
Experiment Setup | Yes | Each downstream task is trained for a total of two epochs with a batch size of 2. For Grounding DINO, we employ an initial learning rate of 10^-3, which decays to 0.1 times the original value after the first epoch to ensure effective convergence. For OV-DINO, we employ an initial learning rate of 10^-4, which also decays to 0.1 times the original value after the first epoch. AdamW is used as the optimizer, and the weight decay is 10^-4. (An optimizer and schedule sketch based on these values follows below.)
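
The sequential protocol quoted in the Open Datasets row (train the 13 ODinW sub-datasets one by one, then evaluate only after the whole sequence) can be summarized as a short driver loop. The sketch below is illustrative only: the load_subset, finetune, and evaluate callables are hypothetical placeholders (a possible finetune routine is sketched in the next block), not functions from the released ZiRaGroundingDINO repository.

```python
# Minimal sketch of the sequential ODinW-13 protocol described in the Open Datasets row:
# the 13 sub-datasets are trained one by one, and testing happens only after the whole
# sequence. The callables passed in (load_subset, finetune, evaluate) are hypothetical
# placeholders, not part of the released ZiRa code.

ODINW13 = ["Ae", "Aq", "Co", "Eg", "Mu", "Pa", "Pv", "Pi", "Po", "Ra", "Sh", "Th", "Ve"]

def run_sequential_protocol(model, load_subset, finetune, evaluate):
    # Incremental phase: each task is seen once; earlier tasks are never revisited.
    for name in ODINW13:
        finetune(model, load_subset(name, split="train"))

    # Evaluation phase: test on every sub-dataset only after all 13 have been learned.
    return {name: evaluate(model, load_subset(name, split="test")) for name in ODINW13}
```

Reporting per-task scores only at the end of the sequence is what exposes forgetting: a sub-dataset learned early must still be detectable after twelve further adaptation steps.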
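
The Experiment Setup row pins down the per-task optimization: two epochs, batch size 2, AdamW with weight decay 10^-4, and a learning rate that drops to 0.1 times its initial value after the first epoch (base 10^-3 for Grounding DINO, 10^-4 for OV-DINO). Below is a minimal PyTorch sketch of such a per-task fine-tuning routine; the assumption that calling the model on a batch returns a scalar detection loss, and the base_lr default of 1e-3 (the Grounding DINO setting), are illustrative choices rather than code from the paper.

```python
import torch
from torch.utils.data import DataLoader

def finetune(model, train_set, base_lr=1e-3, weight_decay=1e-4):
    """Hypothetical per-task fine-tuning loop matching the quoted hyperparameters."""
    loader = DataLoader(train_set, batch_size=2, shuffle=True)
    # AdamW optimizer with weight decay 1e-4, as stated in the paper.
    optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr, weight_decay=weight_decay)
    # Decay the learning rate to 0.1x its initial value after the first epoch.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[1], gamma=0.1)
    for epoch in range(2):                 # two epochs per downstream task
        for batch in loader:
            loss = model(batch)            # assumed to return a scalar detection loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
```

For OV-DINO the same routine would be called with base_lr=1e-4; everything else in the quoted setup is unchanged.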