KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension

Authors: Jie Yang, Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Ruimao Zhang

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our extensive experiments demonstrate KptLLM's superiority in various keypoint detection benchmarks and its unique semantic capabilities in interpreting keypoints. |
| Researcher Affiliation | Collaboration | 1 Sun Yat-sen University, 2 The Chinese University of Hong Kong, Shenzhen, 3 The University of Hong Kong, 4 The Chinese University of Hong Kong, 5 SenseTime Research and Tetras.AI |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code will be released at https://kptllm.github.io. |
| Open Datasets | Yes | In our experiments, we employ two datasets to evaluate the semantic keypoint comprehension in three scenarios: (1) The MP-100 dataset [21] for both Keypoint Semantic Understanding and Visual Prompt-based Keypoint Detection... (2) The AP-10K dataset [32] for Textual Prompt-based Keypoint Detection |
| Dataset Splits | Yes | The MP-100 dataset [21]... is divided into five distinct splits to ensure comprehensive coverage across different model training and validation scenarios. Each split contains all 100 categories, with 70 for training, 10 for validation, and 20 for testing. |
| Hardware Specification | Yes | We utilize 8 NVIDIA A100-80G GPUs for training, and use the DeepSpeed engine to enhance training efficiency. |
| Software Dependencies | No | The paper mentions using LLaVA-V1.5-7B as the base model, LoRA, AdamW, and the DeepSpeed engine, but it does not provide specific version numbers for these software components or libraries. |
| Experiment Setup | Yes | LoRA parameters are configured with a rank of 128 and an alpha of 256. Optimization is conducted using AdamW, with a learning rate of 2e-4 and weight decay of 0. ... Each GPU operates with a batch size of 16, and we employ a gradient accumulation step of 1. |
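The Dataset Splits row describes a category-level protocol: each of MP-100's five splits covers all 100 categories, partitioned into 70 training, 10 validation, and 20 test categories. The sketch below only illustrates that partitioning logic; the random assignment and integer category indices are placeholders, since the official MP-100 release ships fixed split files that are not reproduced here.

```python
import random

# Illustrative category-level split in the spirit of the MP-100 protocol:
# each of the 5 splits covers all 100 categories, divided 70/10/20 into
# train/val/test by category, so test categories are unseen during training.
# The real MP-100 release provides fixed split files; this random assignment
# is only a stand-in for illustration.
NUM_CATEGORIES = 100

def make_split(seed: int) -> dict:
    categories = list(range(NUM_CATEGORIES))
    rng = random.Random(seed)
    rng.shuffle(categories)
    return {
        "train": categories[:70],
        "val": categories[70:80],
        "test": categories[80:],
    }

splits = [make_split(seed) for seed in range(5)]
assert all(
    len(s["train"]) == 70 and len(s["val"]) == 10 and len(s["test"]) == 20
    for s in splits
)
```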
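The Hardware Specification, Software Dependencies, and Experiment Setup rows together fix the reported fine-tuning recipe: LoRA with rank 128 and alpha 256 on a LLaVA-V1.5-7B base, AdamW at a 2e-4 learning rate with zero weight decay, a per-GPU batch size of 16, a gradient accumulation step of 1, and DeepSpeed on 8 A100-80G GPUs. The paper does not name the libraries that implement this, so the sketch below is only one plausible expression of those hyperparameters using Hugging Face `peft` and `transformers`; the `target_modules` list, LoRA dropout, and DeepSpeed configuration are assumptions, not the authors' code.

```python
# Hypothetical reconstruction of the reported hyperparameters with Hugging Face
# peft/transformers. Library choice, target modules, dropout, and the DeepSpeed
# config are assumptions; only the numeric values come from the paper.
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA settings quoted in the Experiment Setup row: rank 128, alpha 256.
lora_config = LoraConfig(
    r=128,
    lora_alpha=256,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed; not stated in the paper
    lora_dropout=0.05,                                        # assumed; not stated in the paper
    task_type="CAUSAL_LM",
)

# AdamW at 2e-4 with weight decay 0, batch size 16 per GPU, gradient accumulation 1.
# DeepSpeed drives the 8-GPU run; the ZeRO stage below is a guess, as the paper only
# says it uses the DeepSpeed engine.
training_args = TrainingArguments(
    output_dir="kptllm-lora",            # hypothetical output path
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    learning_rate=2e-4,
    weight_decay=0.0,
    optim="adamw_torch",
    deepspeed={                          # assumed minimal DeepSpeed config (requires deepspeed installed)
        "train_micro_batch_size_per_gpu": 16,
        "gradient_accumulation_steps": 1,
        "zero_optimization": {"stage": 2},
    },
)

# These objects would then be handed to a Trainer wrapping the LoRA-adapted
# LLaVA-V1.5-7B model; loading that model itself requires the authors'
# (to-be-released) code, so it is omitted here.
```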