KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension
Authors: Jie Yang, Wang ZENG, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Ruimao Zhang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments demonstrate KptLLM's superiority in various keypoint detection benchmarks and its unique semantic capabilities in interpreting keypoints. |
| Researcher Affiliation | Collaboration | 1 Sun Yat-sen University, 2 The Chinese University of Hong Kong, Shenzhen, 3 The University of Hong Kong, 4 The Chinese University of Hong Kong, 5 SenseTime Research and Tetras.AI |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code will be released at https://kptllm.github.io. |
| Open Datasets | Yes | In our experiments, we employ two datasets to evaluate the semantic keypoint comprehension in three scenarios: (1) The MP-100 dataset [21] for both Keypoint Semantic Understanding and Visual Prompt-based Keypoint Detection... (2) The AP-10K dataset [32] for Textual Prompt-based Keypoint Detection |
| Dataset Splits | Yes | The MP-100 dataset [21]... is divided into five distinct splits to ensure comprehensive coverage across different model training and validation scenarios. Each split contains all 100 categories, with 70 for training, 10 for validation, and 20 for testing. |
| Hardware Specification | Yes | We utilize 8 NVIDIA A100-80G GPUs for training, and use the DeepSpeed engine to enhance training efficiency. |
| Software Dependencies | No | The paper mentions using LLaVA-V1.5-7B as the base model, LoRA, AdamW, and the DeepSpeed engine. However, it does not provide specific version numbers for these software components or libraries. |
| Experiment Setup | Yes | LoRA parameters are configured with a rank of 128 and an alpha of 256. Optimization is conducted using AdamW, with a learning rate of 2e-4 and weight decay of 0. ... Each GPU operates with a batch size of 16, and we employ a gradient accumulation step of 1. (See the configuration sketch below the table.) |
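
The hyperparameters quoted in the Hardware Specification and Experiment Setup rows can be gathered into a single training configuration. The sketch below is a minimal illustration assuming a HuggingFace PEFT/Trainer-style setup around LLaVA-V1.5-7B; the numeric values come from the paper, while the class choices, output path, and commented options are assumptions rather than the authors' released code.

```python
# Minimal sketch of the reported fine-tuning configuration, assuming a
# HuggingFace PEFT + Trainer-style setup. Hyperparameter values are taken
# from the paper; class choices, paths, and commented options are
# illustrative assumptions, not the authors' released code.
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapter settings reported in the paper: rank 128, alpha 256.
lora_config = LoraConfig(
    r=128,
    lora_alpha=256,
    task_type="CAUSAL_LM",  # LLaVA-V1.5-7B is a causal language model
)

# Optimizer and batching settings reported in the paper: AdamW,
# learning rate 2e-4, weight decay 0, per-GPU batch size 16,
# gradient accumulation 1, trained on 8x NVIDIA A100-80G GPUs.
training_args = TrainingArguments(
    output_dir="./kptllm_lora",       # hypothetical output path
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    learning_rate=2e-4,
    weight_decay=0.0,
    optim="adamw_torch",
    # deepspeed="ds_config.json",     # the paper uses the DeepSpeed engine;
    #                                 # its config file is not released
)
```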