KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension
Authors: Jie Yang, Wang ZENG, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Ruimao Zhang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments demonstrate KptLLM's superiority in various keypoint detection benchmarks and its unique semantic capabilities in interpreting keypoints. |
| Researcher Affiliation | Collaboration | 1 Sun Yat-sen University, 2 The Chinese University of Hong Kong, Shenzhen, 3 The University of Hong Kong, 4 The Chinese University of Hong Kong, 5 SenseTime Research and Tetras.AI |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code will be released at https://kptllm.github.io. |
| Open Datasets | Yes | In our experiments, we employ two datasets to evaluate the semantic keypoint comprehension in three scenarios: (1) The MP-100 dataset [21] for both Keypoint Semantic Understanding and Visual Prompt-based Keypoint Detection... (2) The AP-10K dataset [32] for Textual Prompt-based Keypoint Detection |
| Dataset Splits | Yes | The MP-100 dataset [21]... is divided into five distinct splits to ensure comprehensive coverage across different model training and validation scenarios. Each split contains all 100 categories, with 70 for training, 10 for validation, and 20 for testing. |
| Hardware Specification | Yes | We utilize 8 NVIDIA A100-80G GPUs for training, and use the DeepSpeed engine to enhance training efficiency. |
| Software Dependencies | No | The paper mentions using LLaVA-V1.5-7B as the base model, LoRA, AdamW, and the DeepSpeed engine. However, it does not provide specific version numbers for these software components or libraries. |
| Experiment Setup | Yes | LoRA parameters are configured with a rank of 128 and an alpha of 256. Optimization is conducted using AdamW, with a learning rate of 2e-4 and weight decay of 0. ... Each GPU operates with a batch size of 16, and we employ a gradient accumulation step of 1. (See the configuration sketch below the table.) |
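
The hyperparameters quoted in the Hardware Specification and Experiment Setup rows can be gathered into a single training configuration. The sketch below is a minimal illustration assuming a HuggingFace PEFT/Trainer-style setup around LLaVA-V1.5-7B; the numeric values come from the paper, while the class choices, output path, and commented options are assumptions rather than the authors' released code.

```python
# Minimal sketch of the reported fine-tuning configuration, assuming a
# HuggingFace PEFT + Trainer-style setup. Hyperparameter values are taken
# from the paper; class choices, paths, and commented options are
# illustrative assumptions, not the authors' released code.
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapter settings reported in the paper: rank 128, alpha 256.
lora_config = LoraConfig(
    r=128,
    lora_alpha=256,
    task_type="CAUSAL_LM",  # LLaVA-V1.5-7B is a causal language model
)

# Optimizer and batching settings reported in the paper: AdamW,
# learning rate 2e-4, weight decay 0, per-GPU batch size 16,
# gradient accumulation 1, trained on 8x NVIDIA A100-80G GPUs.
training_args = TrainingArguments(
    output_dir="./kptllm_lora",       # hypothetical output path
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    learning_rate=2e-4,
    weight_decay=0.0,
    optim="adamw_torch",
    # deepspeed="ds_config.json",     # the paper uses the DeepSpeed engine;
    #                                 # its config file is not released
)
```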