Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Disentangled Concepts Speak Louder Than Words: Explainable Video Action Recognition

Authors: Jongseo Lee, Wooil Lee, Gyeong-Moon Park, Seong Tae Kim, Jinwoo Choi

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on four datasets (KTH, Penn Action, HAA500, and UCF101) demonstrate that DANCE significantly improves explanation clarity with competitive performance. We validate the superior interpretability of DANCE through a user study. Experimental results also show that DANCE is beneficial for model debugging, editing, and failure analysis. Section 4 (Experimental Results): In this section, we carefully design and conduct rigorous experiments to answer the following research questions: (1) Does DANCE generate explanations that are easy for humans to interpret in the context of action prediction? (Section 4.1) (2) Can DANCE detect changes in the temporal domain, such as reversed input sequences? (Section 4.1) (3) What is the performance trade-off, if any, when interpretability is introduced into a previously non-interpretable model? (Section 4.2) (4) Can DANCE be effectively used for model debugging and editing? (Section 4.3)
Researcher Affiliation | Academia | Jongseo Lee¹, Wooil Lee¹, Gyeong-Moon Park², Seong Tae Kim¹, Jinwoo Choi¹; ¹Kyung Hee University, Republic of Korea; ²Korea University, Republic of Korea
Pseudocode | No | The paper describes methods in text and figures (e.g., Figure 2, Figure 3) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Our project page is available at https://jong980812.github.io/DANCE/. Justification: We provide code in the supplementary material.
Open Datasets | Yes | Experiments on four datasets (KTH [45], Penn Action [61], HAA500 [9], and UCF101 [49]) demonstrate that DANCE significantly improves explanation clarity with competitive performance. We conduct experiments on four video action recognition datasets: KTH [45], Penn Action [61], HAA500 [9], and UCF-101 [49].
Dataset Splits | Yes | For more details on the dataset and implementation, please refer to the supplementary materials. Justification: The paper specifies all the necessary training and test details, including data splits, hyperparameters, and the type of optimizer used. These details are presented in the supplementary materials.
Hardware Specification | Yes | Justification: We report compute resource details including GPU type, memory, and per-experiment training time in the supplementary materials. This includes estimates of training time for each dataset, hardware specifications, ensuring reproducibility.
Software Dependencies | No | For each key clip $V_i^s$, we apply a 2D pose estimation model [59] to every frame to obtain a pose sequence $P_i^s \in \mathbb{R}^{L \times J \times 2}$, where $J$ is the number of joints. For each action class, we query GPT-4o [19] with two prompts: [...] To avoid manual concept annotation, we employ a vision-language dual encoder [57] to generate concept pseudo labels for each training video $V_i$. Specific version numbers for these software components or other libraries (e.g., Python, PyTorch) are not provided in the main text.
Experiment Setup | No | For more details on the dataset and implementation, please refer to the supplementary materials. Justification: The paper specifies all the necessary training and test details, including data splits, hyperparameters, and the type of optimizer used. These details are presented in the supplementary materials. The main paper mentions 'lambda and alpha are balancing hyperparameters' but does not provide their specific values in the main text.