Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Prompt-guided Disentangled Representation for Action Recognition

Authors: Tianci Wu, Guangming Zhu, Lu Jiang, Siyuan Wang, Ning Wang, Nuoye Xiong, Liang Zhang

NeurIPS 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experiments on two complex video action datasets, Charades and Sports HHI, demonstrate the effectiveness of our approach against state-of-the-art methods. |
| Researcher Affiliation | Academia | Tianci Wu, Guangming Zhu, Jiang Lu, Siyuan Wang, Ning Wang, Nuoye Xiong, Zhang Liang School of Computer Science and Technology, Xidian University EMAIL EMAIL |
| Pseudocode | No | The paper describes methods in paragraph text and equations, but no structured pseudocode or algorithm blocks are explicitly presented. |
| Open Source Code | Yes | Our code can be found in https://github.com/iamsnaping/Pro DA.git. |
| Open Datasets | Yes | Dataset. (1) The Charades dataset [40] consists of 9,848 videos... (2) The Sports HHI dataset [50] is a specially designed dataset focusing on human-human interaction in sports. |
| Dataset Splits | Yes | On Charades, all experiments are conducted under the same design. For each video, the network takes N frames as input. During training, frames are randomly sampled from the video, while during validation, the frames are extracted evenly over the whole video. [...] We conduct domain shift experiments on the Charades dataset to evaluate the generalization ability of our method. Specifically, we follow the experimental setup of [47], splitting Charades into five disjoint subsets with different action groups. The details are shown in Table 10. |
| Hardware Specification | No | Justification: Since some of the compared methods do not provide complete computational resource information (and some do not release their code), it is challenging for us to perform a fair comparison regarding computational cost. |
| Software Dependencies | No | The paper mentions various models and network types (e.g., CNN, ViT, GNN, Dual-AI, CLIP-B/16) and their underlying principles (e.g., self-attention) but does not provide specific software dependencies like programming language versions or library versions. |
| Experiment Setup | Yes | For each video, we uniformly sample 16 frames as input. We adopt a two-stage training strategy for all experiments. The first stage (pre-train) runs for 5 epochs with a learning rate of 2e-4, aiming to learn the representations for au and as. In the second stage (post-train), we load the checkpoint from the epoch with the best performance on au and as in the pre-training stage, and only train the classifier for at. This stage is also trained for 5 epochs with a learning rate of 1e-4. |
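
The frame sampling and two-stage schedule quoted in the Dataset Splits and Experiment Setup rows can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Stage`, `STAGES`, and `sample_frame_indices` are hypothetical, the `trainable` labels are placeholders, and the exact sampling rules (segment centres for even sampling, sampling with replacement for videos shorter than the requested frame count) are assumptions, since the quoted text does not specify them. Only the frame count (16), the epoch counts (5 and 5), and the learning rates (2e-4 and 1e-4) come from the quoted text.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One training stage; field names are hypothetical."""
    name: str
    epochs: int
    learning_rate: float
    trainable: str  # placeholder label, not taken from the paper's code


# Values taken from the quoted Experiment Setup text.
STAGES = (
    Stage("pre-train", epochs=5, learning_rate=2e-4, trainable="representation"),
    Stage("post-train", epochs=5, learning_rate=1e-4, trainable="classifier_only"),
)


def sample_frame_indices(num_video_frames, num_samples=16, training=False, rng=None):
    """Return sorted frame indices for one video.

    Training: indices are drawn at random (assumed: without replacement
    when the video is long enough, with replacement otherwise).
    Validation: indices are spread evenly by taking the centre of each of
    `num_samples` equal-length segments (an assumed scheme).
    """
    if num_video_frames <= 0:
        raise ValueError("num_video_frames must be positive")
    if training:
        rng = rng or random
        population = range(num_video_frames)
        if num_video_frames >= num_samples:
            return sorted(rng.sample(population, num_samples))
        return sorted(rng.choices(population, k=num_samples))
    segment = num_video_frames / num_samples
    return [
        min(num_video_frames - 1, int((i + 0.5) * segment))
        for i in range(num_samples)
    ]
```

In a sketch like this, a training loop would iterate over `STAGES`, rebuild the optimizer with each stage's `learning_rate`, and reload the best pre-train checkpoint before the second stage; those steps are omitted because the quoted text gives no detail on the optimizer or checkpoint format.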