Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Learning Skill-Attributes for Transferable Assessment in Video

Authors: Kumar Ashutosh, Kristen Grauman

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We validate the new model on multiple datasets for both cross-sport (transfer) and intra-sport (in-domain) settings, where it achieves gains up to 60% relative to the state of the art. By abstracting out the shared behaviors indicative of human skill, the proposed video representation generalizes substantially better than an array of existing techniques, enriching today's multimodal large language models.
Researcher Affiliation	Academia	Kumar Ashutosh, University of Texas at Austin; Kristen Grauman, University of Texas at Austin
Pseudocode	No	The paper describes the methodology in prose, with no explicit sections or figures labeled 'Pseudocode' or 'Algorithm'.
Open Source Code	No	Answer: [No] Justification: [No] The code and data will be released upon acceptance.
Open Datasets	Yes	We validate our ideas on three diverse datasets: Ego-Exo4D [37], which contains soccer, basketball, and rock climbing; QEVD [68], which contains fitness exercises; and in-the-wild YouTube videos of people tutoring physical activities.
Dataset Splits	Yes	Train/test splits. Our approach aims to improve both traditional in-domain and zero-shot skill assessment. The datasets organize video clips by their superclass sport and their subclass skill. A skill is a drill or specific exercise, and each sport can have multiple skills. Ego-Exo4D has 3 superclasses and 5 subclasses: soccer has skills dribbling and penalty kick; basketball has skills Mikan layup, reverse layup, jump shot; rock climbing is not sorted into skills. QEVD has 1 superclass (fitness) and 23 subclass skills (jumping jacks, squats, etc.). To explore models' generalization ability, we perform controlled experiments with the following train/test settings, in decreasing volume of available training data (see Fig. 3 (right)): Fully supervised (FS): Train on all sports and skills, and test on held-out set of videos. This represents in-domain testing. All sport zero-shot (ZS-1): Train on all sports and skills, except the target skill. This means other skills from the same sport are seen during training. Familiar sport zero-shot (ZS-2): Train on all skills from the same sport, except the target skill. This means only the same sport is seen during training. Novel sport zero-shot (ZS-3): Train only on n-1 sports, test on skills from the n-th unseen sport. This means other skills from the same sport are not seen during training.
Hardware Specification	Yes	All experiments are performed on one GH200 NVIDIA node.
Software Dependencies	No	The paper mentions specific models like EgoVLPv2, CLIP, Llama-3.1-8B-Instruct [5], and LoRA [38] but does not provide specific version numbers for general software dependencies like Python, PyTorch, or CUDA.
Experiment Setup	Yes	Training details. We train both Fa and Ft in LoRA setting [38], with rank 128, alpha 256, and dropout 0.05, for efficiency. The best performance is obtained with a learning rate of 2e-3 for fm and 2e-4 for L. Recall that fv is kept frozen. The model is trained for 2 epochs or till convergence. Total training time depends on the dataset setting, varying between 1-3 hours.
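
The four train/test settings quoted in the Dataset Splits row (FS, ZS-1, ZS-2, ZS-3) can be read as filters over (sport, skill) pairs. The following is a minimal sketch of that reading, not the authors' code: the names `CATALOG` and `training_skills` are hypothetical, rock climbing is treated as a single pseudo-skill because the row says it is not sorted into skills, and QEVD's 23 fitness skills are left out for brevity. The video-level hold-out used in the FS setting is not modeled here; the function only selects which skills may appear in training.

```python
from typing import Dict, List, Tuple

# Sport -> skills, taken from the Dataset Splits row for Ego-Exo4D.
# Assumption: rock climbing has no sub-skills, so it is its own pseudo-skill.
CATALOG: Dict[str, List[str]] = {
    "soccer": ["dribbling", "penalty kick"],
    "basketball": ["Mikan layup", "reverse layup", "jump shot"],
    "rock climbing": ["rock climbing"],
}


def training_skills(
    setting: str,
    target_sport: str,
    target_skill: str,
    catalog: Dict[str, List[str]] = CATALOG,
) -> List[Tuple[str, str]]:
    """Return the (sport, skill) pairs allowed in training for one setting."""
    all_pairs = [
        (sport, skill) for sport, skills in catalog.items() for skill in skills
    ]
    target = (target_sport, target_skill)
    if setting == "FS":
        # Fully supervised: every sport and skill is seen during training.
        return all_pairs
    if setting == "ZS-1":
        # All sport zero-shot: everything except the target skill.
        return [pair for pair in all_pairs if pair != target]
    if setting == "ZS-2":
        # Familiar sport zero-shot: same sport only, minus the target skill.
        return [
            pair for pair in all_pairs
            if pair[0] == target_sport and pair != target
        ]
    if setting == "ZS-3":
        # Novel sport zero-shot: the target sport is entirely unseen.
        return [pair for pair in all_pairs if pair[0] != target_sport]
    raise ValueError(f"unknown setting: {setting!r}")


if __name__ == "__main__":
    for name in ("FS", "ZS-1", "ZS-2", "ZS-3"):
        pairs = training_skills(name, "basketball", "jump shot")
        print(name, len(pairs), pairs)
```

With this small catalog and basketball's jump shot as the target, the number of training skills is not strictly decreasing across the settings, because the two single-sport filters depend on how many skills each sport has. The "decreasing volume of available training data" ordering in the row refers to the full datasets.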