Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Action Dubber: Timing Audible Actions via Inflectional Flow
Authors: Wenlong Wan, Weiying Zheng, Tianyi Xiang, Guiqing Li, Shengfeng He
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments confirm the effectiveness of our approach on Audible623 and show strong generalizability to other domains, such as repetitive counting and sound source localization. |
| Researcher Affiliation | Academia | (1) School of Computer Science and Engineering, South China University of Technology; (2) School of Computing and Information Systems, Singapore Management University; (3) School of Computing and Data Science, University of Hong Kong; (4) Department of Computer Science, City University of Hong Kong. |
| Pseudocode | No | The paper describes the method using text and figures such as Figure 4: Overview of TA2Net, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and dataset are available at https://github.com/WenlongWan/Audible623. |
| Open Datasets | Yes | To support this task, we introduce a new benchmark dataset, Audible623, derived from Kinetics and UCF101 by removing non-essential vocalization subsets. Extensive experiments confirm the effectiveness of our approach on Audible623 and show strong generalizability to other domains, such as repetitive counting and sound source localization. Code and dataset are available at https://github.com/WenlongWan/Audible623. |
| Dataset Splits | Yes | After collecting and annotating the action videos, we obtain a total of 623 videos and allocate 497 videos for training and 126 videos for evaluation. |
| Hardware Specification | Yes | All experiments were conducted on a single NVIDIA A800 GPU with 80GB of memory, under Ubuntu 20.04 as the operating system. |
| Software Dependencies | Yes | We implement our method on PyTorch with CUDA, version 1.13. |
| Experiment Setup | Yes | During training, we randomly sample 64 frames per video, resizing them to 112x112 pixels. The model is trained using the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 5e-6, a batch size of 4, and 20k iterations. |
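The reported experiment setup (64 frames per video at 112x112 pixels, Adam with a learning rate of 5e-6, batch size 4) can be sketched as a minimal PyTorch training step. This is an illustrative reconstruction, not the authors' code: the `nn.Conv3d` stand-in and the mean-reduction loss are placeholders, since the TA2Net architecture and loss functions are not specified in this summary.

```python
import torch
import torch.nn as nn

# Placeholder model; the actual TA2Net architecture is not reproduced here.
model = nn.Conv3d(in_channels=3, out_channels=8, kernel_size=3, padding=1)

# Adam optimizer with the learning rate reported in the paper (5e-6).
optimizer = torch.optim.Adam(model.parameters(), lr=5e-6)

# One iteration: a batch of 4 videos, each with 64 randomly sampled
# frames resized to 112x112 pixels (shape: N, C, T, H, W).
clip = torch.randn(4, 3, 64, 112, 112)

out = model(clip)
loss = out.mean()  # placeholder loss; the paper's objectives are task-specific
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In the paper this loop runs for 20k iterations; on the reported hardware (a single NVIDIA A800 with 80GB of memory) the full 112x112x64 clips fit comfortably at batch size 4.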