Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Prediction-Feedback DETR for Temporal Action Detection
Authors: Jihwan Kim, Miso Lee, Cheol-Ho Cho, Jihyun Lee, Jae-Pil Heo
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments with various challenging benchmarks including THUMOS14, ActivityNet-v1.3, HACS, and FineAction, we demonstrate that the proposed methods remarkably reduce the degree of the attention collapse problem. |
| Researcher Affiliation | Academia | Jihwan Kim, Miso Lee, Cheol-Ho Cho, Jihyun Lee, Jae-Pil Heo Sungkyunkwan University EMAIL |
| Pseudocode | No | The paper describes methods using mathematical formulations and descriptive text, but it does not contain a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper does not explicitly state that source code is provided, nor does it include a link to a code repository. |
| Open Datasets | Yes | In this paper, we utilize four challenging benchmarks of temporal action detection: THUMOS14 (Jiang et al. 2014), ActivityNet-v1.3 (Fabian Caba Heilbron and Niebles 2015), HACS (Zhao et al. 2019) and FineAction (Liu et al. 2022c). |
| Dataset Splits | Yes | THUMOS14 has 200 and 213 videos for the training and validation sets, respectively. The dataset has 20 action classes related to sports. ActivityNet-v1.3 contains 19,994 videos with 200 action classes; 10,024, 4,926, and 5,044 videos are for training, validation, and testing, respectively. |
| Hardware Specification | No | The paper mentions using features from models like I3D, SlowFast, and VideoMAEv2-g, but it does not specify the hardware (e.g., GPU, CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions being 'based on a temporal version of DAB-DETR' but does not specify any software names with version numbers for reproducibility (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | The numbers of encoder and decoder layers are 2 and 4, respectively. The number of queries is 40. We set the weights λ^e_SA, λ^d_SA, and λ^d_CA of the prediction-feedback losses for the encoder and decoder to 2. |
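The experiment-setup row above can be summarized as a small configuration sketch. This is not the authors' code; the dictionary keys and the `total_feedback_loss` helper are hypothetical names, used only to illustrate how the reported hyperparameters (2 encoder layers, 4 decoder layers, 40 queries, all three feedback-loss weights set to 2) would combine.

```python
# Hypothetical configuration (illustrative only; not from the paper's code,
# which is not released). Values are the hyperparameters the paper reports.
CONFIG = {
    "encoder_layers": 2,
    "decoder_layers": 4,
    "num_queries": 40,
    # Prediction-feedback loss weights; key names are assumptions.
    "lambda_sa_enc": 2.0,  # encoder self-attention feedback weight (lambda^e_SA)
    "lambda_sa_dec": 2.0,  # decoder self-attention feedback weight (lambda^d_SA)
    "lambda_ca_dec": 2.0,  # decoder cross-attention feedback weight (lambda^d_CA)
}


def total_feedback_loss(l_sa_enc, l_sa_dec, l_ca_dec, cfg=CONFIG):
    """Weighted sum of the three feedback loss terms (a sketch, not the
    authors' implementation)."""
    return (cfg["lambda_sa_enc"] * l_sa_enc
            + cfg["lambda_sa_dec"] * l_sa_dec
            + cfg["lambda_ca_dec"] * l_ca_dec)
```

With all weights equal to 2, three unit loss terms sum to 6, which is the shape of computation implied by the setup description.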