Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

PC-Net: Weakly Supervised Compositional Moment Retrieval via Proposal-Centric Network

Authors: Mingyao Zhou, Hao Sun, Wei Xie, Ming Dong, Chengji Wang, Mang Ye

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 Experiments
Researcher Affiliation | Academia | (1) School of Computer Science, Central China Normal University; (2) Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University; (3) National Language Resources Monitoring and Research Center for Network Media, Central China Normal University; (4) School of Computer Science, Wuhan University. Corresponding author: EMAIL
Pseudocode | No | The paper describes its methodology using text and mathematical equations in sections like '3.2 Dual-granularity Proposal Generator' and '3.3 Proposal Feature Aggregator with Semantic Alignment', but does not include explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/mingyao1120/PC-Net.
Open Datasets | Yes | To validate the proposed PC-Net for the WSCMR task, Charades-CG and ActivityNet-CG are applied, where accurate timestamp annotations are only used for evaluating the model rather than training, and annotations with novel queries are sourced from the literature [9]. Charades-CG (8,312 videos) includes 8,281 training queries and three test subsets: Test-Trivial (3,096 queries with training-style phrases), Novel-Composition (3,442 queries covering verb-noun, noun-noun, verb-adverb, adjective-noun, and preposition-noun combinations [9]), and Novel-Word (703 queries with unseen vocabulary). ActivityNet-CG (20,647 videos) follows a similar split: 36,724 training queries, 15,712 Test-Trivial, 12,028 Novel-Composition, and 3,944 Novel-Word.
Dataset Splits | Yes | Charades-CG (8,312 videos) includes 8,281 training queries and three test subsets: Test-Trivial (3,096 queries with training-style phrases), Novel-Composition (3,442 queries covering verb-noun, noun-noun, verb-adverb, adjective-noun, and preposition-noun combinations [9]), and Novel-Word (703 queries with unseen vocabulary). ActivityNet-CG (20,647 videos) follows a similar split: 36,724 training queries, 15,712 Test-Trivial, 12,028 Novel-Composition, and 3,944 Novel-Word.
Hardware Specification | Yes | All experiments are conducted on a single NVIDIA GeForce RTX 4090 GPU.
Software Dependencies | No | We use GloVe [43] to extract textual features with a hidden dimension of 300. For video features, I3D [44] is used for Charades-CG and C3D [45] for ActivityNet-CG, yielding feature dimensions of 1024 and 500, respectively.
Experiment Setup | Yes | The number of proposals is set to 8, and the slot attention module is iterated 4 times. The initial value of the proposal fusion coefficient α is 0.2. The loss coefficient for cross-modal semantic alignment is 0.5, and the margin quality of contrastive loss is 0.146, consistent with θ2 [7]. We use a batch size of 32 and train for 30 epochs.
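
For readers who want to sanity-check the reported numbers, the sketch below collects the quoted hyperparameters and query counts in one place and derives a few quantities from them. It is not taken from the PC-Net repository: the class name, field names, and helper functions are hypothetical, and the steps-per-epoch figure assumes one training query per sample with the final partial batch kept, which the quoted text does not state.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the Experiment Setup row (field names are hypothetical)."""
    num_proposals: int = 8
    slot_attention_iters: int = 4
    fusion_alpha_init: float = 0.2      # initial proposal fusion coefficient
    alignment_loss_weight: float = 0.5  # cross-modal semantic alignment loss
    contrastive_margin: float = 0.146
    batch_size: int = 32
    epochs: int = 30


# Query counts copied from the Dataset Splits row.
SPLITS = {
    "Charades-CG": {
        "train": 8281,
        "test_trivial": 3096,
        "novel_composition": 3442,
        "novel_word": 703,
    },
    "ActivityNet-CG": {
        "train": 36724,
        "test_trivial": 15712,
        "novel_composition": 12028,
        "novel_word": 3944,
    },
}


def held_out_fraction(dataset: str) -> float:
    """Share of all queries that belong to the three test subsets."""
    counts = SPLITS[dataset]
    total = sum(counts.values())
    return (total - counts["train"]) / total


def steps_per_epoch(dataset: str, batch_size: int) -> int:
    """Optimizer steps per epoch, ASSUMING one query per sample and no dropped batch."""
    return math.ceil(SPLITS[dataset]["train"] / batch_size)


if __name__ == "__main__":
    setup = ReportedSetup()
    for name in SPLITS:
        steps = steps_per_epoch(name, setup.batch_size)
        print(
            f"{name}: {sum(SPLITS[name].values())} queries, "
            f"{held_out_fraction(name):.1%} held out, "
            f"{steps} steps/epoch, {steps * setup.epochs} steps total"
        )
```

Under these assumptions both benchmarks hold out a little under half of their queries for evaluation, so the three test subsets together are almost as large as the training set.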