Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Part-Aware Bottom-Up Group Reasoning for Fine-Grained Social Interaction Detection
Authors: Dongkeun Kim, Minsu Cho, Suha Kwak
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the NVI dataset demonstrate that our method outperforms prior methods, achieving the new state of the art, while additional results on the Café dataset further validate its generalizability to group activity understanding. We evaluated the proposed method on the NVI [51] and Café [24], where it demonstrated substantial improvements over existing methods. |
| Researcher Affiliation | Academia | Dongkeun Kim Minsu Cho Suha Kwak Pohang University of Science and Technology (POSTECH) EMAIL |
| Pseudocode | No | The paper describes the method and architecture using descriptive text and figures (Figure 1, Figure S1), but does not contain a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We provide the code and the instructions in the supplementary material. |
| Open Datasets | Yes | We evaluated the proposed method on the NVI [51] and Café [24], where it demonstrated substantial improvements over existing methods. Dataset. To verify the proposed method across diverse social scenarios, we evaluated on two benchmarks: NVI [51] and Café [24]. |
| Dataset Splits | Yes | NVI contains 13,711 images, with 9,634 for training, 1,418 for validation, and 2,659 for test. |
| Hardware Specification | Yes | All experiments are conducted on four NVIDIA GeForce RTX 3090 GPUs. |
| Software Dependencies | No | We implement our model using PyTorch [39] and utilize the official code repository of NVI [51], licensed under the MIT License. While PyTorch is mentioned, a specific version number for PyTorch or any other libraries is not provided. |
| Experiment Setup | Yes | Hyperparameters. Our model is initialized with the pretrained DETR ResNet-50. The feature dimension C and the transformer dimension D are set to 2048 and 256, respectively. The encoder consists of 6 layers with 8 attention heads, while the individual decoder, individual embedding enhancer, and group decoder comprise 3 layers with 8 attention heads. The number of individual queries, group queries, and parts are 24, 32, and 13, respectively. The NMS threshold is set to 0.5. Training. We train our model for 90 epochs using the AdamW optimizer [35] with β1 = 0.9, β2 = 0.999, and ϵ = 1e-8. The learning rate is set to 1e-4 initially and decayed to 1e-5 after 60 epochs. We use a mini-batch size of 4. Loss coefficients are set to λi = 1.0, λc = 2.0, λl = 1.0, λℓ1 = 2.5, λGIoU = 1.0, λp = 10.0, and λa = 5.0. |
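
The training settings quoted in the Experiment Setup and Dataset Splits rows can be collected into a small configuration sketch. This is an illustration only, not the authors' code: the names `TrainConfig`, `learning_rate`, and `NVI_SPLITS` are hypothetical, and the choice of 0-indexed epochs with the decay taking effect once 60 epochs have completed is an assumption about how "after 60 epochs" is meant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as quoted in the table; field names are hypothetical."""
    epochs: int = 90
    batch_size: int = 4
    base_lr: float = 1e-4        # initial learning rate
    decayed_lr: float = 1e-5     # learning rate after the decay point
    decay_epoch: int = 60        # assumed: decay applies from epoch index 60
    betas: tuple = (0.9, 0.999)  # AdamW beta1, beta2
    eps: float = 1e-8            # AdamW epsilon
    num_individual_queries: int = 24
    num_group_queries: int = 32
    num_parts: int = 13
    nms_threshold: float = 0.5


def learning_rate(epoch: int, cfg: TrainConfig = TrainConfig()) -> float:
    """Return the step-decayed learning rate for a 0-indexed epoch."""
    if not 0 <= epoch < cfg.epochs:
        raise ValueError(f"epoch must be in [0, {cfg.epochs}), got {epoch}")
    return cfg.base_lr if epoch < cfg.decay_epoch else cfg.decayed_lr


# NVI split sizes as quoted in the Dataset Splits row.
NVI_SPLITS = {"train": 9634, "val": 1418, "test": 2659}

if __name__ == "__main__":
    cfg = TrainConfig()
    print("lr at epoch 0:", learning_rate(0, cfg))
    print("lr at epoch 60:", learning_rate(60, cfg))
    print("total NVI images:", sum(NVI_SPLITS.values()))
```

The split sizes sum to the 13,711 images reported for NVI, which is a quick consistency check on the quoted numbers. Loss coefficients are left out of the sketch because the table does not say which loss term each one weights.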