GTR: A Grafting-Then-Reassembling Framework for Dynamic Scene Graph Generation

Authors: Jiafeng Liang, Yuxin Wang, Zekun Wang, Ming Liu, Ruiji Fu, Zhongyuan Wang, Bing Qin

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that GTR achieves state-of-the-art performance on the Action Genome dataset. Further analyses reveal that the reassembling stage is crucial to the success of our framework. To evaluate the performance of the proposed framework, we conduct extensive experiments on Action Genome [Ji et al., 2020].
Researcher Affiliation | Collaboration | Harbin Institute of Technology, Harbin, China; Peng Cheng Laboratory, Shenzhen, China; Kuaishou Technology, Beijing, China
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statements or links indicating the release of source code for the described methodology.
Open Datasets | Yes | We train and evaluate our model on the Action Genome (AG) dataset [Ji et al., 2020]. Action Genome [Ji et al., 2020], which describes relationships over time. (One way the AG annotations are typically loaded is sketched after the table.)
Dataset Splits | No | The paper mentions using a percentage of the video data for training (e.g., 60%) but does not give explicit training, validation, and test splits, by percentage or by count, that would be needed for reproducibility.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU models, CPU types).
Software Dependencies | No | The paper mentions using Faster R-CNN with a ResNet-101 backbone and the RelTR model, but does not specify version numbers for any software dependencies such as programming languages, libraries, or frameworks. (One possible detector setup is sketched after the table.)
Experiment Setup | Yes | Parameter Settings. For the feature detector, we map the visual features to a vector of dimension 512 and the semantic features of the object categories to a vector of dimension 300. The MLPs in the paper are three-layer fully connected networks and the hidden layer dimension is set to 512. Training Details. In the grafting stage, we adopt the original RelTR model [Cong et al., 2022], changing only the output number of the classifier. The Action Genome dataset [Ji et al., 2020] is converted to COCO format for fine-tuning. The RelTR model is fine-tuned for a total of 20 epochs with mini-batch size 8 in this stage. The initial learning rate of the classifier is unchanged and the learning rates of the other layers are multiplied by 0.9 of the initial learning rate. In the reassembling stage, we train the temporal dependency model (TDM) with the SGD optimizer for a total of 15 epochs with mini-batch size 1. The initial learning rate is set to 1e-5, adjusted to 5e-6 after 5 epochs of training, and to 1e-6 after 10 epochs of training. In the noise filter (NFT), we set the similarity threshold to 0.9. (These hyperparameters are restated as a configuration sketch after the table.)
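For the Open Datasets row: a minimal loading sketch for the Action Genome annotations, assuming the pickle files commonly distributed with the Action Genome toolkit (person_bbox.pkl and object_bbox_and_relationship.pkl). The directory layout and file names are assumptions about the public release, not something stated in the paper.

```python
import pickle
from pathlib import Path

# Assumed layout of the Action Genome annotation dump; adjust to your local copy.
ANNOTATION_DIR = Path("ActionGenome/annotations")  # hypothetical location


def load_action_genome(annotation_dir: Path = ANNOTATION_DIR):
    """Load per-frame person boxes and object/relationship annotations."""
    with open(annotation_dir / "person_bbox.pkl", "rb") as f:
        person_bbox = pickle.load(f)   # frame id -> person bounding box(es)
    with open(annotation_dir / "object_bbox_and_relationship.pkl", "rb") as f:
        object_rel = pickle.load(f)    # frame id -> object boxes + relationship labels
    return person_bbox, object_rel


if __name__ == "__main__":
    person_bbox, object_rel = load_action_genome()
    print(f"{len(person_bbox)} frames with person boxes, "
          f"{len(object_rel)} frames with object/relationship annotations")
```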
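For the Software Dependencies row: the paper names Faster R-CNN with ResNet-101 but no implementation or version. Below is one possible way to build such a detector, assuming PyTorch with torchvision >= 0.13; the class count is a placeholder and nothing here is taken from the authors' code.

```python
import torch
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

NUM_CLASSES = 37  # placeholder: object categories + background, not taken from the paper


def build_detector(num_classes: int = NUM_CLASSES) -> FasterRCNN:
    """Faster R-CNN with a ResNet-101 FPN backbone (one possible configuration)."""
    backbone = resnet_fpn_backbone(backbone_name="resnet101", weights=None)
    return FasterRCNN(backbone, num_classes=num_classes)


if __name__ == "__main__":
    detector = build_detector().eval()
    with torch.no_grad():
        # A single dummy RGB frame, just to check that the model runs end to end.
        outputs = detector([torch.rand(3, 480, 640)])
    print(outputs[0]["boxes"].shape, outputs[0]["labels"].shape)
```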
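For the Experiment Setup row: a configuration sketch in PyTorch that restates the quoted hyperparameters, namely the 512-d/300-d feature projections, the three-layer MLPs with hidden size 512, the stepwise SGD learning-rate schedule for the temporal dependency model, and the 0.9 similarity threshold in the noise filter. The model, data loader, loss function, and the cosine-similarity reading of the noise filter are illustrative assumptions; only the numbers come from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hyperparameters quoted in the paper's "Experiment Setup" passage.
VISUAL_DIM = 512            # visual features projected to 512-d
SEMANTIC_DIM = 300          # object-category (semantic) features projected to 300-d
HIDDEN_DIM = 512            # hidden size of the three-layer MLPs
GRAFT_EPOCHS = 20           # grafting stage: RelTR fine-tuning
GRAFT_BATCH_SIZE = 8
TDM_EPOCHS = 15             # reassembling stage: temporal dependency model
TDM_BATCH_SIZE = 1
SIMILARITY_THRESHOLD = 0.9  # noise filter (NFT)


def make_mlp(in_dim: int, out_dim: int, hidden_dim: int = HIDDEN_DIM) -> nn.Sequential:
    """Three-layer fully connected network with 512-d hidden layers, as described."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden_dim), nn.ReLU(),
        nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        nn.Linear(hidden_dim, out_dim),
    )


def tdm_learning_rate(epoch: int) -> float:
    """Schedule from the paper: 1e-5, then 5e-6 after 5 epochs, then 1e-6 after 10."""
    if epoch < 5:
        return 1e-5
    if epoch < 10:
        return 5e-6
    return 1e-6


def passes_noise_filter(feat_a: torch.Tensor, feat_b: torch.Tensor) -> bool:
    """One plausible reading of the noise filter: keep pairs whose cosine similarity
    reaches the 0.9 threshold (the exact similarity measure is not spelled out)."""
    return F.cosine_similarity(feat_a, feat_b, dim=-1).item() >= SIMILARITY_THRESHOLD


def train_tdm(model: nn.Module, loader, loss_fn) -> None:
    """Skeleton of the reassembling-stage loop; model, loader, and loss are placeholders."""
    optimizer = torch.optim.SGD(model.parameters(), lr=tdm_learning_rate(0))
    for epoch in range(TDM_EPOCHS):
        for group in optimizer.param_groups:
            group["lr"] = tdm_learning_rate(epoch)
        for batch in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(batch), batch)
            loss.backward()
            optimizer.step()
```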