4D Panoptic Scene Graph Generation

Authors: Jingkang Yang, Jun Cen, Wenxuan Peng, Shuai Liu, Fangzhou Hong, Xiangtai Li, Kaiyang Zhou, Qifeng Chen, Ziwei Liu

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the new dataset show that our method can serve as a strong baseline for future research on PSG-4D. Table 2 presents the results of experiments conducted on the PSG-4D dataset. |
| Researcher Affiliation | Academia | Jingkang Yang^1, Jun Cen^2, Wenxuan Peng^1, Shuai Liu^3, Fangzhou Hong^1, Xiangtai Li^1, Kaiyang Zhou^4, Qifeng Chen^2, Ziwei Liu^1. Affiliations: ^1 S-Lab, Nanyang Technological University; ^2 The Hong Kong University of Science and Technology; ^3 Beijing University of Posts and Telecommunications; ^4 Hong Kong Baptist University. |
| Pseudocode | No | The paper describes the methodology in detail and includes figures, but it does not contain any structured pseudocode or algorithm blocks labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | Open-Source Codebase: We open-source our codebase to facilitate future PSG-4D research. https://github.com/Jingkang50/PSG4D |
| Open Datasets | Yes | To facilitate research on this new task, we contribute an extensively annotated PSG-4D dataset that is composed of two subsets, PSG4D-GTA and PSG4D-HOI. The PSG4D-GTA subset consists of 67 RGB-D videos with a total of 28K frames, selected from the SAIL-VOS 3D dataset [20] collected from the video game Grand Theft Auto V (GTA-V) [21]. The PSG4D-HOI subset is a collection of 3K egocentric real-world videos sampled from the HOI4D dataset [22]. An illustrative sketch of a possible annotation record follows the table. |
| Dataset Splits | No | The paper mentions 'training duration' and a 'training set for relation training' but does not explicitly provide percentages, sample counts, or a methodology for the training, validation, and test splits. |
| Hardware Specification | No | The paper lists the components of a demo robot but does not specify the hardware (e.g., GPU models, CPU types, or memory) used for the main experiments or model training. |
| Software Dependencies | No | The paper mentions several software components and models such as Mask2Former, DKNet, UniTrack, and GPT-4, but does not provide version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We set the training duration to 12 epochs. The DKNet, trained from scratch, requires a longer training period of 200 epochs. In the second stage, both spatial and temporal transformer encoders span two layers, and training continues for an additional 100 epochs. An illustrative configuration sketch collecting these settings also follows the table. |
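The released PSG4D codebase's dataset loaders and file formats are not described in this summary. The following is a minimal Python sketch, with hypothetical class and field names, of how one annotated PSG-4D video could be represented given the task definition quoted above: RGB-D frames, per-frame panoptic masks carrying object track IDs, and time-stamped relation triplets between tracks. It is not the repository's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

import numpy as np

# NOTE: all names below (RGBDFrame, Relation, PSG4DVideo, predicate strings)
# are hypothetical illustrations, not the released PSG4D repo's interface.


@dataclass
class RGBDFrame:
    rgb: np.ndarray            # (H, W, 3) uint8 color image
    depth: np.ndarray          # (H, W) float32 depth map
    panoptic_mask: np.ndarray  # (H, W) int32 per-pixel track ID (0 = unlabeled)


@dataclass
class Relation:
    subject_id: int                 # track ID of the subject
    object_id: int                  # track ID of the object
    predicate: str                  # e.g. "hold" (example predicate, assumed)
    frame_span: Tuple[int, int]     # inclusive (start_frame, end_frame)


@dataclass
class PSG4DVideo:
    video_id: str
    frames: List[RGBDFrame] = field(default_factory=list)
    track_categories: Dict[int, str] = field(default_factory=dict)  # track ID -> class
    relations: List[Relation] = field(default_factory=list)


# Tiny synthetic example: a 2-frame clip in which track 1 (person) holds track 2 (cup).
frames = [
    RGBDFrame(
        rgb=np.zeros((8, 8, 3), dtype=np.uint8),
        depth=np.ones((8, 8), dtype=np.float32),
        panoptic_mask=np.zeros((8, 8), dtype=np.int32),
    )
    for _ in range(2)
]
video = PSG4DVideo(
    video_id="demo_0001",
    frames=frames,
    track_categories={1: "person", 2: "cup"},
    relations=[Relation(subject_id=1, object_id=2, predicate="hold", frame_span=(0, 1))],
)
print(len(video.frames), video.relations[0].predicate)
```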
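The paper's actual configuration files are not reproduced in this summary. The sketch below simply collects the reported experiment-setup values (12 training epochs for the main model, 200 epochs for DKNet trained from scratch, and a second stage with two-layer spatial and temporal transformer encoders trained for 100 additional epochs) into a single, hypothetically named Python configuration object; only the numeric values come from the paper.

```python
from dataclasses import dataclass

# Hypothetical container name; the numbers mirror the reported setup.


@dataclass(frozen=True)
class PSG4DTrainingConfig:
    # Stage 1: segmentation backbones
    main_epochs: int = 12            # "We set the training duration to 12 epochs."
    dknet_epochs: int = 200          # DKNet, trained from scratch, needs 200 epochs
    # Stage 2: relation modeling on tracked object features
    spatial_encoder_layers: int = 2    # spatial transformer encoder depth
    temporal_encoder_layers: int = 2   # temporal transformer encoder depth
    relation_epochs: int = 100         # additional epochs for the second stage


cfg = PSG4DTrainingConfig()
print(cfg)
```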