4D Panoptic Scene Graph Generation
Authors: Jingkang Yang, Jun Cen, Wenxuan Peng, Shuai Liu, Fangzhou Hong, Xiangtai Li, Kaiyang Zhou, Qifeng Chen, Ziwei Liu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the new dataset show that our method can serve as a strong baseline for future research on PSG-4D. Table 2 presents the results of experiments conducted on the PSG-4D dataset. |
| Researcher Affiliation | Academia | Jingkang Yang1, Jun Cen2, Wenxuan Peng1, Shuai Liu3, Fangzhou Hong1, Xiangtai Li1, Kaiyang Zhou4, Qifeng Chen2, Ziwei Liu1. 1S-Lab, Nanyang Technological University; 2The Hong Kong University of Science and Technology; 3Beijing University of Posts and Telecommunications; 4Hong Kong Baptist University |
| Pseudocode | No | The paper describes the methodology in detail and includes figures, but it does not contain any structured pseudocode or algorithm blocks labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | Open-Source Codebase: We open-source our codebase to facilitate future PSG-4D research. https://github.com/Jingkang50/PSG4D |
| Open Datasets | Yes | To facilitate research on this new task, we contribute an extensively annotated PSG-4D dataset that is composed of 2 sub-sets, PSG4D-GTA and PSG4D-HOI. The PSG4D-GTA subset consists of 67 RGB-D videos with a total of 28K frames, selected from the SAIL-VOS 3D dataset [20] collected from the video game Grand Theft Auto V (GTA-V) [21]. The PSG4D-HOI subset is a collection of 3K egocentric real-world videos sampled from the HOI4D dataset [22]. (An illustrative data-record sketch follows the table.) |
| Dataset Splits | No | The paper mentions 'training duration' and 'training set for relation training' but does not explicitly provide specific percentages, sample counts, or methodology for training, validation, or test dataset splits. |
| Hardware Specification | No | The paper mentions the components of a demo robot but does not provide specific details about the hardware (e.g., GPU models, CPU types, or memory) used for conducting the main experiments or training the models. |
| Software Dependencies | No | The paper mentions several software components and models, such as Mask2Former, DKNet, UniTrack, and GPT-4, but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We set the training duration to 12 epochs. The DKNet, trained from scratch, requires a longer training period of 200 epochs. In the second stage, both spatial and temporal transformer encoders span two layers, and training continues for an additional 100 epochs. (A hedged config sketch of this schedule follows the table.) |
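To make the dataset description concrete, the sketch below shows one plausible way a single PSG-4D sample (RGB-D video frames, per-frame panoptic masks, and temporally grounded relation triplets) could be represented in code. The class and field names (`PSG4DSample`, `rgb`, `depth`, `panoptic_mask`, `relations`) are illustrative assumptions, not the schema of the official codebase at https://github.com/Jingkang50/PSG4D.

```python
from dataclasses import dataclass, field
import numpy as np

# Hypothetical record for one PSG-4D video clip; field names are
# illustrative assumptions, not the official PSG4D data schema.
@dataclass
class PSG4DSample:
    video_id: str
    rgb: np.ndarray            # (T, H, W, 3) RGB frames
    depth: np.ndarray          # (T, H, W) per-pixel depth (the extra "D" in RGB-D)
    panoptic_mask: np.ndarray  # (T, H, W) integer IDs, one per thing/stuff instance
    # Scene-graph edges as (subject_id, predicate, object_id, start_frame,
    # end_frame), i.e. relations with their temporal extent.
    relations: list[tuple[int, str, int, int, int]] = field(default_factory=list)

# Toy example: a 10-frame clip where entity 1 "holds" entity 2 throughout.
sample = PSG4DSample(
    video_id="demo_0001",
    rgb=np.zeros((10, 480, 640, 3), dtype=np.uint8),
    depth=np.zeros((10, 480, 640), dtype=np.float32),
    panoptic_mask=np.zeros((10, 480, 640), dtype=np.int32),
    relations=[(1, "hold", 2, 0, 9)],
)
print(len(sample.relations), "annotated relation(s)")
```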
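The experiment-setup row reports the training schedule only in prose. Below is a minimal sketch collecting those reported numbers into a single config; the dict structure and key names are our own assumptions, not the actual PSG4D configuration files.

```python
# Hedged sketch of the training schedule reported in the paper, gathered
# into one config dict. Key names are assumptions; consult the PSG4D repo
# (https://github.com/Jingkang50/PSG4D) for the authoritative settings.
train_schedule = {
    # Stage 1: segmentation/feature training.
    "stage1": {
        "default_epochs": 12,  # standard training duration reported in the paper
        "dknet_epochs": 200,   # DKNet, trained from scratch, needs a longer schedule
    },
    # Stage 2: relation training with spatial and temporal transformer encoders.
    "stage2": {
        "spatial_encoder_layers": 2,
        "temporal_encoder_layers": 2,
        "epochs": 100,
    },
}

for stage, cfg in train_schedule.items():
    print(stage, cfg)
```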