PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences
Authors: Hehe Fan, Xin Yu, Yuhang Ding, Yi Yang, Mohan Kankanhalli
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on widely-used 3D action recognition and 4D semantic segmentation datasets demonstrate the effectiveness of PSTNet to model point cloud sequences. |
| Researcher Affiliation | Collaboration | Hehe Fan¹, Xin Yu², Yuhang Ding³, Yi Yang² & Mohan Kankanhalli¹; ¹School of Computing, National University of Singapore; ²ReLER, University of Technology Sydney; ³Baidu Research |
| Pseudocode | No | The paper describes the proposed convolutions and network architectures using mathematical formulations and textual descriptions, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'We have implemented our PSTNets using both PyTorch and PaddlePaddle, which have shown similar performance.' However, it does not provide an explicit statement about releasing the code or a link to a code repository. |
| Open Datasets | Yes | The MSR-Action3D (Li et al., 2010) dataset consists of 567 Kinect depth videos... The NTU RGB+D 60 (Shahroudy et al., 2016)... The NTU RGB+D 120 (Liu et al., 2019a)... Synthia 4D (Choy et al., 2019) uses the Synthia dataset (Ros et al., 2016)... then fine-tune the model on a KITTI scene flow dataset (Liu et al., 2019e). |
| Dataset Splits | Yes | For MSR-Action3D: 'We use the same training/test split as previous works (Wang et al., 2012; Liu et al., 2019e).' For NTU RGB+D: 'The dataset defines two types of evaluation, i.e., cross-subject and cross-view. (...) The cross-subject evaluation splits the 40 performers into training and test groups. Each group consists of 20 performers. The cross-view evaluation uses all the samples from camera 1 for testing and samples from cameras 2 and 3 for training.' For Synthia 4D: 'Following (Liu et al., 2019e), we use the same training/validation/test split, with 19,888/815/1,886 frames, respectively.' |
| Hardware Specification | Yes | Experiments are conducted using 1 Nvidia RTX 2080Ti GPU on NTU RGB+D 60. |
| Software Dependencies | No | The paper states 'We have implemented our PSTNets using both PyTorch and PaddlePaddle, which have shown similar performance.' However, it does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We train our models for 35 epochs with the SGD optimizer. Learning rate is set to 0.01, and decays with a rate of 0.1 at the 10th epoch and the 20th epoch, respectively. (...) The batch size and frame sampling stride are set to 16 and 1, respectively. We set the initial spatial search radius r0 to 0.5. |
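
The step-decay learning-rate schedule described in the Experiment Setup row (base rate 0.01, decayed by 0.1 at the 10th and 20th epochs over 35 epochs) can be sketched in plain Python. This is a minimal illustration of the schedule only, not the paper's implementation; the function name `lr_at_epoch` is our own.

```python
def lr_at_epoch(epoch, base_lr=0.01, milestones=(10, 20), gamma=0.1):
    """Step-decay schedule: multiply the base rate by gamma at each milestone.

    Values match the reported PSTNet recipe: lr 0.01, decay 0.1 at
    epochs 10 and 20, for a 35-epoch run.
    """
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# Learning rate over the 35-epoch run:
# epochs 0-9 use 0.01, epochs 10-19 use 0.001, epochs 20-34 use 0.0001.
schedule = [lr_at_epoch(e) for e in range(35)]
```

In PyTorch this corresponds to `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10, 20], gamma=0.1)` wrapped around an SGD optimizer with `lr=0.01`.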