Prototypical Transformer as Unified Motion Learners
Authors: Cheng Han, Yawen Lu, Guohao Sun, James Chenhao Liang, Zhiwen Cao, Qifan Wang, Qiang Guan, Sohail Dianat, Raghuveer Rao, Tong Geng, Zhiqiang Tao, Dongfang Liu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a set of comprehensive experiments to evaluate the effectiveness of our approach. In §4.1, ProtoFormer presents compelling results on optical flow. For example, our approach distinctly outperforms CRAFT, achieving 0.48 and 0.69 on the clean and final pass of Sintel, respectively. In §4.2, we further show the superior performance on scene depth estimation (e.g., 18.6% improvement in Sintel compared to AdaBins). Also, visual evidence in §4.3 demonstrates the systemic explainability, which displays direct prototype-pixel correlations. Results on various downstream tasks including object tracking (§5) and video stabilization (§6) are detailed in the Appendix. |
| Researcher Affiliation | Collaboration | ¹University of Missouri–Kansas City, ²Rochester Institute of Technology, ³Purdue University, ⁴Meta AI, ⁵Kent State University, ⁶DEVCOM Army Research Laboratory, ⁷University of Rochester. |
| Pseudocode | Yes | §7. Pseudo-codes ... Algorithm 1: Pseudo-code of Cross-attention Prototyping in a PyTorch-like style. ... Algorithm 2: Pseudo-code of Latent Synchronization in a PyTorch-like style. |
| Open Source Code | Yes | Our code is available here. |
| Open Datasets | Yes | Initially, the model underwent a pre-training phase on the FlyingChairs dataset (Dosovitskiy et al., 2015), followed by an additional 120,000 iterations on the FlyingThings dataset (Mayer et al., 2016)... Subsequently, the model underwent fine-tuning on a combined dataset encompassing FlyingThings (Mayer et al., 2016), Sintel (Butler et al., 2012b), KITTI-2015 (Geiger et al., 2013), and HD1K (Kondermann et al., 2016)... We initially adopt VKITTI (Cabon et al., 2020) for pre-training, and subsequently the canonical Eigen split (Eigen et al., 2014) and the MPI Sintel dataset (Butler et al., 2012b) to refine the model through fine-tuning... |
| Dataset Splits | Yes | As an autonomous driving dataset consisting of 61 outdoor scenes of various modalities, we use the KITTI Eigen depth split, which contains a standard depth estimation split proposed by Eigen et al. (Eigen et al., 2014) consisting of 32 scenes for training and 29 scenes for testing. |
| Hardware Specification | Yes | Experiments are conducted on eight NVIDIA A100-40GB GPUs. |
| Software Dependencies | No | ProtoFormer is implemented in PyTorch (Paszke et al., 2019). While PyTorch is mentioned, a specific version number for PyTorch or any other software dependency is not provided, making it not fully reproducible regarding software versions. |
| Experiment Setup | Yes | The training employed the AdamW (Loshchilov & Hutter, 2019) optimizer and a one-cycle learning rate scheduler, with the peak learning rate set at 2.5 × 10⁻⁴ for the FlyingChairs dataset and 1.25 × 10⁻⁴ for the other datasets. ... Within each Cross-attention Prototyping layer, twenty prototypes and three iterations are conducted as default (§4.3). |
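The pseudocode row above reports that each Cross-attention Prototyping layer uses twenty prototypes refined over three iterations. The sketch below illustrates one plausible reading of that step as a standard scaled dot-product cross-attention update, where prototype tokens attend to pixel features and are re-aggregated each iteration. This is a generic illustration, not the paper's Algorithm 1; the function name, tensor shapes, and the plain (unprojected) attention are assumptions.

```python
import torch
import torch.nn.functional as F


def cross_attention_prototyping(prototypes, features, iters=3):
    """Iteratively refine prototype tokens against pixel features.

    prototypes: (K, D) tensor of prototype tokens (K = 20 in the paper's default).
    features:   (N, D) tensor of flattened pixel features.
    Hypothetical sketch: no learned Q/K/V projections, just scaled dot-product
    attention from prototypes (queries) to features (keys/values).
    """
    d = prototypes.shape[-1]
    for _ in range(iters):
        # (K, N) attention weights: each prototype attends over all pixels.
        attn = F.softmax(prototypes @ features.T / d**0.5, dim=-1)
        # Re-aggregate pixel features into updated prototypes.
        prototypes = attn @ features
    return prototypes


# Twenty prototypes, three iterations, as reported in the experiment setup.
protos = cross_attention_prototyping(torch.randn(20, 64), torch.randn(1000, 64))
```

Each iteration keeps the prototype count fixed while letting every prototype specialize toward the pixel regions it most strongly attends to, which is consistent with the prototype-pixel correlations the paper visualizes.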
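The experiment-setup row can be reproduced almost directly with PyTorch's built-in optimizer and scheduler. The sketch below wires AdamW to a one-cycle schedule peaking at 2.5 × 10⁻⁴ (the FlyingChairs setting); the toy model, step count, and weight decay are placeholders, since the quoted text does not specify them.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import OneCycleLR

# Toy stand-in for ProtoFormer; the real architecture is not reproduced here.
model = torch.nn.Linear(8, 2)

total_steps = 100  # placeholder; the paper trains for far more iterations

# AdamW with the reported FlyingChairs peak LR; weight decay is assumed.
optimizer = AdamW(model.parameters(), lr=2.5e-4, weight_decay=1e-4)

# One-cycle policy: LR warms up to max_lr, then anneals toward zero.
scheduler = OneCycleLR(optimizer, max_lr=2.5e-4, total_steps=total_steps)

for step in range(total_steps):
    x = torch.randn(4, 8)
    loss = model(x).pow(2).mean()  # dummy loss in place of the flow objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```

For the other datasets, the same setup applies with `max_lr=1.25e-4`, per the quoted configuration.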