Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Rectified Point Flow: Generic Point Cloud Pose Estimation

Authors: Tao Sun, Liyuan Zhu, Shengyu Huang, Shuran Song, Iro Armeni

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present Rectified Point Flow, a unified parameterization that formulates pairwise point cloud registration and multi-part shape assembly as a single conditional generative problem. Given unposed point clouds, our method learns a continuous point-wise velocity field that transports noisy points toward their target positions, from which part poses are recovered. In contrast to prior work that regresses partwise poses with ad-hoc symmetry handling, our method intrinsically learns assembly symmetries without symmetry labels. Together with an overlap-aware encoder focused on inter-part contacts, Rectified Point Flow achieves a new state-of-the-art performance on six benchmarks spanning pairwise registration and shape assembly. Notably, our unified formulation enables effective joint training on diverse datasets, facilitating the learning of shared geometric priors and consequently boosting accuracy. Our code and models are available at https://rectified-pointflow.github.io/.
Researcher Affiliation | Collaboration | Tao Sun (Stanford University), Liyuan Zhu (Stanford University), Shengyu Huang (NVIDIA Research), Shuran Song (Stanford University), Iro Armeni (Stanford University)
Pseudocode | No | The paper describes the methodology in text and uses figures to illustrate components (e.g., Figure 1, Figure 3, and Figure 8 for DiT block details) but does not contain explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code and models are available at https://rectified-pointflow.github.io/.
Open Datasets | Yes | For the multi-part shape assembly task, we experiment on the Breaking Bad [12], Two By Two [16], PartNet [15], and IKEA-Manual [17] datasets. The PartNet dataset has been processed for the shape assembly task following the same procedure as [17] but includes all object categories; we refer to this version as PartNet-Assembly. Evaluation of the pairwise registration is performed on the TUD-L [18] and ModelNet-40 [19] datasets. We follow [22] for preprocessing the TUD-L dataset. We split all datasets into train/val/test sets following existing literature for fair comparisons. These datasets define parts at distinct levels, ranging from random partitions (e.g., ModelNet-40 and Breaking Bad) to human-labeled (e.g., semantically meaningful parts in PartNet and IKEA-Manual). The statistics and information of all datasets are summarized in Tab. 1.
Dataset Splits | Yes | We split all datasets into train/val/test sets following existing literature for fair comparisons. These datasets define parts at distinct levels, ranging from random partitions (e.g., ModelNet-40 and Breaking Bad) to human-labeled (e.g., semantically meaningful parts in PartNet and IKEA-Manual). The statistics and information of all datasets are summarized in Tab. 1.
Hardware Specification | Yes | We train our flow model on 8 NVIDIA A100 80GB GPUs for 400k iterations with an effective batch size of 256. ... In Tab. 7, we vary the sampling steps and report the Part Accuracy, Chamfer Distance, and the runtime per sample in PartNet-Assembly, measured on a single RTX 4090 GPU.
Software Dependencies | No | We use Point Transformer V3 (PTv3) [54] as the backbone for point cloud encoder, and use Diffusion Transformer (DiT) [55] as our flow model. Each DiT layer applies two self-attention stages: (i) part-wise attention to consolidate part-awareness, and (ii) global attention over all part tokens to fuse information. We stabilize the attention computation by applying RMS Normalization [56, 57] to the query and key vectors per head before attention operations. We sample the time steps from a U-shaped distribution following [58]. We pre-train the PTv3 encoder on all datasets with an additional subset of Objaverse [14] meshes, where we apply PartField [13] to obtain annotations. After pretraining, we freeze the weights of the encoder. We train our flow model on 8 NVIDIA A100 80GB GPUs for 400k iterations with an effective batch size of 256. We use the AdamW [59] optimizer with an initial learning rate of 5 × 10^-4, which is halved every 25k iterations after the first 275k iterations.
Experiment Setup | Yes | We use Point Transformer V3 (PTv3) [54] as the backbone for point cloud encoder, and use Diffusion Transformer (DiT) [55] as our flow model. Each DiT layer applies two self-attention stages: (i) part-wise attention to consolidate part-awareness, and (ii) global attention over all part tokens to fuse information. We stabilize the attention computation by applying RMS Normalization [56, 57] to the query and key vectors per head before attention operations. We sample the time steps from a U-shaped distribution following [58]. We pre-train the PTv3 encoder on all datasets with an additional subset of Objaverse [14] meshes, where we apply PartField [13] to obtain annotations. After pretraining, we freeze the weights of the encoder. We train our flow model on 8 NVIDIA A100 80GB GPUs for 400k iterations with an effective batch size of 256. We use the AdamW [59] optimizer with an initial learning rate of 5 × 10^-4, which is halved every 25k iterations after the first 275k iterations.
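
The learning-rate schedule quoted in the Experiment Setup row can be made concrete with a small sketch. This is not taken from the authors' code; it is one possible reading of the quoted sentence under two assumptions: the initial learning rate is 5 × 10^-4, and the first halving happens 25k iterations after the constant phase ends, i.e. at iteration 300k. The names `BASE_LR`, `CONSTANT_ITERS`, `HALVE_EVERY`, `TOTAL_ITERS`, and `learning_rate` are hypothetical and chosen only for illustration.

```python
# Sketch of the step-wise learning-rate schedule described in the report.
# Assumptions (not confirmed by the source):
#   - the initial learning rate is 5e-4;
#   - the rate stays constant for the first 275k iterations, and the first
#     halving is applied at iteration 300k (275k + 25k).

BASE_LR = 5e-4            # assumed initial learning rate
CONSTANT_ITERS = 275_000  # iterations before the halving phase begins
HALVE_EVERY = 25_000      # interval between halvings
TOTAL_ITERS = 400_000     # total training iterations quoted in the report


def learning_rate(step: int) -> float:
    """Return the learning rate at a given 0-indexed training iteration."""
    if step < 0:
        raise ValueError("step must be non-negative")
    # Count completed 25k-iteration intervals after the constant phase.
    halvings = max(0, (step - CONSTANT_ITERS) // HALVE_EVERY)
    return BASE_LR * 0.5 ** halvings


if __name__ == "__main__":
    # Print the rate at each point where it could change.
    for step in range(CONSTANT_ITERS, TOTAL_ITERS + 1, HALVE_EVERY):
        print(f"iteration {step:>7}: lr = {learning_rate(step):.3e}")
```

If the authors instead apply the first halving at iteration 275k, the only change would be to add one to `halvings` once `step >= CONSTANT_ITERS`; the released code at the project page is the authoritative reference for this detail.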