Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Flux4D: Flow-based Unsupervised 4D Reconstruction

Authors: Jingkang Wang, Henry Che, Yun Chen, Ze Yang, Lily Goli, Sivabalan Manivasagam, Raquel Urtasun

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on outdoor driving datasets show Flux4D significantly outperforms existing methods in scalability, generalization, and reconstruction quality. We perform experiments on multiple outdoor dynamic datasets and assess novel view appearance and depth, as well as recovered flow. We also ablate Flux4D's design and show that Flux4D scales with more data.
Researcher Affiliation	Collaboration	Waabi (1), University of Toronto (2), UIUC (3). https://waabi.ai/flux4d
Pseudocode	No	The paper does not contain a clearly labeled pseudocode or algorithm block in the provided text. While the NeurIPS checklist mentions pseudocode, it is not present in the main body of the paper.
Open Source Code	No	We are unable to release code at the time of submission. We recognize the importance of reproducibility and are actively exploring the possibility of releasing the code with the camera-ready version.
Open Datasets	Yes	Experiments on outdoor driving datasets PandaSet [48] and WOD [36] demonstrate that Flux4D achieves better scene decomposition and novel view synthesis than previous state-of-the-art annotation-free reconstruction methods, and is competitive with per-scene optimization methods that use human annotations. We further compare Flux4D with SoTA generalizable methods on WOD in Table 3, where we follow the setup in [25]. We showcase applying Flux4D for high-fidelity camera simulation in large-scale driving scenarios. Flux4D produces high-quality motion flows in diverse, large-scale dynamic scenes on PandaSet (Fig. 6), Argoverse 2 [45], and WOD (Fig. 7).
Dataset Splits	Yes	From PandaSet's 103 dynamic scenes (1080p cameras, 64-beam LiDARs, 10 Hz), we select 10 diverse scenes for validation and use the rest for training. We use the front camera and 360° LiDAR, both collected at 10 Hz. To compare against existing feed-forward generalizable reconstruction methods that can only take a small number of frames as input, we report scene reconstruction results on short 1.5 s windows within the validation sequences. Each method takes as input frames 0, 2, 4, 6, 8, 10, and is evaluated on frames 1, 3, 5, 7, 9 (interpolation) and 11-15 (future prediction). We sample a new snippet every 20 frames, yielding four non-overlapping evaluation snippets per log. We also evaluate against per-scene optimization methods over the full duration of the validation sequence (8 seconds) in the interpolation setting (every other frame is held out). For WOD evaluation, we follow the NVS setting in DrivingRecon [25], using the Waymo-NOTR subset with three front cameras, taking {t − 2, t − 1, t + 1} frames as input, and generating the interpolated frame at time t, where t is every tenth frame in each sequence.
Hardware Specification	Yes	Unless otherwise stated, all models are trained for 30,000 iterations on 4 NVIDIA L40S (48G) GPUs, taking approximately 2 days. Reconstruction speed is measured on a single RTX A5000 GPU (24GB).
Software Dependencies	No	The paper does not provide specific version numbers for software dependencies (e.g., Python, PyTorch, CUDA versions).
Experiment Setup	Yes	We conduct experiments on outdoor driving scenes from PandaSet [48] and Waymo Open Dataset (WOD) [36]. From PandaSet's 103 dynamic scenes (1080p cameras, 64-beam LiDARs, 10 Hz), we select 10 diverse scenes for validation and use the rest for training. We use the front camera and 360° LiDAR, both collected at 10 Hz. We adopt a 3D U-Net with sparse convolutions [37] for fθ. Unless otherwise stated, all models are trained for 30,000 iterations on 4 NVIDIA L40S (48G) GPUs, taking approximately 2 days. The reconstruction loss weights λrgb, λSSIM, λdepth are set as 0.8, 0.2 and 0.01 respectively. The velocity regularization weight λvel is set as 5e-3.
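
The PandaSet snippet protocol quoted under Dataset Splits can be sketched as frame-index arithmetic. This is a minimal illustration, not the authors' code (which has not been released): the names `Snippet` and `make_snippets` are hypothetical, and the 80-frame log length is an assumption derived from the stated 8-second validation sequences at 10 Hz.

```python
from dataclasses import dataclass

SNIPPET_STRIDE = 20   # "a new snippet every 20 frames"
SNIPPET_LENGTH = 16   # frames 0..15, which span 1.5 s at 10 Hz


@dataclass
class Snippet:
    start: int                 # index of the snippet's first frame within the log
    inputs: list[int]          # frames given to the model
    interpolation: list[int]   # held-out frames between the inputs
    future: list[int]          # held-out frames after the last input


def make_snippets(num_frames: int, stride: int = SNIPPET_STRIDE) -> list[Snippet]:
    """Enumerate evaluation snippets that fit entirely inside a log."""
    snippets = []
    for start in range(0, num_frames - SNIPPET_LENGTH + 1, stride):
        snippets.append(
            Snippet(
                start=start,
                inputs=[start + i for i in range(0, 11, 2)],        # 0, 2, ..., 10
                interpolation=[start + i for i in range(1, 10, 2)],  # 1, 3, ..., 9
                future=[start + i for i in range(11, 16)],           # 11, ..., 15
            )
        )
    return snippets


if __name__ == "__main__":
    # Assumed log length: 8 s at 10 Hz = 80 frames.
    for snippet in make_snippets(80):
        print(snippet)
```

With an 80-frame log, the snippets start at frames 0, 20, 40, and 60, which matches the four non-overlapping snippets per log reported in the text.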