Learning to Estimate Single-View Volumetric Flow Motions without 3D Supervision

Authors: Aleksandra Franz, Barbara Solenthaler, Nils Thuerey

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We first present two ablations to illustrate the importance of handling depth ambiguity, and of the prototype density volumes. We then evaluate the method in comparison to a series of learned and optimization-based methods for a synthetic and a real-world dataset."
Researcher Affiliation | Academia | Aleksandra Franz, Technical University of Munich (TUM), franzer@in.tum.de; Barbara Solenthaler, ETH Zurich / TUM Institute for Advanced Study, solenthaler@inf.ethz.ch; Nils Thuerey, Technical University of Munich (TUM), nils.thuerey@tum.de
Pseudocode | No | The paper describes the model architecture and equations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | "Our source code is publicly available at https://github.com/tum-pbs/Neural-Global-Transport and includes the data and configurations necessary to reproduce all results of the paper."
Open Datasets | Yes | "We evaluate our method on both synthetic smoke flows and the real-world captures from the ScalarFlow dataset (Eckert et al., 2019)."
Dataset Splits | No | The paper describes the data used for training and evaluation (e.g., the "remaining 105 frames" for the synthetic dataset and "130 frames per scene" for the real-world one) but does not explicitly provide train/validation/test splits with percentages, sample counts, or citations to predefined splits.
Hardware Specification | Yes | "Our method is implemented in TensorFlow version 1.12 under Python version 3.6 and trained on an NVIDIA GeForce GTX 1080 Ti 11GB."
Software Dependencies | Yes | "Our method is implemented in TensorFlow version 1.12 under Python version 3.6 and trained on an NVIDIA GeForce GTX 1080 Ti 11GB."
Experiment Setup | Yes | "Density training: Gρ is trained with Lρ = L_Î + 2e-4·L_D + 1e-3·L_z and a learning rate of 2e-4 with a decay of 2e-4; the decay is offset by -5000 iterations. We start at a grid resolution of 8x12x8 with 2 UNet levels. The resolution grows after 8k, 16k, and 24k iterations by a factor of 2, adding a level of the UNet every time, thus reaching a maximum grid resolution of 64x96x64 with 5 levels. New levels are faded in over 3k iterations, starting 2k iterations after growth, by linearly interpolating between the up-sampled previous level and the current level. After fade-in, only the output of the highest active level remains. The image resolution grows in conjunction with the density."
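The progressive-growing schedule quoted above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the function name `grid_state` and the return convention (resolution, active UNet levels, fade weight of the newest level) are hypothetical; the constants come directly from the quoted setup (base grid 8x12x8 with 2 levels, doubling after 8k/16k/24k iterations, fade-in over 3k iterations starting 2k iterations after each growth).

```python
GROWTH_ITERS = [8_000, 16_000, 24_000]  # resolution doubles at these steps
FADE_DELAY = 2_000    # fade-in begins this many iterations after a growth step
FADE_LENGTH = 3_000   # fade-in duration in iterations

def grid_state(iteration, base_res=(8, 12, 8), base_levels=2):
    """Return (grid resolution, active UNet levels, fade weight of newest level).

    A fade weight of 0.0 means the output is still the up-sampled previous
    level; 1.0 means only the newest level's output is used.
    """
    growths = sum(1 for g in GROWTH_ITERS if iteration >= g)
    res = tuple(d * 2 ** growths for d in base_res)
    levels = base_levels + growths
    if growths == 0:
        return res, levels, 1.0
    # Linear interpolation between the up-sampled previous level and the
    # current level, clamped to [0, 1].
    since_growth = iteration - GROWTH_ITERS[growths - 1]
    t = (since_growth - FADE_DELAY) / FADE_LENGTH
    alpha = min(max(t, 0.0), 1.0)
    return res, levels, alpha
```

For example, `grid_state(0)` yields the base configuration ((8, 12, 8), 2 levels), while by iteration 30k the grid has reached its maximum of 64x96x64 with 5 fully faded-in levels.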