Semantic Flow: Learning Semantic Fields of Dynamic Scenes from Monocular Videos

Authors: Fengrui Tian, Yueqi Duan, Angtian Wang, Jianfei Guo, Shaoyi Du

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our model is able to learn from multiple dynamic scenes and supports a series of new tasks such as instance-level scene editing, semantic completions, dynamic scene tracking and semantic adaption on novel scenes. We evaluate our method by conducting experiments on various semantic tasks of dynamic scenes.
Researcher Affiliation | Collaboration | Fengrui Tian (1), Yueqi Duan (2), Angtian Wang (3), Jianfei Guo (4), Shaoyi Du (1). 1: National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University; 2: Tsinghua University; 3: Johns Hopkins University; 4: Shanghai AI Laboratory
Pseudocode | No | The paper describes the model architecture and mathematical formulations but does not provide any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | The implementation codes of our model and our Semantic Dynamic Scene dataset are available at https://github.com/tianfr/Semantic-Flow/.
Open Datasets | Yes | We introduce the Semantic Dynamic Scene dataset, which is built upon the Dynamic Scene dataset (Yoon et al., 2020). The implementation codes of our model and our Semantic Dynamic Scene dataset are available at https://github.com/tianfr/Semantic-Flow/.
Dataset Splits | No | The paper describes how data is used for different tasks and training scenarios (e.g., finetuning on specific scenes, training with certain percentages of labels), but it does not specify a general, distinct validation split with percentages or counts for the entire dataset.
Hardware Specification | Yes | The entire model is trained on a Nvidia RTX 3090 GPU with a total batch size of 1024 rays.
Software Dependencies | No | The paper mentions several models and optimizers (e.g., Adam optimizer, SlowOnly, ResNet18, RAFT, FlowNet) and datasets (Kinetics-400, ImageNet), but it does not specify version numbers for any software libraries, frameworks, or programming languages used.
Experiment Setup | Yes | We train the semantic field of the dynamic foreground for 40,000 iterations with the Adam optimizer (Kingma & Ba, 2015). For the semantic field of the static background, we pretrain for 100,000 iterations. The learning rate is set to 5 × 10⁻⁴. We use the Adam optimizer (Kingma & Ba, 2015) with betas (0.9, 0.999). For the flow attention module, the head number H is 4 and the number of channels C is 64. During training, there are 512 rays for each scene in a mini-batch.
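The reported setup can be summarized as a small configuration sketch. This is an illustrative reconstruction from the numbers quoted above, not the released code; the dictionary keys and the helper function are assumptions introduced here.

```python
# Hedged sketch of the training configuration reported in the paper.
# All numeric values come from the quoted setup; the names are illustrative.
TRAIN_CONFIG = {
    "dynamic_foreground_iters": 40_000,    # semantic field of dynamic foreground
    "static_background_pretrain_iters": 100_000,
    "optimizer": "Adam",                   # Kingma & Ba, 2015
    "lr": 5e-4,
    "betas": (0.9, 0.999),
    "flow_attention": {"heads": 4, "channels": 64},
    "rays_per_scene": 512,                 # rays per scene in a mini-batch
    "total_batch_rays": 1024,              # total batch size on one RTX 3090
}

def scenes_per_batch(cfg: dict) -> int:
    """Number of scenes per mini-batch implied by the two batch figures."""
    return cfg["total_batch_rays"] // cfg["rays_per_scene"]
```

Dividing the total batch of 1024 rays by 512 rays per scene suggests two scenes are sampled per mini-batch, consistent with the paper's claim of learning from multiple dynamic scenes jointly.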