Masked Space-Time Hash Encoding for Efficient Dynamic Scene Reconstruction
Authors: Feng Wang, Zilong Chen, Guokang Wang, Yafei Song, Huaping Liu
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For validating the performance of the proposed method, we conduct experiments on two public datasets and our collected dataset: (1) The Plenoptic Video Dataset [40]... (2) Google Immersive Dataset [9]... (3) To validate the robustness of our method on more complex in-the-wild scenarios, we collect six time-synchronized multi-view videos... The quantitative results and comparisons are presented in Tab. 1 and Tab. 2. For the Plenoptic Video dataset, our method surpasses previous state-of-the-art methods by a non-trivial margin, with 0.7 to 1.4 PSNR gains and >30% improvement on the perceptual metric LPIPS. Fig. 1 and Fig. 2 show the quantitative and qualitative comparisons to other state-of-the-art methods. |
| Researcher Affiliation | Collaboration | Feng Wang¹, Zilong Chen¹, Guokang Wang¹, Yafei Song², Huaping Liu¹. ¹Beijing National Research Center for Information Science and Technology (BNRist), Department of Computer Science and Technology, Tsinghua University; ²Alibaba Group |
| Pseudocode | No | The paper describes its methods using mathematical equations and textual explanations but does not contain any structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/masked-spacetime-hashing/msth. |
| Open Datasets | Yes | For validating the performance of the proposed method, we conduct experiments on two public datasets and our collected dataset: (1) The Plenoptic Video Dataset [40], which consists of 6 publicly accessible scenes: coffee-martini, flame-salmon, cook-spinach, cut-roasted-beef, flame-steak, and sear-steak. (2) Google Immersive Dataset [9]: The Google Immersive dataset contains light field videos... (3) To validate the robustness of our method on more complex in-the-wild scenarios, we collect six time-synchronized multi-view videos including more realistic observations such as pedestrians, moving cars, and grasses with people playing. We name the collected dataset the Campus Dataset. The Campus dataset is much more difficult than the above two curated ones in terms of movement complexity and dynamic areas. For details on the dataset, please see our Appendix. (Also “To validate the effectiveness of our method on scenes in more realistic settings with large areas of dynamic regions and more complex movements, we collect a synchronized multi-view video dataset with 6 challenging dynamic scenes, which will be publicly available.”) |
| Dataset Splits | No | For the above three multi-view datasets, we follow the experiment setting in [40] that employs 18 views for training and 1 view for evaluation. To quantitatively evaluate the rendering quality on novel views, we measure PSNR, DSSIM, and LPIPS [101] on the test views. ... For evaluation, we follow the common setting from [68, 72, 10]. The paper mentions training and evaluation/test splits but does not explicitly define a separate validation split with specific sizes or percentages. A minimal sketch of this train/test split protocol is given after the table. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud computing specifications used for running experiments. It only implicitly mentions GPU usage by stating “we implement a CUDA extension based on PyTorch [65]”. |
| Software Dependencies | No | The paper mentions “PyTorch [65]” and a “CUDA extension” but does not specify version numbers for these software components. |
| Experiment Setup | Yes | In our implementation, we use one layer of proposal net to sample 128 points. For the mask, we use a non-hash voxel grid with 128 spatial resolution. To encourage the separation of the static and the dynamic, we utilize a mask loss that aims at generating sparse dynamic voxels by constraining the mask to be close to 1. We also adopt distortion loss [4] with λ_dist = 2e-2. For uncertainty loss, we set γ = 3e-4 and λ = 3e-5. A hedged sketch of how these hyperparameters enter the training objective follows the table. |
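
The split protocol quoted above (18 training views, 1 held-out evaluation view, and no separate validation set) can be expressed as a small helper. This is a minimal sketch under that assumption; the function name `split_views` and the choice of which camera is held out are illustrative, not taken from the released code.

```python
from typing import List, Tuple

def split_views(view_ids: List[int], eval_view: int = 0) -> Tuple[List[int], List[int]]:
    """Hold out one camera view for evaluation; train on the rest (no validation split)."""
    train_views = [v for v in view_ids if v != eval_view]
    return train_views, [eval_view]

# Example: 19 synchronized cameras -> 18 training views and 1 test view,
# matching the 18/1 protocol from [40] quoted in the table.
train, test = split_views(list(range(19)), eval_view=0)
assert len(train) == 18 and len(test) == 1
```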
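
The experiment setup row lists the main hyperparameters (128 proposal samples, a 128-resolution non-hash mask grid, λ_dist = 2e-2, γ = 3e-4, λ = 3e-5) and a mask loss that pushes the mask toward 1. The sketch below collects these values and shows one way the regularizers could be combined; the L1 form of the mask penalty, the function names, and the treatment of the uncertainty term are assumptions, not the authors' implementation.

```python
import torch

# Hyperparameters quoted from the paper's experiment setup.
HPARAMS = {
    "proposal_samples": 128,  # points sampled by the single proposal net layer
    "mask_resolution": 128,   # spatial resolution of the non-hash mask voxel grid
    "lambda_dist": 2e-2,      # distortion-loss weight [4]
    "gamma": 3e-4,            # uncertainty-loss hyperparameter
    "lambda": 3e-5,           # uncertainty-loss hyperparameter
}

def mask_sparsity_loss(mask: torch.Tensor) -> torch.Tensor:
    """Encourage sparse dynamic voxels by pushing mask values toward 1 (assumed L1 form)."""
    return (1.0 - mask).abs().mean()

def total_loss(rgb_loss: torch.Tensor,
               distortion_loss: torch.Tensor,
               uncertainty_loss: torch.Tensor,
               mask: torch.Tensor) -> torch.Tensor:
    """Combine the rendering loss with the weighted regularizers (sketch only)."""
    # The uncertainty term is assumed to be computed elsewhere using the
    # gamma/lambda values above; here it enters the objective as a single term.
    return (rgb_loss
            + HPARAMS["lambda_dist"] * distortion_loss
            + uncertainty_loss
            + mask_sparsity_loss(mask))
```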