TFS-NeRF: Template-Free NeRF for Semantic 3D Reconstruction of Dynamic Scene
Authors: Sandika Biswas, Qianyi Wu, Biplab Banerjee, Hamid Rezatofighi
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our approach produces high-quality reconstructions of both deformable and non-deformable objects in complex interactions, with improved training efficiency compared to existing methods. The efficacy of our proposed method is demonstrated through comprehensive experiments conducted on several datasets featuring human-object, hand-object interactions, and animal movements. In this section, we first demonstrate the effectiveness of our method for precise 3D reconstructions of deformable and non-deformable objects, as well as their interactions. Next, we present qualitative and quantitative comparisons of our approach with relevant state-of-the-art methods. Additionally, we conduct a comprehensive ablation study to analyze the impact of different network design choices and loss formulations on our model's performance. |
| Researcher Affiliation | Academia | Sandika Biswas¹,², Qianyi Wu¹, Biplab Banerjee², and Hamid Rezatofighi¹ (¹Faculty of IT, Monash University; ²Indian Institute of Technology (IIT), Bombay) |
| Pseudocode | No | The paper describes the methodology using text and diagrams, but does not provide pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and models will be available on our GitHub page. |
| Open Datasets | Yes | Datasets. To evaluate our reconstruction under multiple entity interactions, we select BEHAVE [31] with human-object interactions and HO3D-V3 [44] with hand-object interactions. Following [28], all methods utilize dataset-provided camera poses, body poses, and masks for training. We also evaluate our method for reconstructing single deformable entities (only human/animal) and use a similar setup as proposed in TAVA [28]. For this purpose, we test performance on two datasets: the ZJU-MoCap dataset [9] for human reconstruction and a synthetic dataset for animal reconstruction from [28]. |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly describe a validation set or how it was used for hyperparameter tuning or model selection. |
| Hardware Specification | Yes | All training and inference have been performed on an NVIDIA RTX 4090 GPU. |
| Software Dependencies | No | The paper mentions using the 'PyTorch library' and 'ADAM optimizer' for implementation, and 'YOLOv8 [48]' and 'SAM [49]' for predicted masks, but it does not specify version numbers for any of these software components. |
| Experiment Setup | Yes | Implementation Details: The overall network proposed in Fig. 2 is trained end-to-end with a learning rate of 5.0e-4. We use the ADAM optimizer for the SGD optimization and the PyTorch library for the implementation of our method. The loss weights λskel, λW, λINN, λconsis, λshape are empirically set to 2, 10, 1, 1, and 0.03, respectively. Invertible Neural Network: This network consists of 2 coupling layers [29], each with scaling and translation prediction modules. Each of these modules consists of 3 linear layers with dimensions 331×512, 512×512, and 512. Skinning weight prediction network: The skinning weight prediction network consists of 3 linear layers with dimensions 3×256, 256×256, 256×24. SDF prediction network: The SDF prediction network consists of 8 linear layers, each with a hidden size of 256, with a skip connection added at layer 4. The dimensions of each layer are 114×256, 256×256, 256×256, 256×217, 256×256, 256×256, 256×256, 256×257. RGB rendering network: The RGB rendering network (Fig. 2) consists of 5 linear layers with dimensions 270×256, 256×256, 256×256, 256×256, and 256×3. (Code sketches of these networks are given after the table.) |
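
To make the quoted INN description concrete, here is a minimal PyTorch sketch of a RealNVP-style affine coupling layer of the kind cited above. The class names, the input split sizes, the conditioning hook, and the module output width (the final width is truncated in the extracted quote) are all illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of one affine coupling layer (RealNVP-style). The quoted
# scale/translation MLPs are 3 linear layers of widths 331 -> 512 -> 512 ->
# (output width not recovered from the quote); split sizes are illustrative.
import torch
import torch.nn as nn


def make_module(in_dim, hidden=512, out_dim=None):
    """3 linear layers: in_dim -> 512 -> 512 -> out_dim."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


class AffineCoupling(nn.Module):
    """y_a = x_a;  y_b = x_b * exp(s(x_a)) + t(x_a): invertible in closed form."""

    def __init__(self, dim_a, dim_b, cond_dim=0):
        super().__init__()
        # Scale and translation are predicted from the untouched half,
        # optionally concatenated with a conditioning code (e.g., pose).
        self.scale = make_module(dim_a + cond_dim, out_dim=dim_b)
        self.translate = make_module(dim_a + cond_dim, out_dim=dim_b)

    def forward(self, x_a, x_b, cond=None):
        h = x_a if cond is None else torch.cat([x_a, cond], dim=-1)
        s = torch.tanh(self.scale(h))  # tanh keeps exp(s) numerically tame
        return x_a, x_b * torch.exp(s) + self.translate(h)

    def inverse(self, y_a, y_b, cond=None):
        h = y_a if cond is None else torch.cat([y_a, cond], dim=-1)
        s = torch.tanh(self.scale(h))
        return y_a, (y_b - self.translate(h)) * torch.exp(-s)


# Round-trip check with an arbitrary split of a 331-dim input (assumption).
layer = AffineCoupling(dim_a=165, dim_b=166)
x_a, x_b = torch.randn(8, 165), torch.randn(8, 166)
y_a, y_b = layer(x_a, x_b)
rec_a, rec_b = layer.inverse(y_a, y_b)
assert torch.allclose(x_b, rec_b, atol=1e-5)
```

Stacking two such layers (with the roles of the two halves swapped between them) gives an INN whose forward and inverse passes are both cheap, which is what allows the backward mapping to come for free at inference.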
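
The three remaining MLPs follow directly from the quoted layer widths; below is a minimal PyTorch sketch. The widths match the quote, while the activation choices (ReLU/softplus/sigmoid), the softmax over skinning weights, and the 39-dim skip feature (inferred from 217 + 39 = 256, e.g., a positional encoding of the query point) are assumptions.

```python
# Minimal sketch of the skinning, SDF, and RGB MLPs from the quoted widths.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SDFNet(nn.Module):
    """8 linear layers, hidden size 256, skip connection at layer 4:
    114x256, 256x256, 256x256, 256x217, 256x256, 256x256, 256x256, 256x257."""

    def __init__(self, in_dim=114, skip_dim=39, out_dim=257):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Linear(in_dim, 256),
            nn.Linear(256, 256),
            nn.Linear(256, 256),
            nn.Linear(256, 256 - skip_dim),  # 217 out; the concat below restores 256
            nn.Linear(256, 256),
            nn.Linear(256, 256),
            nn.Linear(256, 256),
            nn.Linear(256, out_dim),         # e.g., 1 SDF value + 256-dim feature
        ])

    def forward(self, x, skip_feat):
        h = x
        for i, layer in enumerate(self.layers):
            if i == 4:  # skip: re-inject a 39-dim feature after layer 4
                h = torch.cat([h, skip_feat], dim=-1)
            h = layer(h)
            if i < len(self.layers) - 1:
                h = F.softplus(h, beta=100)  # a common choice for SDF MLPs
        return h


class SkinningNet(nn.Module):
    """3 linear layers: 3 -> 256 -> 256 -> 24 (one weight per joint)."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 24),
        )

    def forward(self, pts):
        return torch.softmax(self.net(pts), dim=-1)  # weights sum to 1


class RGBNet(nn.Module):
    """5 linear layers: 270 -> 256 -> 256 -> 256 -> 256 -> 3."""

    def __init__(self, in_dim=270):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 3),
        )

    def forward(self, feat):
        return torch.sigmoid(self.net(feat))  # RGB in [0, 1]
```

The 24-way skinning output matches the SMPL joint count, and the 257-dim SDF output is consistent with the common IDR/NeuS convention of one signed-distance value plus a 256-dim geometry feature passed on to the rendering network.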