PM-INR: Prior-Rich Multi-Modal Implicit Large-Scale Scene Neural Representation
Authors: Yiying Yang, Fukun Yin, Wen Liu, Jiayuan Fan, Xin Chen, Gang Yu, Tao Chen
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments have demonstrated the effectiveness and robustness of our method for outdoor unbounded large-scale scene novel view synthesis, which outperforms state-of-the-art methods in terms of PSNR, SSIM, and LPIPS. |
| Researcher Affiliation | Collaboration | 1 Academy for Engineering and Technology, Fudan University 2 School of Information Science and Technology, Fudan University 3 Tencent PCG |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | Our code and models will be available. |
| Open Datasets | Yes | We evaluate our PM-INR on two datasets, namely the OMMO (Lu et al. 2023) and BlendedMVS (Yao et al. 2020) datasets. |
| Dataset Splits | No | No specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, and testing was found in the paper. |
| Hardware Specification | Yes | We train the VQ-VAE network for 20k iterations with a batch size of 16 accumulated over 21 batches, which needs about 1 day on two A100 GPUs. Each scene is trained on four Nvidia A100 GPU devices for around one day. |
| Software Dependencies | No | The paper states, 'Our method is built with Pytorch framework,' but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We train the VQ-VAE network for 20k iterations with a batch size of 16 accumulated over 21 batches... The dimensions of image codebook Bg, text codebook Bt, and 3D codebook Bd are 256, 512, and 16, respectively... the number of learnable query embeddings is 128, and each embedding has a dimension qᵢ ∈ ℝ⁶⁴... We adopt the optimizing strategies of Mip-NeRF 360: 250k iterations of optimization with a batch size of 2¹¹, using the Adam (Kingma and Ba 2014) optimizer with a learning rate that is annealed log-linearly from 2·10⁻³ to 2·10⁻⁵ with a warm-up phase of 512 iterations, and gradient clipping to a norm of 10⁻³. (See the optimizer-schedule sketch below.) |
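
The optimization recipe quoted in the last row is terse, so below is a minimal PyTorch sketch of how such a schedule is commonly implemented. It is not the authors' released code: the stand-in model, the placeholder loss, and the linear warm-up shape are assumptions; only the learning-rate endpoints (2·10⁻³ to 2·10⁻⁵), the 250k iteration count, the 512-iteration warm-up, and the gradient-clipping norm of 10⁻³ come from the quoted setup.

```python
import math
import torch

# Minimal sketch (not the authors' code) of a Mip-NeRF 360-style schedule:
# Adam, learning rate annealed log-linearly from 2e-3 to 2e-5 over 250k
# iterations, a 512-iteration warm-up, and gradient clipping to a norm of 1e-3.
LR_INIT, LR_FINAL = 2e-3, 2e-5
MAX_ITERS, WARMUP_ITERS = 250_000, 512

model = torch.nn.Linear(3, 4)  # hypothetical stand-in for the scene network
optimizer = torch.optim.Adam(model.parameters(), lr=LR_INIT)

def lr_at(step: int) -> float:
    """Log-linear interpolation between LR_INIT and LR_FINAL, with warm-up."""
    t = min(step / MAX_ITERS, 1.0)
    lr = math.exp((1.0 - t) * math.log(LR_INIT) + t * math.log(LR_FINAL))
    if step < WARMUP_ITERS:          # linear warm-up shape is an assumption
        lr *= (step + 1) / WARMUP_ITERS
    return lr

for step in range(MAX_ITERS):
    for group in optimizer.param_groups:
        group["lr"] = lr_at(step)
    batch = torch.randn(2**11, 3)            # placeholder batch of 2^11 samples
    loss = model(batch).pow(2).mean()         # placeholder for the rendering loss
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1e-3)
    optimizer.step()
```

"Annealed log-linearly" here means the learning rate decays exponentially between the two endpoints, i.e. it is linear in log-space, which is why the interpolation is done on `math.log` of the endpoints.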