VGOS: Voxel Grid Optimization for View Synthesis from Sparse Inputs
Authors: Jiakai Sun, Zhanjie Zhang, Jiafu Chen, Guangyuan Li, Boyan Ji, Lei Zhao, Wei Xing
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that VGOS achieves state-of-the-art performance for sparse inputs with super-fast convergence. |
| Researcher Affiliation | Academia | Zhejiang University {csjk, cszzj, chenjiafu, jiby, cszhl, wxing}@zju.edu.cn, lgy1428275037@163.com |
| Pseudocode | No | The paper does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Code will be available at https://github.com/SJoJoK/VGOS. |
| Open Datasets | Yes | We perform experiments on inward-facing scenes from the Realistic Synthetic 360 dataset [Mildenhall et al., 2020] and forward-facing scenes from the LLFF dataset [Mildenhall et al., 2019]. |
| Dataset Splits | Yes | Following the protocol of InfoNeRF [Kim et al., 2022], we randomly sample 4 views out of 100 training images as sparse inputs and evaluate the model with 200 testing images. [...] We hold out 1/8 of the images as test sets following the standard protocol [Mildenhall et al., 2020] and report results for 3 input views randomly sampled from the remaining images. |
| Hardware Specification | Yes | For a fair comparison, the training time of each method is measured on our machine with a single NVIDIA RTX 3090 GPU using respective official implementations. |
| Software Dependencies | No | We implement our model on top of the DVGOv2 codebase using PyTorch [Paszke et al., 2019]. The paper mentions PyTorch but does not provide a specific version number for it or for any other software dependencies. |
| Experiment Setup | Yes | We use the Adam optimizer [Kingma and Ba, 2015] to optimize the voxel grids with an initial learning rate of 0.1 for all voxels and 10^-3 for the shallow MLP, and exponential learning rate decay is applied. For scenes in the Realistic Synthetic 360 dataset, we train the voxel grids for 5K iterations with a batch size of 2^13 rays for input views and 2^14 rays for sampled views in both stages. For scenes in the LLFF dataset, we train the voxel grids for 9K iterations with a batch size of 2^12 rays for input views and 2^14 rays for sampled views in only one stage. |
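
The Dataset Splits row follows two standard protocols: the InfoNeRF-style split for Realistic Synthetic 360 scenes (4 input views sampled from the 100 training images, evaluation on the 200 test images) and the standard LLFF split (every 8th image held out for testing, 3 input views sampled from the rest). A minimal sketch of both splits is below; the helper name, seeding, and the LLFF scene size are illustrative assumptions, not taken from the VGOS code.

```python
import numpy as np

def sample_sparse_views(pool, num_input, seed=0):
    """Hypothetical helper: randomly pick sparse input views from a pool of
    training image indices. The paper only states that views are randomly
    sampled; the seeding and naming here are assumptions."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(pool, size=num_input, replace=False))

# Realistic Synthetic 360 (InfoNeRF protocol): 4 of the 100 training views as
# sparse inputs, evaluated on the 200 test views shipped with the dataset.
blender_inputs = sample_sparse_views(np.arange(100), num_input=4)

# LLFF (standard protocol): every 8th image is held out as the test set, and
# 3 input views are sampled from the remaining images. A scene size of 40
# images is an assumed example value.
all_ids = np.arange(40)
test_ids = all_ids[::8]                      # 1/8 of the images for testing
train_pool = np.setdiff1d(all_ids, test_ids)
llff_inputs = sample_sparse_views(train_pool, num_input=3)
```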
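
The Experiment Setup row maps naturally onto a PyTorch Adam configuration with two parameter groups (voxel grids at learning rate 0.1, shallow MLP at 10^-3) and an exponential learning-rate schedule. The sketch below is an assumption-laden illustration, not the official VGOS/DVGOv2 training loop: the grid shapes, MLP sizes, decay target, and all variable names are invented for the example.

```python
import torch

# Minimal sketch of the reported optimization setup, assuming a DVGO-style
# model with dense voxel grids plus a shallow MLP. Grid/MLP shapes are
# illustrative assumptions, not values from the paper.
density_grid = torch.nn.Parameter(torch.zeros(1, 1, 96, 96, 96))
feature_grid = torch.nn.Parameter(torch.zeros(1, 12, 96, 96, 96))
rgb_mlp = torch.nn.Sequential(
    torch.nn.Linear(39, 128), torch.nn.ReLU(), torch.nn.Linear(128, 3)
)

# Adam with per-group learning rates: 0.1 for the voxel grids, 1e-3 for the MLP.
optimizer = torch.optim.Adam([
    {"params": [density_grid, feature_grid], "lr": 0.1},
    {"params": rgb_mlp.parameters(), "lr": 1e-3},
])

# Exponential learning-rate decay; decaying to 0.1x over the run is an assumed
# target, since the paper only says "exponential learning rate decay".
num_iters = 5_000                        # 5K iterations (Blender); 9K for LLFF
scheduler = torch.optim.lr_scheduler.ExponentialLR(
    optimizer, gamma=0.1 ** (1.0 / num_iters)
)

for step in range(num_iters):
    # Each step would sample 2^13 rays from input views and 2^14 rays from
    # sampled (unobserved) views on Blender; 2^12 / 2^14 on LLFF.
    optimizer.zero_grad()
    # loss = render_rays_and_compute_loss(...)   # rendering loss omitted here
    # loss.backward()
    optimizer.step()
    scheduler.step()
```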