GSN: Generalisable Segmentation in Neural Radiance Field

Authors: Vinayak Gupta, Rahul Goel, Sirikonda Dhawal, P. J. Narayanan

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our multi-view segmentation results are on par with methods that use traditional RFs. GSN closes the gap between standard and generalisable RF methods significantly. We calculate the mean IoU, accuracy and mean average precision score on four scenes from the LLFF (Mildenhall et al. 2019) dataset. (A minimal metric sketch follows the table.)
Researcher Affiliation | Academia | (1) Indian Institute of Technology, Madras; (2) International Institute of Information Technology, Hyderabad
Pseudocode | No | The paper describes its methods through text and diagrams but does not provide structured pseudocode or algorithm blocks.
Open Source Code | No | Project Page: https://vinayak-vg.github.io/GSN/. This is a project page, not an explicit statement of code release or a direct repository link.
Open Datasets | Yes | The dataset provided by IBRNet (Wang et al. 2021b) and some scenes from the LLFF dataset (Mildenhall et al. 2019) are used for training our generalised model.
Dataset Splits | No | The paper tests on the LLFF dataset, implying a split, but provides no percentages, sample counts, or explicit train/validation/test methodology.
Hardware Specification | Yes | We use 2 RTX 3090 GPUs for distributed training and conducting our experiments. (A minimal launch sketch follows the table.)
Software Dependencies | No | The paper mentions reusing code from another paper and a model architecture (DINO ViT-b8) but does not provide version numbers for software dependencies such as programming languages or libraries.
Experiment Setup | Yes | Stage I training uses four view-ray transformer blocks; Stage II uses one. Stage I is trained for 200,000 iterations with 512 rays and 192 points, taking two days; Stage II is trained for only 5,000 iterations with the same 512 rays and 192 points, taking about 4 hours. The Stage II learning rate is 1e-3, and the Stage I learning rate is decreased by a factor of 10 during Stage II training. During Stage II, the weight of the RGB loss L_RGB is set to 0.1 and that of the feature loss L_Feat to 1. Ten source views are selected for every target view, and training is done at an image resolution of 756×1008. (A minimal configuration sketch follows the table.)