GSN: Generalisable Segmentation in Neural Radiance Field
Authors: Vinayak Gupta, Rahul Goel, Sirikonda Dhawal, P. J. Narayanan
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our multi-view segmentation results are on par with methods that use traditional RFs. GSN closes the gap between standard and generalisable RF methods significantly. We calculate the mean IoU, accuracy, and mean average precision score on four scenes from the LLFF (Mildenhall et al. 2019) dataset (a generic sketch of the first two metrics follows the table). |
| Researcher Affiliation | Academia | ¹Indian Institute of Technology, Madras; ²International Institute of Information Technology, Hyderabad |
| Pseudocode | No | The paper describes its methods through text and diagrams, but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | Project Page: https://vinayak-vg.github.io/GSN/. This is a project page, not an explicit statement of code release or a direct repository link. |
| Open Datasets | Yes | The dataset provided by IBRNet (Wang et al. 2021b) and some scenes from the LLFF Dataset (Mildenhall et al. 2019) are used for training our generalised model. |
| Dataset Splits | No | The paper uses the LLFF dataset for testing, implying a split, but does not provide specific percentages, sample counts, or explicit methodology for training/validation/test splits. |
| Hardware Specification | Yes | We use 2 RTX 3090 GPUs for distributed training and conducting our experiments. |
| Software Dependencies | No | The paper mentions using code from another paper and a model architecture (DINO ViT-b8) but does not provide specific version numbers for software dependencies such as programming languages or libraries. |
| Experiment Setup | Yes | We use four blocks of view-ray transformers in Stage I training and one block of view-ray transformer in Stage II training. Stage I uses 512 rays, 192 points, and 200,000 iterations, and is trained for two days. Stage II is trained for only 5,000 iterations with 512 rays and 192 points, taking about 4 hours. We use a learning rate of 1e-3 for Stage II training and decrease the learning rate of Stage I by a factor of 10 during Stage II training. The weight factor of the RGB loss L_RGB is set to 0.1, and that of the feature loss L_Feat is set to 1 during Stage II training. We select 10 source views for every target view, and training is done at an image resolution of 756x1008. |
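
The Experiment Setup row above amounts to a two-stage training configuration. Below is a minimal sketch of those hyperparameters as plain Python dictionaries; the key names (e.g. `num_rays_per_batch`, `stage1_lr_scale`) are hypothetical and do not come from the authors' code — only the values are taken from the paper.

```python
# Hypothetical configuration sketch of the reported GSN training setup.
# Key names are illustrative; values follow the paper's Experiment Setup details.

stage1_config = {
    "num_view_ray_transformer_blocks": 4,   # four view-ray transformer blocks
    "num_rays_per_batch": 512,
    "num_points_per_ray": 192,
    "num_iterations": 200_000,              # reported ~2 days of training
    "num_source_views": 10,                 # source views per target view
    "image_resolution": (756, 1008),
    "num_gpus": 2,                          # 2x RTX 3090, distributed training
}

stage2_config = {
    "num_view_ray_transformer_blocks": 1,
    "num_rays_per_batch": 512,
    "num_points_per_ray": 192,
    "num_iterations": 5_000,                # reported ~4 hours of training
    "learning_rate": 1e-3,                  # Stage II learning rate
    "stage1_lr_scale": 0.1,                 # Stage I lr reduced by a factor of 10
    "rgb_loss_weight": 0.1,                 # weight on L_RGB
    "feature_loss_weight": 1.0,             # weight on L_Feat
    "num_source_views": 10,
    "image_resolution": (756, 1008),
}
```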
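
The Research Type row reports mean IoU, accuracy, and mean average precision on four LLFF scenes. As a reference for how the first two segmentation metrics are commonly computed on integer label maps, here is a generic NumPy sketch; it is an illustration under that assumption, not the paper's evaluation code.

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes):
    """Per-class IoU averaged into mean IoU, plus pixel accuracy.

    pred, gt: integer label maps of identical shape.
    Generic sketch only; not the evaluation code used in the paper.
    """
    pred = pred.ravel()
    gt = gt.ravel()
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                       # skip classes absent from both maps
            ious.append(inter / union)
    mean_iou = float(np.mean(ious)) if ious else 0.0
    accuracy = float((pred == gt).mean())
    return mean_iou, accuracy

# Toy usage: 2x2 label maps with 2 classes.
pred = np.array([[0, 1], [1, 1]])
gt = np.array([[0, 1], [0, 1]])
print(segmentation_metrics(pred, gt, num_classes=2))  # (0.583..., 0.75)
```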