Geometric Exploitation for Indoor Panoramic Semantic Segmentation
Authors: Duc Cao Dinh, Seok Joon Kim, Kyusung Cho
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on both real-world (Stanford2D3DS, Matterport3D) and synthetic (Structured3D) datasets demonstrate the robustness of our framework, setting a new state of the art in almost all evaluations |
| Researcher Affiliation | Industry | Duc Cao Dinh, Seok Joon Kim, Kyusung Cho — Laboratory Department, MAXST, Seoul, Korea. {caodinhduc, seokjoon, kscho}@maxst.com |
| Pseudocode | No | The paper describes the proposed network architecture and processes in detail using text and diagrams, but it does not include a formal pseudocode block or an algorithm listing. |
| Open Source Code | Yes | The code and updated results are available at: https://github.com/caodinhduc/vertical_relative_distance. |
| Open Datasets | Yes | We utilize three publicly available datasets for comparison: Stanford2D3DS [3], Structured3D [37], and Matterport3D [5]. |
| Dataset Splits | Yes | The Stanford2D3DS dataset... organized into three official folds. We follow the fold-splitting scheme established in previous works [4, 18, 34, 35]. The Structured3D dataset... we define training, validation, and test splits as follows: scenes 00000–02999 for training, scenes 03000–03249 for validation, and scenes 03250–03499 for testing. For all evaluations, we use raw rendered images under full lighting and furniture configurations. Meanwhile, the Matterport3D dataset... Following the processing and split protocol of Guttikonda and Rambach [12], we create training, validation, and test subsets for consistency in our experiments. |
| Hardware Specification | Yes | We train our model on a single NVIDIA GeForce RTX 3090 GPU |
| Software Dependencies | No | The paper mentions the use of the AdamW optimizer [16] and specific loss functions (Focal and Huber losses) but does not provide version numbers for any software libraries or dependencies (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | We train our model on a single NVIDIA GeForce RTX 3090 GPU, starting with an initial learning rate of 5e-5, adjusted using a poly decay strategy with a power of 0.9 over the training epochs. For the Stanford2D3DS, Structured3D, and Matterport3D datasets, we train for 100, 50, and 100 epochs, respectively. The AdamW optimizer [16] is used with an epsilon of 1e-8, a weight decay of 1e-4, and a batch size of 4. Image augmentations include random horizontal flipping, random cropping, and resizing to 512×1024. In the testing phase, images are also processed at a resolution of 512×1024. Other settings and hyperparameters match those of Trans4PASS+ [35]. For the segmentation and depth estimation tasks, we use Focal and Huber losses, respectively, with the final training loss computed as the combination L_total = α1·L_over-sampled segment + α2·L_under-sampled segment + α3·L_depth (Eq. 7), where L_over-sampled segment and L_under-sampled segment denote the segmentation losses for the over-sampled and under-sampled segments, respectively. In our experiments, the weights α1, α2, and α3 are set to [1, 5, 1]. |
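The poly decay schedule quoted in the experiment setup can be sketched as a small helper. The function name `poly_lr` is a hypothetical illustration, not from the paper's released code; it assumes the common formulation lr = initial_lr · (1 − epoch/max_epochs)^power with the paper's stated values (initial lr 5e-5, power 0.9).

```python
def poly_lr(initial_lr: float, epoch: int, max_epochs: int, power: float = 0.9) -> float:
    """Poly decay: learning rate shrinks from initial_lr to 0 over training.

    Illustrative sketch of the schedule described in the setup; the exact
    per-step vs. per-epoch granularity in the authors' code may differ.
    """
    return initial_lr * (1.0 - epoch / max_epochs) ** power

# With the paper's settings: 5e-5 at epoch 0, decaying to 0 by epoch 100.
lr_start = poly_lr(5e-5, 0, 100)
lr_mid = poly_lr(5e-5, 50, 100)
lr_end = poly_lr(5e-5, 100, 100)
```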