Automated Label Unification for Multi-Dataset Semantic Segmentation with GNNs

Authors: Rong Ma, Jie Chen, Xiangyang Xue, Jian Pu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The results demonstrate that our method significantly outperforms other multi-dataset training methods when trained on seven datasets simultaneously, and achieves state-of-the-art performance on the WildDash 2 benchmark.
Researcher Affiliation | Academia | Rong Ma, Jie Chen, Xiangyang Xue, and Jian Pu. Fudan University. rma22@m.fudan.edu.cn, {chenj19,xyxue,jianpu}@fudan.edu.cn
Pseudocode | Yes | Algorithm 1: The training pipeline of our model
Open Source Code | Yes | Our code can be found at https://github.com/Mrhonor/AutoUniSeg.
Open Datasets | Yes | Our training datasets cover a wide range of scenarios, from indoor scenes to driving scenes. We also introduce corresponding test datasets, which are not used in the training process, for the respective scenes to evaluate our generalization capability. Datasets mentioned: Cityscapes [13], Mapillary [33], BDD [46], IDD [42], SUN RGB-D [37], ADE20K [51], COCO [25].
Dataset Splits | No | The paper lists training and validation datasets but does not specify the explicit split percentages or sample counts used for validation.
Hardware Specification | Yes | We train our model for 300k iterations on four 80GB A100 GPUs.
Software Dependencies | No | The paper mentions the Llama-2-7B model and the AdamW optimizer but does not provide specific version numbers for software dependencies or libraries.
Experiment Setup | Yes | Our segmentation model is based on the HRNet-W48 architecture, while the GNN model is a three-layer GraphSAGE. We utilize the Llama-2-7B model to encode label descriptions into 4096-dimensional text features. We evenly sample 3 images per dataset within a batch for each GPU. For all images, we first apply random resizing with a ratio ranging from 0.5 to 2, followed by a random crop operation to achieve a final image size of 768×768 pixels. We use the AdamW optimizer with warmup and polynomial learning rate decay, starting with a learning rate of 0.0001. We train our model for 300k iterations. (Configuration sketches based on this setup follow the table.)
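
For readers who want a concrete picture of the GNN component named above, here is a minimal PyTorch Geometric sketch of a three-layer GraphSAGE operating on 4096-dimensional label-text features. Only the depth (three layers) and the input width (4096, from the Llama-2-7B encodings) come from the paper; the class name, hidden widths, activation choice, and toy graph are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv  # requires torch_geometric

class LabelUnificationGNN(torch.nn.Module):
    """Hypothetical three-layer GraphSAGE over label-description features.

    in_dim=4096 matches the Llama-2-7B text features reported in the paper;
    hidden_dim and out_dim are illustrative assumptions.
    """
    def __init__(self, in_dim=4096, hidden_dim=512, out_dim=256):
        super().__init__()
        self.conv1 = SAGEConv(in_dim, hidden_dim)
        self.conv2 = SAGEConv(hidden_dim, hidden_dim)
        self.conv3 = SAGEConv(hidden_dim, out_dim)

    def forward(self, x, edge_index):
        # x: [num_labels, 4096] label-description embeddings
        # edge_index: [2, num_edges] graph over dataset-specific labels
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        return self.conv3(x, edge_index)  # embeddings in a shared label space

# Toy usage: 10 labels with a few undirected edges.
x = torch.randn(10, 4096)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 0, 3, 2]])
out = LabelUnificationGNN()(x, edge_index)  # -> shape [10, 256]
```

In the paper's setting, the nodes would correspond to dataset-specific labels whose GNN embeddings drive the automated unification across datasets; how the label graph's edges are constructed is not shown here.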
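
Likewise, the reported optimization recipe (AdamW, initial learning rate 1e-4, warmup followed by polynomial decay, 300k iterations) could be wired up as in the sketch below. The warmup length, decay exponent, and the stand-in model are assumptions, since the quoted setup does not state them.

```python
import torch

# Stand-in module; the paper trains an HRNet-W48 segmentation model,
# which is not reproduced here.
model = torch.nn.Conv2d(3, 19, kernel_size=1)

# From the setup above: AdamW with an initial learning rate of 1e-4.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

max_iters = 300_000    # from the paper
warmup_iters = 1_500   # assumption: warmup length is not stated
power = 0.9            # assumption: a common polynomial-decay exponent

def poly_warmup(it: int) -> float:
    """Learning-rate multiplier: linear warmup, then polynomial decay to 0."""
    if it < warmup_iters:
        return it / warmup_iters
    progress = (it - warmup_iters) / (max_iters - warmup_iters)
    return (1.0 - progress) ** power

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, poly_warmup)

# In the 300k-iteration training loop, step once per optimizer update:
#   optimizer.step(); optimizer.zero_grad(); scheduler.step()
```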