Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Jaccard Metric Losses: Optimizing the Jaccard Index with Soft Labels

Authors: Zifu Wang, Xuefei Ning, Matthew Blaschko

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show consistent improvements over the cross-entropy loss across 4 semantic segmentation datasets (Cityscapes, PASCAL VOC, ADE20K, Deep Globe Land) and 13 architectures, including classic CNNs and recent vision transformers.
Researcher Affiliation | Academia | Zifu Wang (1), Xuefei Ning (2), Matthew B. Blaschko (1). (1) ESAT-PSI, KU Leuven, Leuven, Belgium; (2) Department of Electronic Engineering, Tsinghua University, Beijing, China.
Pseudocode | Yes | Figure 6: Code to compute active classes.
Open Source Code | Yes | The code is available at https://github.com/zifuwanggg/JDTLosses.
Open Datasets | Yes | We evaluate models on Cityscapes [11], PASCAL VOC [18], ADE20K [81] and Deep Globe Land [12].
Dataset Splits | Yes | For Cityscapes, PASCAL VOC, and ADE20K, we repeat the experiments 3 times (except for SSL experiments that are single runs) and report performance on the validation set. For Deep Globe Land, we conduct 5-fold cross-validation.
Hardware Specification | Yes | Inference latency measurements are conducted with the same input size on an NVIDIA A100. We estimate the training memory requirements using a ground-truth size of 8 × 19 × 512 × 1024 (batch_size, num_classes, H, W), also on an NVIDIA A100.
Software Dependencies | No | The paper mentions software such as PyTorch Image Models (timm) and AdamW but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | By default, we adopted the training details outlined in [70, 23, 24], except for the reduction of the batch size to 8. In particular, we utilize SGD with a weight decay of 0.0005 and a momentum of 0.9. The initial learning rate is 0.01, and is decayed according to $(1 - \frac{\text{iter}}{\text{total iters}})^{0.9}$. The number of iterations is 40K for Cityscapes [11] and PASCAL VOC [18], 10K for Deep Globe Land [12]. The crop size is 512 × 1024 for Cityscapes [11], 512 × 512 for PASCAL VOC [18] and Deep Globe Land [12].
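
The learning-rate decay quoted in the Experiment Setup entry can be illustrated with a short sketch. It is not taken from the authors' repository: the function name `poly_lr`, the dictionary names, and the assumption that the schedule is evaluated once per training iteration are all hypothetical choices made here for illustration. Only the numeric values (initial rate 0.01, power 0.9, momentum 0.9, weight decay 0.0005, 40K and 10K iterations) come from the quoted text.

```python
def poly_lr(iteration: int, total_iters: int,
            base_lr: float = 0.01, power: float = 0.9) -> float:
    """Polynomial decay: base_lr * (1 - iteration / total_iters) ** power.

    Hypothetical helper; assumes the rate is recomputed every iteration.
    """
    if not 0 <= iteration <= total_iters:
        raise ValueError("iteration must lie in [0, total_iters]")
    return base_lr * (1.0 - iteration / total_iters) ** power


# Optimizer settings as quoted in the Experiment Setup entry.
SGD_SETTINGS = {"lr": 0.01, "momentum": 0.9, "weight_decay": 0.0005}

# Iteration budgets stated in the entry (other datasets are not listed there).
TOTAL_ITERS = {
    "Cityscapes": 40_000,
    "PASCAL VOC": 40_000,
    "Deep Globe Land": 10_000,
}

if __name__ == "__main__":
    # Print the rate at the start, midpoint, and end of each schedule.
    for dataset, total in TOTAL_ITERS.items():
        for it in (0, total // 2, total):
            print(f"{dataset:>16} iter={it:>6} lr={poly_lr(it, total):.6f}")
```

With these values the rate starts at 0.01 and reaches exactly 0 at the final iteration; in a PyTorch setup the same function could plausibly be wrapped in `torch.optim.lr_scheduler.LambdaLR` by returning the multiplier `(1 - it / total) ** 0.9` instead of the absolute rate.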