Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Jaccard Metric Losses: Optimizing the Jaccard Index with Soft Labels
Authors: Zifu Wang, Xuefei Ning, Matthew Blaschko
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show consistent improvements over the cross-entropy loss across 4 semantic segmentation datasets (Cityscapes, PASCAL VOC, ADE20K, Deep Globe Land) and 13 architectures, including classic CNNs and recent vision transformers. |
| Researcher Affiliation | Academia | Zifu Wang¹, Xuefei Ning², Matthew B. Blaschko¹. ¹ESAT-PSI, KU Leuven, Leuven, Belgium; ²Department of Electronic Engineering, Tsinghua University, Beijing, China |
| Pseudocode | Yes | Figure 6: Code to compute active classes. |
| Open Source Code | Yes | The code is available at https://github.com/zifuwanggg/JDTLosses. |
| Open Datasets | Yes | We evaluate models on Cityscapes [11], PASCAL VOC [18], ADE20K [81] and Deep Globe Land [12]. |
| Dataset Splits | Yes | For Cityscapes, PASCAL VOC, and ADE20K, we repeat the experiments 3 times (except for SSL experiments that are single runs) and report performance on the validation set. For Deep Globe Land, we conduct 5-fold cross-validation. |
| Hardware Specification | Yes | Inference latency measurements are conducted with the same input size on a NVIDIA A100. We estimate the training memory requirements using a ground-truth size of 8 × 19 × 512 × 1024 (batch_size, num_classes, H, W), also on a NVIDIA A100. |
| Software Dependencies | No | The paper mentions software such as PyTorch Image Models (timm) and AdamW but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | By default, we adopted the training details outlined in [70, 23, 24], except for the reduction of the batch size to 8. In particular, we utilize SGD with a weight decay of 0.0005 and a momentum of 0.9. The initial learning rate is 0.01, and is decayed according to $(1 - \frac{\text{iter}}{\text{total iters}})^{0.9}$. The number of iterations is 40K for Cityscapes [11] and PASCAL VOC [18], 10K for Deep Globe Land [12]. The crop size is 512 × 1024 for Cityscapes [11], 512 × 512 for PASCAL VOC [18] and Deep Globe Land [12]. |
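
The learning-rate schedule quoted in the Experiment Setup row can be illustrated with a small sketch. This is not the authors' code: the function name `poly_lr` and the dictionary `TOTAL_ITERS` are hypothetical, and the formula assumes the standard polynomial decay, where the initial rate is multiplied by (1 − iter / total iters) raised to the power 0.9. Only the numeric values (0.01, 0.9, 40K, 10K) come from the table.

```python
def poly_lr(iteration: int, total_iters: int,
            base_lr: float = 0.01, power: float = 0.9) -> float:
    """Polynomial decay: base_lr * (1 - iteration / total_iters) ** power.

    Assumption: this is the usual "poly" schedule; the defaults are the
    initial learning rate and exponent quoted in the table.
    """
    if total_iters <= 0:
        raise ValueError("total_iters must be positive")
    if not 0 <= iteration <= total_iters:
        raise ValueError("iteration must lie in [0, total_iters]")
    return base_lr * (1.0 - iteration / total_iters) ** power


# Iteration budgets quoted in the table (hypothetical container name).
TOTAL_ITERS = {
    "Cityscapes": 40_000,
    "PASCAL VOC": 40_000,
    "Deep Globe Land": 10_000,
}

if __name__ == "__main__":
    # Print the rate at the start, midpoint, and end of each schedule.
    for dataset, total in TOTAL_ITERS.items():
        for it in (0, total // 2, total):
            print(f"{dataset:>16} iter={it:>6} lr={poly_lr(it, total):.6f}")
```

The schedule starts at the initial rate and reaches zero at the final iteration. In a PyTorch training loop, a function like this would typically be wrapped in a per-iteration scheduler, though the table does not say how the authors implemented it.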