Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Jaccard Metric Losses: Optimizing the Jaccard Index with Soft Labels

Authors: Zifu Wang, Xuefei Ning, Matthew Blaschko

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments show consistent improvements over the cross-entropy loss across 4 semantic segmentation datasets (Cityscapes, PASCAL VOC, ADE20K, Deep Globe Land) and 13 architectures, including classic CNNs and recent vision transformers.
Researcher Affiliation	Academia	Zifu Wang¹, Xuefei Ning², Matthew B. Blaschko¹; ¹ESAT-PSI, KU Leuven, Leuven, Belgium; ²Department of Electronic Engineering, Tsinghua University, Beijing, China
Pseudocode	Yes	Figure 6: Code to compute active classes.
Open Source Code	Yes	The code is available at https://github.com/zifuwanggg/JDTLosses.
Open Datasets	Yes	We evaluate models on Cityscapes [11], PASCAL VOC [18], ADE20K [81] and Deep Globe Land [12].
Dataset Splits	Yes	For Cityscapes, PASCAL VOC, and ADE20K, we repeat the experiments 3 times (except for SSL experiments that are single runs) and report performance on the validation set. For Deep Globe Land, we conduct 5-fold cross-validation.
Hardware Specification	Yes	Inference latency measurements are conducted with the same input size on an NVIDIA A100. We estimate the training memory requirements using a ground-truth size of 8 × 19 × 512 × 1024 (batch_size, num_classes, H, W), also on an NVIDIA A100.
Software Dependencies	No	The paper mentions software such as PyTorch Image Models (timm) and AdamW but does not provide version numbers for these dependencies.
Experiment Setup	Yes	By default, we adopted the training details outlined in [70, 23, 24], except for the reduction of the batch size to 8. In particular, we utilize SGD with a weight decay of 0.0005 and a momentum of 0.9. The initial learning rate is 0.01, and is decayed according to (1 − iter/total_iters)^0.9. The number of iterations is 40K for Cityscapes [11] and PASCAL VOC [18], 10K for Deep Globe Land [12]. The crop size is 512 × 1024 for Cityscapes [11], 512 × 512 for PASCAL VOC [18] and Deep Globe Land [12].
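The Experiment Setup row quotes a polynomial ("poly") learning-rate decay, lr = 0.01 · (1 − iter/total_iters)^0.9. The sketch below evaluates that formula, assuming it is applied once per training iteration with the quoted base rate 0.01 and power 0.9. The function name poly_lr is hypothetical and is not taken from the JDTLosses repository.

```python
def poly_lr(iteration: int, total_iters: int,
            base_lr: float = 0.01, power: float = 0.9) -> float:
    """Poly decay: base_lr * (1 - iteration / total_iters) ** power.

    Hypothetical helper; base_lr and power default to the values quoted
    in the Experiment Setup row.
    """
    if total_iters <= 0:
        raise ValueError("total_iters must be positive")
    if not 0 <= iteration <= total_iters:
        raise ValueError("iteration must lie in [0, total_iters]")
    return base_lr * (1.0 - iteration / total_iters) ** power


if __name__ == "__main__":
    total = 40_000  # iteration budget quoted for Cityscapes and PASCAL VOC
    for it in (0, 10_000, 20_000, 30_000, 39_999):
        print(f"iter {it:>6}: lr = {poly_lr(it, total):.6f}")
```

In PyTorch, the same schedule can be expressed with torch.optim.lr_scheduler.LambdaLR and a multiplier of (1 − it/total_iters)^0.9 stepped every iteration. Whether the authors' code does exactly this is not stated in the excerpt.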
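The Hardware Specification row gives the ground-truth size used for the memory estimate as 8 × 19 × 512 × 1024 (batch_size, num_classes, H, W). The arithmetic below shows how large one dense tensor of that shape is, assuming it is stored as a contiguous float32 or float16 array. This is a back-of-the-envelope figure for a single tensor only. It ignores activations, gradients and model weights, so it is not the paper's measured training-memory requirement. The helper tensor_bytes is hypothetical.

```python
import math

# Ground-truth shape quoted in the Hardware Specification row:
# (batch_size, num_classes, H, W)
GT_SHAPE = (8, 19, 512, 1024)


def tensor_bytes(shape: tuple[int, ...], bytes_per_element: int) -> int:
    """Bytes needed for one dense, contiguous tensor of the given shape."""
    return math.prod(shape) * bytes_per_element


if __name__ == "__main__":
    n = math.prod(GT_SHAPE)
    for dtype, width in (("float32", 4), ("float16", 2)):
        mib = tensor_bytes(GT_SHAPE, width) / 2**20  # bytes -> MiB
        print(f"{dtype}: {n:,} elements -> {mib:.0f} MiB")
```

A dense per-class label map grows linearly with num_classes. For example, a dataset with more classes at the same batch and crop size scales this figure proportionally.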
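The Dataset Splits row states that Deep Globe Land is evaluated with 5-fold cross-validation. The excerpt does not say how the folds are formed. The sketch below therefore assumes a plain random partition of image indices with a fixed seed, where every image is used for validation exactly once. k_fold_indices is a hypothetical helper, not code from the authors' repository. sklearn.model_selection.KFold(n_splits=5, shuffle=True, random_state=...) provides equivalent behaviour if scikit-learn is available.

```python
import random


def k_fold_indices(n_samples: int, k: int = 5,
                   seed: int = 0) -> list[tuple[list[int], list[int]]]:
    """Return k (train, val) index lists; each index appears in exactly one val fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("need 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed -> reproducible folds
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    splits = []
    for i, val in enumerate(folds):
        # Training set = every fold except the current validation fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, val))
    return splits


if __name__ == "__main__":
    # Hypothetical dataset size; the row does not state how many images were used.
    for fold, (train, val) in enumerate(k_fold_indices(1000, k=5)):
        print(f"fold {fold}: {len(train)} train / {len(val)} val")
```

Reported cross-validation scores are then typically the mean (and spread) of the metric over the five validation folds.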