Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Uncertainty Estimation for Safety-critical Scene Segmentation via Fine-grained Reward Maximization
Authors: Hongzheng Yang, Cheng Chen, Yueyao Chen, Markus Scheppach, Hon Chi Yip, Qi Dou
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The effectiveness of our method is demonstrated on two large safety-critical surgical scene segmentation datasets under two different uncertainty estimation settings. With real-time one forward pass at inference, our method outperforms state-of-the-art methods by a clear margin on all the calibration metrics of uncertainty estimation, while maintaining a high task accuracy for the segmentation results. |
| Researcher Affiliation | Academia | Hongzheng Yang¹, Cheng Chen², Yueyao Chen¹, Markus Scheppach³, Hon Chi Yip¹, Qi Dou¹; ¹The Chinese University of Hong Kong, ²Harvard Medical School & Massachusetts General Hospital, ³University Hospital of Augsburg |
| Pseudocode | Yes | Algorithm 1 FGRM algorithm |
| Open Source Code | Yes | Code is available at https://github.com/med-air/FGRM. |
| Open Datasets | Yes | Dataset-1: For LC segmentation dataset, we adopt the public dataset CholecSeg8K [17], which contains 8,080 laparoscopic cholecystectomy image frames extracted from 17 video clips. Dataset-2: For ESD segmentation dataset, we collected a dataset with 1,203 image frames from 30 endoscopic surgical videos. ... we also provide experimental evaluations on Cityscape [10] dataset for urban scene segmentation in Appendix A.5. |
| Dataset Splits | Yes | For each dataset, we first randomly split 20% data for a held-out testing, and further split 80% of remaining data for training and 20% for validation. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | In our implementation, we employ an adapted TransUNet as segmentation backbone. We replace the last softmax layer with a non-negative evidence layer. The evidence layer is implemented by the softplus function. For the base model pre-training, we use the Adam optimizer, with learning rate initialized to 1e-4. |
| Experiment Setup | Yes | For the base model pre-training, we use the Adam optimizer, with learning rate initialized to 1e-4. We totally trained 10 epochs on the training set, with batch size 4. For the maximization of uncertainty estimation reward, we tune the base model to maximize the reward on a held-out validation set. ... The learning rate and batch size was initialized as 1e-4 and 4, respectively. |
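
The split quoted under Dataset Splits (20% held out for testing, then the remaining data divided 80/20 into training and validation) can be illustrated with a minimal sketch. This is not the authors' code: the function name `split_indices`, the fixed seed, the rounding rule, and the choice to split a flat list of item indices are all assumptions made for illustration. The quoted text does not say whether the split is done per frame or per video.

```python
import random


def split_indices(n_items, test_frac=0.2, val_frac=0.2, seed=0):
    """Hypothetical helper: return (train, val, test) lists of item indices.

    test_frac is taken from the full set; val_frac is taken from what
    remains after the test set is removed, as in the quoted description.
    """
    indices = list(range(n_items))
    # Assumption: a fixed seed, used here only to make the sketch repeatable.
    random.Random(seed).shuffle(indices)

    n_test = round(n_items * test_frac)
    test = indices[:n_test]
    remaining = indices[n_test:]

    n_val = round(len(remaining) * val_frac)
    val = remaining[:n_val]
    train = remaining[n_val:]
    return train, val, test


# 8,080 is the CholecSeg8K frame count quoted in the Open Datasets row.
train, val, test = split_indices(8080)
print(len(train), len(val), len(test))
```

Under these assumptions the overall proportions are roughly 64% training, 16% validation, and 20% testing, since the validation fraction applies only to the 80% left after the test split.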