Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Deep Probability Estimation
Authors: Sheng Liu, Aakash Kaku, Weicheng Zhu, Matan Leibovich, Sreyas Mohan, Boyang Yu, Haoxiang Huang, Laure Zanna, Narges Razavian, Jonathan Niles-Weed, Carlos Fernandez-Granda
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate existing methods on the synthetic data as well as on three real-world probability estimation tasks, all of which involve inherent uncertainty: precipitation forecasting from radar images, predicting cancer patient survival from histopathology images, and predicting car crashes from dashcam videos. |
| Researcher Affiliation | Academia | Center for Data Science, New York University, New York, USA; Courant Institute of Mathematical Sciences, New York University, New York, USA; Department of Population Health & Department of Radiology, NYU School of Medicine, New York, USA. |
| Pseudocode | Yes | Algorithm 1 Pseudocode for CaPE |
| Open Source Code | Yes | Code available at https://jackzhu727.github.io/deep-probability-estimation/. |
| Open Datasets | Yes | To benchmark probability-estimation methods, we build a synthetic dataset based on UTKFace (Zhang et al., 2017b), containing face images and associated ages. We use the German Weather service dataset, which contains quality-controlled rainfall-depth composites from 17 operational Doppler radars. Following (Kim et al., 2019), we use 0.3 seconds of real dashcam videos from YouTube Crash dataset as input |
| Dataset Splits | Yes | The synthetic data is split into training, validation, and test sets with 16641, 4738, and 2329 samples, respectively. |
| Hardware Specification | No | The paper mentions training 'deep neural networks' but does not provide specific details on the hardware used, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper does not specify the versions of any software dependencies or libraries used for the experiments (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | No | The paper describes the 'ResNet-18 backbone architecture' and mentions 'stochastic gradient descent' but does not provide specific hyperparameter values like learning rate, batch size, or number of epochs in the main text. |
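The split sizes in the Dataset Splits row can be sanity-checked with a short calculation. The sketch below only restates the three reported counts and derives their proportions, which come out to roughly 70% / 20% / 10% of 23,708 samples. It is not the authors' splitting procedure (which is not described here); the `splits` dictionary and its key names are illustrative assumptions.

```python
# Reported split sizes for the UTKFace-based synthetic dataset.
# The dictionary keys are illustrative names, not identifiers from the paper's code.
splits = {"train": 16641, "validation": 4738, "test": 2329}

# Total number of samples across all three splits.
total = sum(splits.values())

# Fraction of the total that each split represents.
fractions = {name: n / total for name, n in splits.items()}

for name, frac in fractions.items():
    print(f"{name}: {splits[name]} samples ({frac:.1%})")
```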