Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Learning from Uncertain Data: From Possible Worlds to Possible Models
Authors: Jiongli Zhu, Su Feng, Boris Glavic, Babak Salimi
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement ZORRO using SymPy [45], a Python library for symbolic computations and evaluate the system on two key applications: (1) computing prediction ranges and robustness certification for linear models trained on uncertain data, and (2) robustness of model weights for causal inference using linear models as a case study. We also measured the performance of ZORRO under varying conditions, including varying the degree of training data uncertainty. All our experiments are performed on a single machine with an Apple M1 chip, 8 cores, and 16 GB RAM. Experiments are repeated 5 times with different random seeds, and we report the mean (error bars denote 3σ). |
| Researcher Affiliation | Academia | Jiongli Zhu¹, Su Feng², Boris Glavic³, Babak Salimi¹; ¹University of California, San Diego; ²Nanjing Tech University; ³University of Illinois, Chicago |
| Pseudocode | Yes | Algorithm 1: Abstract Learning |
| Open Source Code | Yes | The code is shared at https://github.com/lodino/Zorro. |
| Open Datasets | Yes | For robustness verification we use regression tasks: for MPG [58] (392 instances) we predict fuel consumption based on car features (cylinders, horsepower, weight); for Insurance [30] (1338 instances) we predict medical insurance charges based on demographics (age, gender, BMI), habits (smoking), and geographical features. |
| Dataset Splits | No | We use a 80:20 train-test split and inject random errors to the training data varying (i) the Uncertain Data Percentage, the percentage of instances that have uncertain features / labels, and (ii) the Uncertainty Radius, the difference between the minimum and maximum possible value of an uncertain feature expressed as a fraction of the feature's domain. |
| Hardware Specification | Yes | All our experiments are performed on a single machine with an Apple M1 chip, 8 cores, and 16 GB RAM. |
| Software Dependencies | No | We implement ZORRO using SymPy [45], a Python library for symbolic computations |
| Experiment Setup | Yes | The robustness threshold is set to 5% of the label range for the MPG data, and 0.8% of the label range for the Insurance data. |
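
The Dataset Splits and Experiment Setup rows describe three quantities: an 80:20 train-test split, uncertainty injected at a given Uncertain Data Percentage and Uncertainty Radius, and a robustness threshold defined as a fraction of the label range. The following is a minimal sketch of how these quantities could be computed, not the authors' implementation. All function names are hypothetical, and the placement of each interval around the original value is an assumption, since the excerpts above do not specify it.

```python
import random


def split_80_20(rows, seed=0):
    """Shuffle a copy of the rows and split it 80:20 into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def inject_uncertainty(values, uncertain_pct, radius, seed=0):
    """Turn a fraction of values into intervals (lo, hi).

    uncertain_pct: fraction of instances made uncertain (0 to 1).
    radius: interval width (max minus min possible value), expressed
            as a fraction of the feature's domain.
    Certain values are returned as degenerate intervals (v, v).
    """
    rng = random.Random(seed)
    domain = max(values) - min(values)
    width = radius * domain
    n_uncertain = round(uncertain_pct * len(values))
    chosen = set(rng.sample(range(len(values)), n_uncertain))
    intervals = []
    for i, v in enumerate(values):
        if i in chosen:
            # Assumption: the original value sits at a random
            # position inside the interval of the given width.
            lo = v - rng.uniform(0.0, width)
            intervals.append((lo, lo + width))
        else:
            intervals.append((v, v))
    return intervals


def robustness_threshold(labels, fraction):
    """Threshold as a fraction of the label range, e.g. 0.05 for 5%."""
    return fraction * (max(labels) - min(labels))
```

For example, `inject_uncertainty(feature_column, uncertain_pct=0.1, radius=0.05)` would make 10% of the instances uncertain, each with an interval whose width is 5% of that feature's domain, and `robustness_threshold(labels, 0.05)` corresponds to the 5% setting reported for the MPG data.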