Learning from Uncertain Data: From Possible Worlds to Possible Models
Authors: Jiongli Zhu, Su Feng, Boris Glavic, Babak Salimi
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement ZORRO using SymPy [45], a Python library for symbolic computations and evaluate the system on two key applications: (1) computing prediction ranges and robustness certification for linear models trained on uncertain data, and (2) robustness of model weights for causal inference using linear models as a case study. We also measured the performance of ZORRO under varying conditions, including varying the degree of training data uncertainty. All our experiments are performed on a single machine with an Apple M1 chip, 8 cores, and 16 GB RAM. Experiments are repeated 5 times with different random seeds, and we report the mean (error bars denote 3σ). |
| Researcher Affiliation | Academia | Jiongli Zhu (1), Su Feng (2), Boris Glavic (3), Babak Salimi (1). (1) University of California, San Diego; (2) Nanjing Tech University; (3) University of Illinois, Chicago |
| Pseudocode | Yes | Algorithm 1: Abstract Learning |
| Open Source Code | Yes | The code is shared at https://github.com/lodino/Zorro. |
| Open Datasets | Yes | For robustness verification we use regression tasks: for MPG [58] (392 instances) we predict fuel consumption based on car features (cylinders, horsepower, weight); for Insurance [30] (1338 instances) we predict medical insurance charges based on demographics (age, gender, BMI), habits (smoking), and geographical features. |
| Dataset Splits | No | We use an 80:20 train-test split and inject random errors into the training data, varying (i) the Uncertain Data Percentage, the percentage of instances that have uncertain features / labels, and (ii) the Uncertainty Radius, the difference between the minimum and maximum possible value of an uncertain feature expressed as a fraction of the feature's domain. |
| Hardware Specification | Yes | All our experiments are performed on a single machine with an Apple M1 chip, 8 cores, and 16 GB RAM. |
| Software Dependencies | No | We implement ZORRO using SymPy [45], a Python library for symbolic computations. |
| Experiment Setup | Yes | The robustness threshold is set to 5% of the label range for the MPG data, and 0.8% of the label range for the Insurance data. |
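
The setup quoted in the table (an 80:20 split, an Uncertain Data Percentage, an Uncertainty Radius, and a robustness threshold given as a fraction of the label range) can be illustrated with a minimal sketch. This is not the ZORRO implementation. The function names `split_80_20`, `inject_feature_uncertainty`, and `robust_fraction` are hypothetical, and two choices are assumptions rather than statements from the source: each uncertain interval is centered on the original value, and a prediction counts as robust when the width of its prediction range does not exceed the threshold.

```python
import numpy as np


def split_80_20(X, y, rng):
    """Shuffle rows and split them 80:20 into train and test parts."""
    idx = rng.permutation(len(X))
    cut = int(0.8 * len(X))
    train, test = idx[:cut], idx[cut:]
    return X[train], X[test], y[train], y[test]


def inject_feature_uncertainty(X, uncertain_pct, radius, feature, rng):
    """Replace one feature by an interval on a random subset of rows.

    uncertain_pct: fraction of rows (0..1) that become uncertain.
    radius: interval width as a fraction of the feature's observed domain.
    Assumption: the interval is centered on the original value.
    Returns lower bounds, upper bounds, and the indices of uncertain rows.
    """
    X = np.asarray(X, dtype=float)
    lower, upper = X.copy(), X.copy()
    n_uncertain = int(round(uncertain_pct * len(X)))
    rows = rng.choice(len(X), size=n_uncertain, replace=False)
    domain = X[:, feature].max() - X[:, feature].min()
    width = radius * domain  # max possible value minus min possible value
    lower[rows, feature] -= width / 2.0
    upper[rows, feature] += width / 2.0
    return lower, upper, rows


def robust_fraction(pred_lower, pred_upper, label_range, threshold_frac):
    """Fraction of predictions whose range width is within the threshold.

    Assumption: "robust" means width <= threshold_frac * label_range.
    """
    widths = np.asarray(pred_upper, dtype=float) - np.asarray(pred_lower, dtype=float)
    return float(np.mean(widths <= threshold_frac * label_range))


# Usage on synthetic data (not MPG or Insurance).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = rng.normal(size=100)
X_train, X_test, y_train, y_test = split_80_20(X_demo, y_demo, rng)
X_lo, X_hi, uncertain_rows = inject_feature_uncertainty(
    X_train, uncertain_pct=0.1, radius=0.05, feature=1, rng=rng
)
```

With the thresholds from the table, `threshold_frac` would be 0.05 for MPG and 0.008 for Insurance. The prediction ranges passed to `robust_fraction` would have to come from a method that bounds predictions over all possible worlds, which this sketch does not attempt.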