Learning Robust Decision Policies from Observational Data
Authors: Muhammad Osama, Dave Zachariah, Peter Stoica
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The performance and statistical properties of the proposed method are illustrated using both real and synthetic data. |
| Researcher Affiliation | Academia | Muhammad Osama (muhammad.osama@it.uu.se), Dave Zachariah (dave.zachariah@it.uu.se), Peter Stoica (peter.stoica@it.uu.se), Division of Systems and Control, Department of Information Technology, Uppsala University, Sweden. |
| Pseudocode | Yes | Algorithm 1 Robust policy |
| Open Source Code | Yes | The code for the experiments can be found here. |
| Open Datasets | Yes | We use data from the Infant Health and Development program (IHDP) [3], which investigated the effect of personalized home visits and intensive high-quality child care on the health of low birth-weight and premature infants [8]. |
| Dataset Splits | No | The IHDP data contains 747 data points and we randomly select a subset of n = 600 training points that form Dn. The remaining 147 points are used to evaluate learned policies. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers (e.g., Python 3.8, TensorFlow 2.x, PyTorch 1.x). |
| Experiment Setup | Yes | We create a synthetic dataset, drawing n = 200 data points from the training distribution (1)... We let α = 20%. The IHDP data contains 747 data points and we randomly select a subset of n = 600 training points that form D_n. The remaining 147 points are used to evaluate learned policies. To learn the weights (8) for the robust policy, we first reduce the 25-dimensional covariates z̃ into 4-dimensional features z = enc(z̃) using an autoencoder [2, Sec. 7.1]. Then p̂(z\|x) is a learned Gaussian mixture model with four mixture components and p̂(x) is a learned Bernoulli model. Together the models define (8) and a robust policy π_α(z) is learned for the target probability α = 20%. The probability that the cost y exceeds y_α(z) is 18.6%, estimated using 500 Monte Carlo runs... |
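
The weight-learning pipeline in the setup above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the 4-dimensional features z are already available (standing in for the autoencoder output), that x is a binary treatment, and that the weights in Eq. (8) are inverse-propensity ratios obtained from p̂(z|x) and p̂(x) via Bayes' rule. All variable names and the stand-in data are hypothetical.

```python
# Hedged sketch of the weight-learning step described in the setup row.
# Assumptions (not taken from the paper's code): z is the 4-d encoded
# feature, x is binary, and the weights in Eq. (8) are inverse-propensity
# ratios derived from p_hat(z|x) and p_hat(x) via Bayes' rule.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-in data: n points with 4-d features z and a binary treatment x.
n = 600
z = rng.normal(size=(n, 4))
x = rng.integers(0, 2, size=n)

# p_hat(z|x): one 4-component Gaussian mixture per treatment arm.
gmm = {a: GaussianMixture(n_components=4, random_state=0).fit(z[x == a])
       for a in (0, 1)}

# p_hat(x): a Bernoulli model, i.e. the empirical treatment frequency.
p_x1 = x.mean()

# Bayes' rule gives p_hat(x|z) from the two fitted conditional densities.
dens = np.column_stack([np.exp(gmm[a].score_samples(z)) for a in (0, 1)])
joint = dens * np.array([1.0 - p_x1, p_x1])   # p_hat(z|x) * p_hat(x)
p_x_given_z = joint[np.arange(n), x] / joint.sum(axis=1)

# Inverse-propensity weights (one plausible reading of Eq. (8)).
w = 1.0 / np.clip(p_x_given_z, 1e-3, None)
print(w[:5])
```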
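
Likewise, the reported 18.6% exceedance estimate comes from averaging over repeated random train/evaluation splits. A minimal sketch of that Monte Carlo loop follows, with a hypothetical cost model and learned limit y_alpha standing in for the paper's method, chosen so the true exceedance probability equals the target α = 20%.

```python
# Hedged sketch of the Monte Carlo evaluation: estimate how often the
# held-out cost y exceeds the policy's cost limit y_alpha(z), averaged
# over 500 runs as in the paper. The cost model and limit below are
# hypothetical stand-ins, not the paper's learned models.
import numpy as np

rng = np.random.default_rng(1)
Z08 = 0.8416  # standard-normal 80% quantile, so P(y > y_alpha) = 20%

def mu(z):
    # Placeholder conditional-mean cost; purely illustrative.
    return 1.0 + z.sum(axis=1)

def y_alpha(z):
    # Hypothetical learned cost limit: the 80% quantile of y given z.
    return mu(z) + Z08

exceed = []
for _ in range(500):                      # 500 Monte Carlo runs
    z_eval = rng.normal(size=(147, 4))    # 147 held-out points
    y_eval = mu(z_eval) + rng.normal(size=147)
    exceed.append(np.mean(y_eval > y_alpha(z_eval)))

print(f"estimated exceedance probability: {np.mean(exceed):.1%}")  # ~20%
```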