Learning Robust Decision Policies from Observational Data

Authors: Muhammad Osama, Dave Zachariah, Peter Stoica

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The performance and statistical properties of the proposed method are illustrated using both real and synthetic data.
Researcher Affiliation | Academia | Muhammad Osama (muhammad.osama@it.uu.se), Dave Zachariah (dave.zachariah@it.uu.se), Peter Stoica (peter.stoica@it.uu.se), Division of System and Control, Department of Information Technology, Uppsala University, Sweden.
Pseudocode | Yes | Algorithm 1: Robust policy
Open Source Code | Yes | The code for the experiments can be found here.
Open Datasets | Yes | We use data from the Infant Health and Development program (IHDP) [3], which investigated the effect of personalized home visits and intensive high-quality child care on the health of low birth-weight and premature infants [8].
Dataset Splits | No | The IHDP data contains 747 data points and we randomly select a subset of n = 600 training points that form D_n. The remaining 147 points are used to evaluate learned policies.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers (e.g., Python 3.8, TensorFlow 2.x, PyTorch 1.x).
Experiment Setup | Yes | We create a synthetic dataset, drawing n = 200 data points from the training distribution (1)... We let α = 20%. The IHDP data contains 747 data points and we randomly select a subset of n = 600 training points that form D_n. The remaining 147 points are used to evaluate learned policies. To learn the weights (8) for the robust policy, we first reduce the 25-dimensional covariates z̃ into 4-dimensional features z = enc(z̃) using an autoencoder [2, Sec. 7.1]. Then p̂(z|x) is a learned Gaussian mixture model with four mixture components and p̂(x) is a learned Bernoulli model. Together the models define (8) and a robust policy π_α(z) is learned for the target probability α = 20%. The probability that the cost y exceeds y_α(z) is 18.6%, estimated using 500 Monte Carlo runs...
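The IHDP split described above (747 points, n = 600 for training, 147 held out) can be sketched as a random partition of row indices. This is a minimal illustration assuming NumPy; the function name split_ihdp_indices and the seed are hypothetical, since the excerpt does not report how the random subset was drawn.

```python
import numpy as np

def split_ihdp_indices(n_total=747, n_train=600, seed=0):
    """Randomly partition row indices into a training set (forming D_n)
    and a held-out set used to evaluate learned policies.
    The seed is a hypothetical choice; the paper does not report one."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_total)  # random ordering of all 747 indices
    return perm[:n_train], perm[n_train:]

train_idx, test_idx = split_ihdp_indices()
print(len(train_idx), len(test_idx))  # 600 147
```

Because the two index sets come from one permutation, they are disjoint and together cover every data point exactly once.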
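The two learned models in the IHDP setup, a Gaussian mixture p̂(z|x) with four components and a Bernoulli p̂(x), can be combined by Bayes' rule into a joint density p̂(z, x) = p̂(z|x) p̂(x) and a posterior p̂(x=1|z). The sketch below assumes scikit-learn, a binary decision x (as a Bernoulli p̂(x) implies), and one mixture fitted per decision value; that parameterization is an assumption, and the exact form of the weights (8) is not reproduced here. The synthetic Z and x are hypothetical stand-ins for the encoded features z = enc(z̃) and the observed decisions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_feature_models(Z, x, n_components=4, seed=0):
    """Fit p̂(x) as a Bernoulli model and p̂(z|x) as one Gaussian mixture
    per decision value (hypothetical parameterization)."""
    p1 = float(x.mean())  # Bernoulli maximum-likelihood estimate of P(x = 1)
    gmms = {}
    for value in (0, 1):
        gmm = GaussianMixture(n_components=n_components, random_state=seed)
        gmms[value] = gmm.fit(Z[x == value])  # fit() returns the fitted model
    return p1, gmms

def log_joint(Z, value, p1, gmms):
    """log p̂(z, x=value) = log p̂(z | x=value) + log p̂(x=value)."""
    prior = p1 if value == 1 else 1.0 - p1
    return gmms[value].score_samples(Z) + np.log(prior)

def posterior_x1(Z, p1, gmms):
    """Bayes' rule for p̂(x=1 | z), normalized in log space for stability."""
    l0 = log_joint(Z, 0, p1, gmms)
    l1 = log_joint(Z, 1, p1, gmms)
    return np.exp(l1 - np.logaddexp(l0, l1))

# Hypothetical stand-in data: 600 points with 4-dimensional features.
rng = np.random.default_rng(0)
Z = rng.normal(size=(600, 4))
x = rng.integers(0, 2, size=600)
Z[x == 1] += 1.0  # shift features so the two decisions are distinguishable

p1, gmms = fit_feature_models(Z, x)
post = posterior_x1(Z, p1, gmms)
```

Fitting a separate mixture per decision keeps p̂(z|x) flexible for each arm, at the cost of splitting the training points between two models.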
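The reported exceedance probability (the fraction of cases where the cost y exceeds the bound y_α(z)) is a Monte Carlo average. The sketch below shows only that averaging step, assuming NumPy. What the paper redraws in each of its 500 runs is not stated in this excerpt, so the toy setup is hypothetical: costs are drawn from an Exp(1) distribution and the bound is set to its exact (1 − α)-quantile, −ln α, so the true exceedance probability is α. The 18.6% figure is the paper's result and is not reproduced here.

```python
import numpy as np

def exceedance_rate(costs, bounds):
    """Fraction of evaluation points whose realized cost exceeds its bound."""
    return float(np.mean(costs > bounds))

def monte_carlo_exceedance(n_runs=500, n_eval=147, alpha=0.2, seed=0):
    """Average the per-run exceedance rate over n_runs independent draws.
    Toy stand-in: Exp(1) costs with the exact (1 - alpha)-quantile as the
    bound, so the true exceedance probability equals alpha."""
    rng = np.random.default_rng(seed)
    rates = []
    for _ in range(n_runs):
        costs = rng.exponential(scale=1.0, size=n_eval)
        bounds = np.full(n_eval, -np.log(alpha))  # P(Y > -ln(alpha)) = alpha
        rates.append(exceedance_rate(costs, bounds))
    return float(np.mean(rates))

estimate = monte_carlo_exceedance()
```

With a correctly calibrated bound the estimate should land close to α = 20%; a value below α, such as the paper's 18.6%, indicates that the bound is valid at the target level in that experiment.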