Imbalance-Aware Uplift Modeling for Observational Data
Authors: Xuanying Chen, Zhining Liu, Li Yu, Liuyi Yao, Wenpeng Zhang, Yi Dong, Lihong Gu, Xiaodong Zeng, Yize Tan, Jinjie Gu
AAAI 2022, pp. 6313-6321 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on a synthetic dataset and two real-world datasets, and the experimental results well demonstrate the superiority of our method over state-of-the-art. |
| Researcher Affiliation | Industry | 1Ant Group, 2Alibaba Group {xuanying.cxy,eason.lzn,jinli.yl}@antgroup.com, yly287738@alibaba-inc.com, zhangwenpeng0@gmail.com, {dongyi.dy,lihong.glh,xiaodong.zxd,yize.tyz,jinjie.gujj}@antgroup.com |
| Pseudocode | Yes | Algorithm 1 (IAUM Method). Input: training data $D = \{(X_i, W_i, Y_i^{obs})\}_{i=1}^{N}$. Output: fitted uplift estimator $p$. 1: Fit $g$ to the potential outcome $\mu_i^0$ of the control group using the data $\{(X_i, Y_i^0)\}_{i=1}^{N_0}$. 2: Fit $h$ to the potential outcome $\mu_i^1$ of the treatment group using the data $\{(X_i, Y_i^1)\}_{i=1}^{N_1}$. 3: Fit $f$ to estimate the propensity score $\hat{e}(X_i)$ using the data $\{(X_i, W_i)\}_{i=1}^{N}$. 4: With the estimated $g$, $h$ and $f$, construct the proxy outcome $Y_i^{IAUM}$ using Equation (13). 5: Fit $p$ to the data $\{(X_i, Y_i^{IAUM})\}_{i=1}^{N}$. 6: Return $p$. (A hedged Python sketch of these steps appears after the table.) |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | RHC Dataset. We chose Right Heart Catheterization (RHC) data (Saito, Sakata, and Nakata 2019) as the real-world data set to compare our procedure with existing methods. |
| Dataset Splits | No | No explicit validation split is described; the paper only mentions training and test splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions machine learning models and an optimizer (Adam) but does not provide specific software dependencies with version numbers for reproducibility. |
| Experiment Setup | Yes | On the synthetic dataset and the RHC dataset, we use linear regression as the base learner for simplicity. For each scenario, we repeat the training process ten times and report the average bias and variance of the deviation between the expected true value and the model predictions. As the industrial dataset has high-dimensional features, we choose a multilayer perceptron (MLP) with three hidden layers (512, 128 and 128 neurons, respectively) as the base learner to fit the data. All neural network-based methods are optimized with the Adam (Kingma and Ba 2014) optimizer with a learning rate of 3e-4 and a batch size of 512. (A hedged sketch of this setup follows the table.) |
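The following is a minimal Python sketch of the Algorithm 1 steps quoted above, using scikit-learn base learners. The paper's Equation (13) for the proxy outcome is not reproduced in this report, so `build_proxy_outcome` substitutes a standard doubly-robust transformed outcome as a placeholder; it is an assumption, not the authors' exact construction.

```python
# Hedged sketch of Algorithm 1 (IAUM): fit outcome models g, h, propensity
# model f, build a proxy outcome, then fit the final uplift estimator p.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression


def build_proxy_outcome(X, W, Y, g, h, f, eps=1e-3):
    """Placeholder proxy outcome (doubly-robust transformed outcome),
    standing in for the paper's Equation (13)."""
    e = np.clip(f.predict_proba(X)[:, 1], eps, 1 - eps)  # estimated propensity e_hat(X)
    mu0, mu1 = g.predict(X), h.predict(X)                 # estimated potential outcomes
    return (mu1 - mu0
            + W * (Y - mu1) / e
            - (1 - W) * (Y - mu0) / (1 - e))


def fit_iaum(X, W, Y):
    g = LinearRegression().fit(X[W == 0], Y[W == 0])  # step 1: control-group outcome mu0
    h = LinearRegression().fit(X[W == 1], Y[W == 1])  # step 2: treatment-group outcome mu1
    f = LogisticRegression().fit(X, W)                # step 3: propensity score e_hat
    y_proxy = build_proxy_outcome(X, W, Y, g, h, f)   # step 4: proxy outcome (Eq. 13 in the paper)
    p = LinearRegression().fit(X, y_proxy)            # step 5: final uplift estimator
    return p                                          # step 6: return p
```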
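For the industrial-dataset setup, the quoted hyperparameters (hidden layers of 512, 128, 128 neurons; Adam; learning rate 3e-4; batch size 512) could be configured as below. The paper does not state which framework was used, so scikit-learn's MLPRegressor is an assumption here.

```python
# Sketch of the described MLP base learner, assuming scikit-learn.
from sklearn.neural_network import MLPRegressor

mlp_base_learner = MLPRegressor(
    hidden_layer_sizes=(512, 128, 128),  # three hidden layers as reported
    solver="adam",                       # Adam optimizer (Kingma and Ba 2014)
    learning_rate_init=3e-4,             # learning rate reported in the paper
    batch_size=512,                      # batch size reported in the paper
    max_iter=200,                        # epoch count not reported; library default kept
)
# Usage mirrors the linear base learners above, e.g.:
# mlp_base_learner.fit(X_train, y_train)
```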