Imbalance-Aware Uplift Modeling for Observational Data
Authors: Xuanying Chen, Zhining Liu, Li Yu, Liuyi Yao, Wenpeng Zhang, Yi Dong, Lihong Gu, Xiaodong Zeng, Yize Tan, Jinjie Gu
AAAI 2022, pp. 6313-6321 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on a synthetic dataset and two real-world datasets, and the experimental results well demonstrate the superiority of our method over state-of-the-art. |
| Researcher Affiliation | Industry | 1Ant Group, 2Alibaba Group {xuanying.cxy,eason.lzn,jinli.yl}@antgroup.com, yly287738@alibaba-inc.com, zhangwenpeng0@gmail.com, {dongyi.dy,lihong.glh,xiaodong.zxd,yize.tyz,jinjie.gujj}@antgroup.com |
| Pseudocode | Yes | Algorithm 1 (IAUM Method). Input: training data $D = \{(X_i, W_i, Y_i^{obs})\}_{i=1}^{N}$. Output: fitted uplift estimator $p$. 1: Fit $g$ to the potential outcome $\mu_i^0$ of the control group using the data $\{(X_i, Y_i^0)\}_{i=1}^{N_0}$. 2: Fit $h$ to the potential outcome $\mu_i^1$ of the treatment group using the data $\{(X_i, Y_i^1)\}_{i=1}^{N_1}$. 3: Fit $f$ to estimate the propensity score $\hat{e}(X_i)$ using the data $\{(X_i, W_i)\}_{i=1}^{N}$. 4: With the estimated $g$, $h$ and $f$, construct the proxy outcome $Y_i^{IAUM}$ using Equation (13). 5: Fit $p$ to the data $\{(X_i, Y_i^{IAUM})\}_{i=1}^{N}$. 6: Return $p$. (A hedged Python sketch of these steps appears after the table.) |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | RHC Dataset. We chose Right Heart Catheterization (RHC) data (Saito, Sakata, and Nakata 2019) as the real-world data set to compare our procedure with existing methods. |
| Dataset Splits | No | No explicit validation split is described; the paper only mentions training and test splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions machine learning models and an optimizer (Adam) but does not provide specific software dependencies with version numbers for reproducibility. |
| Experiment Setup | Yes | On the synthetic dataset and the RHC dataset, we use linear regression as the base learner for simplicity. For each scenario, we repeat the training process ten times and report the average bias and variance of the deviation between the expected true value and the model predictions. As the industrial dataset has high-dimensional features, we choose a multilayer perceptron (MLP) with three hidden layers (512, 128 and 128 neurons, respectively) as the base learner to fit the data. All neural network-based methods are optimized with the Adam (Kingma and Ba 2014) optimizer with a learning rate of 3e-4 and a batch size of 512. (A hedged sketch of this setup follows the table.) |
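The following is a minimal Python sketch of the Algorithm 1 steps quoted above, using scikit-learn base learners. The paper's Equation (13) for the proxy outcome is not reproduced in this report, so `build_proxy_outcome` substitutes a standard doubly-robust transformed outcome as a placeholder; it is an assumption, not the authors' exact construction.

```python
# Hedged sketch of Algorithm 1 (IAUM): fit outcome models g, h, propensity
# model f, build a proxy outcome, then fit the final uplift estimator p.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression


def build_proxy_outcome(X, W, Y, g, h, f, eps=1e-3):
    """Placeholder proxy outcome (doubly-robust transformed outcome),
    standing in for the paper's Equation (13)."""
    e = np.clip(f.predict_proba(X)[:, 1], eps, 1 - eps)  # estimated propensity e_hat(X)
    mu0, mu1 = g.predict(X), h.predict(X)                 # estimated potential outcomes
    return (mu1 - mu0
            + W * (Y - mu1) / e
            - (1 - W) * (Y - mu0) / (1 - e))


def fit_iaum(X, W, Y):
    g = LinearRegression().fit(X[W == 0], Y[W == 0])  # step 1: control-group outcome mu0
    h = LinearRegression().fit(X[W == 1], Y[W == 1])  # step 2: treatment-group outcome mu1
    f = LogisticRegression().fit(X, W)                # step 3: propensity score e_hat
    y_proxy = build_proxy_outcome(X, W, Y, g, h, f)   # step 4: proxy outcome (Eq. 13 in the paper)
    p = LinearRegression().fit(X, y_proxy)            # step 5: final uplift estimator
    return p                                          # step 6: return p
```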
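For the industrial-dataset setup, the quoted hyperparameters (hidden layers of 512, 128, 128 neurons; Adam; learning rate 3e-4; batch size 512) could be configured as below. The paper does not state which framework was used, so scikit-learn's MLPRegressor is an assumption here.

```python
# Sketch of the described MLP base learner, assuming scikit-learn.
from sklearn.neural_network import MLPRegressor

mlp_base_learner = MLPRegressor(
    hidden_layer_sizes=(512, 128, 128),  # three hidden layers as reported
    solver="adam",                       # Adam optimizer (Kingma and Ba 2014)
    learning_rate_init=3e-4,             # learning rate reported in the paper
    batch_size=512,                      # batch size reported in the paper
    max_iter=200,                        # epoch count not reported; library default kept
)
# Usage mirrors the linear base learners above, e.g.:
# mlp_base_learner.fit(X_train, y_train)
```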