Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Causal Customer Churn Analysis with Low-rank Tensor Block Hazard Model
Authors: Chenyin Gao, Zhiming Zhang, Shu Yang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The efficacy and superiority of our model are further validated through comprehensive experiments on both simulated and real-world applications. |
| Researcher Affiliation | Academia | ¹Department of Statistics, North Carolina State University, Raleigh, U.S.A.; ²Independent scholar, Iowa State University, Ames, U.S.A. |
| Pseudocode | Yes | Algorithm 1 Projected gradient descent and spectral clustering for minimizing (4) |
| Open Source Code | Yes | Our implementation codes will be made publicly available after the acceptance of this manuscript. Our Python codes with illustrative examples are available at https://github.com/Gaochenyin/Low-Rank-Tensor-Block-Hazard-Model |
| Open Datasets | Yes | In this section, we apply our proposed method to one bank customer churn data from a Kaggle competition (https://www.kaggle.com/datasets/radheshyamkollipara/bankcustomer-churn). We provide an additional real-data application involving customer churn analysis of online retail. The data (https://www.kaggle.com/datasets/ankitverma2010/ecommerce-customer-churn-analysis-and-prediction) are collected by an E-commerce company. |
| Dataset Splits | Yes | For the model training, we divide the dataset into 80% training data and 20% test data, implementing 5-fold cross-validation. |
| Hardware Specification | Yes | All experiments are conducted on a computer with an Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz and 32GB RAM. |
| Software Dependencies | No | The paper mentions using packages like “sklearn” and “sksurv” in Section 6, but it does not specify their version numbers or the version of Python used, which is necessary for reproducibility. |
| Experiment Setup | Yes | For starter, the baseline covariates $X \in \mathbb{R}^{N \times d}$ are generated by $X_i \overset{\text{i.i.d.}}{\sim} N(0, I_d)$ with $d = 3$. We generate the true parameter tensor $\Theta$ with each entry $\theta_{i,t,l}$ defined by $\theta_{i,t,l} = (X_i^\top \eta_N)\,(t\eta_T/T)\,\mathrm{cum}\{(l)_2\}\,\eta_L$, where $\eta_N = (1, 1, 1)$, $\eta_T = 1$, $\eta_L = 1$, and $\mathrm{cum}\{(l)_2\}$ indicates the number of active treatments in $(l)_2$. We consider the settings $T = 5, 10$; $N = 100, 300, 500, 1000, 2000$; and $k = 2, 3, 4$. |
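
The simulation design quoted in the Experiment Setup row can be sketched in a few lines of standard-library Python. This is an illustrative reading, not the authors' code. It assumes that the adjacent factors in the formula are multiplied, that the time index t runs from 1 to T, that l ranges over all binary treatment combinations, and that the number of active treatments is the count of 1-bits in the binary form of l. The function names and the `num_treatments` parameter are hypothetical; whether `num_treatments` corresponds to the k in the quoted setup is also an assumption.

```python
import random


def active_treatments(l: int) -> int:
    """Count the 1-bits in the binary representation of l (assumed meaning of cum{(l)_2})."""
    return bin(l).count("1")


def simulate_theta(N: int, T: int, num_treatments: int, d: int = 3, seed: int = 0):
    """Draw covariates X_i ~ N(0, I_d) and build the tensor theta[i][t-1][l].

    Assumption: the factors of the quoted formula are multiplied together.
    """
    rng = random.Random(seed)
    eta_N = [1.0] * d   # eta_N = (1, 1, 1) when d = 3
    eta_T = 1.0
    eta_L = 1.0
    num_combos = 2 ** num_treatments  # assumed number of treatment combinations

    # N x d matrix of independent standard normal covariates
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(N)]

    theta = []
    for i in range(N):
        x_eta = sum(x * e for x, e in zip(X[i], eta_N))  # inner product of X_i and eta_N
        theta.append([
            [x_eta * (t * eta_T / T) * active_treatments(l) * eta_L
             for l in range(num_combos)]
            for t in range(1, T + 1)
        ])
    return X, theta
```

For example, `simulate_theta(N=100, T=5, num_treatments=2)` returns a 100 x 5 x 4 nested list. Under this reading, every entry with l = 0 is zero because no treatment is active.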
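
The Dataset Splits row describes an 80%/20% train/test division combined with 5-fold cross-validation. The sketch below shows one plausible arrangement using only the standard library. It assumes the cross-validation folds are formed inside the 80% training portion, which the quoted sentence does not state explicitly. The function names `split_indices` and `k_fold` are hypothetical.

```python
import random


def split_indices(n: int, test_frac: float = 0.2, seed: int = 0):
    """Shuffle the indices 0..n-1 and hold out a test_frac share as the test set."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]  # (train indices, test indices)


def k_fold(indices, k: int = 5):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    for i in range(k):
        val = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, val
```

With 100 observations, `split_indices(100)` gives 80 training and 20 test indices, and `k_fold` on the training indices yields five validation folds of 16 indices each.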