Neural Frailty Machine: Beyond proportional hazard assumption in neural survival regressions

Authors: Ruofan Wu, Jiawei Qiao, Mingzhe Wu, Wen Yu, Ming Zheng, Tengfei LIU, Tianyi Zhang, Weiqiang Wang

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we provide synthetic experiments that verify our theoretical statements. We also conduct experimental evaluations over 6 benchmark datasets of different scales, showing that the proposed NFM models achieve predictive performance comparable to or sometimes surpassing state-of-the-art survival models.
Researcher Affiliation | Collaboration | Ant Group; Fudan University; Coupang. Emails: {ruofan.wrf, aaron.ltf, zty113091, weiqiang.wwq}@antgroup.com; jeremyqjw@163.com; wumingzhe.darcy@gmail.com; {wenyu, mingzheng}@fudan.edu.cn
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is publicly available at https://github.com/Rorschach1989/nfm
Open Datasets | Yes | For the METABRIC, Rot GBSG, FLCHAIN, SUPPORT, and KKBOX datasets, we take the versions provided in the pycox package [46]. (A minimal pycox loading sketch is given after the table.)
Dataset Splits | Yes | Since the survival datasets do not come with standard train/test splits, we follow previous practice [75] and use 5-fold cross-validation (CV): one fold is used for testing, and 20% of the remaining data is held out for validation. (The splitting protocol is sketched in code after the table.)
Hardware Specification | No | The paper mentions training models but does not specify the hardware (e.g., CPU or GPU models, or cloud resources) used to run the experiments.
Software Dependencies | No | We use PyTorch to implement NFM; the source code is provided in the supplementary material. For the baseline models: we use the implementations of Cox PH, GBM, and RSF from the sksurv package [54]; for the KKBOX dataset, we use the XGBoost library [10] to implement GBM and RSF, which might yield some performance degradation. We use the pycox package to implement the DeepSurv, CoxTime, and DeepHit models. We use the official code provided in the SODEN paper [64] to implement SODEN. We obtain the results of SuMo and DeepEH from our own re-implementations. (A baseline-fitting sketch using sksurv is given after the table.)
Experiment Setup | Yes | Hyperparameter configurations. We specify below the network architectures and optimization configurations used in all the tasks. PF scheme: for both m̂ and ĥ, we use 64 hidden units for n = 1000, 128 hidden units for n = 5000, and 256 hidden units for n = 10000. We train each model for 100 epochs with batch size 128, optimized using Adam with learning rate 0.0001 and no weight decay. FN scheme: for ν̂, we use 64 hidden units for n = 1000, 128 hidden units for n = 5000, and 256 hidden units for n = 10000. We train each model for 100 epochs with batch size 128, optimized using Adam with learning rate 0.0001 and no weight decay. ... Number of layers (network depth): we tune the network depth L ∈ {2, 3, 4}. ... Number of hidden units in each layer (network width): W ∈ {2^k : 5 ≤ k ≤ 10}. ... Optional dropout: we optionally apply dropout with probability p ∈ {0.1, 0.2, 0.3, 0.5, 0.7}. ... Batch size: we tune the batch size within the range {128, 256, 512}; for the KKBOX dataset we also test a larger batch size of 1024. ... Learning rate and weight decay: we tune both the learning rate and the weight decay coefficient of Adam within the range {0.01, 0.001, 0.0001}. (A PyTorch training-configuration sketch is given after the table.)
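
The benchmark datasets named in the Open Datasets row are distributed through the pycox package. Below is a minimal loading sketch; the accessor shown (pycox.datasets.metabric) is part of pycox's public dataset API, but the exact preprocessing applied by the authors is not reproduced here.

# Minimal sketch of loading a benchmark dataset via pycox (here METABRIC).
# gbsg, flchain, support, and kkbox are exposed under pycox.datasets as well;
# kkbox requires a separate Kaggle download.
from pycox.datasets import metabric

df = metabric.read_df()                     # pandas DataFrame
print(df.shape)                             # covariates plus 'duration' and 'event' columns
print(df[["duration", "event"]].head())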
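
The 5-fold protocol in the Dataset Splits row (one fold for testing, 20% of the remainder held out for validation) can be sketched as follows. The helper name and the use of scikit-learn splitters are illustrative assumptions, not the authors' code.

# Sketch of the 5-fold CV protocol: one fold is the test set and 20% of the
# remaining data is held out for validation.
import numpy as np
from sklearn.model_selection import KFold, train_test_split

def five_fold_splits(n_samples, seed=0):
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    for train_val_idx, test_idx in kf.split(np.arange(n_samples)):
        # hold out 20% of the non-test data for validation
        train_idx, val_idx = train_test_split(
            train_val_idx, test_size=0.2, random_state=seed
        )
        yield train_idx, val_idx, test_idx

for train_idx, val_idx, test_idx in five_fold_splits(n_samples=1000):
    print(len(train_idx), len(val_idx), len(test_idx))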
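
The Software Dependencies row names sksurv for the non-neural baselines. A minimal sketch of fitting one of them (Cox PH) is shown below; the synthetic data and its column handling are illustrative assumptions, not the paper's preprocessing.

# Sketch: fitting the Cox PH baseline with sksurv on synthetic data.
import numpy as np
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.util import Surv

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                         # covariates
durations = rng.exponential(scale=10.0, size=200)     # observed times
events = rng.binomial(1, 0.7, size=200).astype(bool)  # event indicator (False = censored)

y = Surv.from_arrays(event=events, time=durations)    # structured survival array
model = CoxPHSurvivalAnalysis().fit(X, y)
print(model.score(X, y))                              # concordance index on the training data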
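
The Experiment Setup row reports MLP widths tied to the synthetic sample size and an Adam configuration (learning rate 1e-4, no weight decay, batch size 128, 100 epochs). The PyTorch sketch below mirrors those settings; the make_mlp helper and the placement of dropout are illustrative and are not the authors' exact architecture.

# Sketch of the reported training configuration in PyTorch.
import torch
import torch.nn as nn

def make_mlp(in_dim, width=64, depth=2, dropout=0.0, out_dim=1):
    layers, d = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(d, width), nn.ReLU()]
        if dropout > 0.0:
            layers.append(nn.Dropout(dropout))
        d = width
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

net = make_mlp(in_dim=10, width=64, depth=2)    # 64 hidden units for n = 1000
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4, weight_decay=0.0)
n_epochs, batch_size = 100, 128                 # as reported for the synthetic runs
# The training loop (NFM likelihood, plus the real-data hyperparameter search
# over depth, width, dropout, batch size, learning rate, weight decay) is omitted.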