Demystifying Black-box Models with Symbolic Metamodels
Authors: Ahmed M. Alaa, Mihaela van der Schaar
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Building on the discussions in Section 4, we demonstrate the use cases of symbolic metamodeling through experiments on synthetic and real data. In all experiments, we used Sympy [19] (a symbolic computation library in Python) to carry out computations involving Meijer G-functions. (A short SymPy example is sketched after the table.) |
| Researcher Affiliation | Academia | Ahmed M. Alaa, ECE Department, UCLA (ahmedmalaa@ucla.edu); Mihaela van der Schaar, UCLA, University of Cambridge, and Alan Turing Institute (mv472@cam.ac.uk, mihaela@ee.ucla.edu) |
| Pseudocode | Yes | Algorithm 1 Symbolic Metamodeling. Input: model f(x), hyperparameters (m, n, p, q, r). Output: metamodel g(x) ∈ G. Sample X_i ~ Unif([0, 1]^d), i ∈ {1, . . ., n}. Repeat until convergence: θ_{k+1} := θ_k − γ ∇_θ Σ_i ℓ(G(X_i; θ), f(X_i)), evaluated at θ = θ_k. Set g(x) ← G(x; θ_k). If g(x) ∉ G: set g(x) = G(x; θ′) for some θ′ with G(x; θ′) ∈ G and ‖θ′ − θ_k‖ < δ, or set g(x) = Chebyshev(g(x)). (A numerical sketch of this loop follows the table.) |
| Open Source Code | Yes | The code is provided at https://bitbucket.org/mvdschaar/mlforhealthlabpub. |
| Open Datasets | Yes | Using data for 2,000 breast cancer patients extracted from the UK cancer registry (data description is in Appendix B), we fit an XGBoost model f(x) to predict the patients' 5-year mortality risk based on 5 features: age, number of nodes, tumor size, tumor grade, and Estrogen-receptor (ER) status. |
| Dataset Splits | Yes | Using 5-fold cross-validation, we compare the area under the receiver operating characteristic curve (AUC-ROC) of the XGBoost model with that of the PREDICT risk calculator (https://breast.predict.nhs.uk/), which is the risk equation most commonly used in current practice [41]. (A hedged cross-validation sketch follows the table.) |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or computational resources) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | In all experiments, we used Sympy [19] (a symbolic computation library in Python) to carry out computations involving Meijer G-functions. We also used the gplearn library [40]. (No version numbers are provided for Sympy or gplearn.) |
| Experiment Setup | No | The paper mentions fitting a '2-layer neural network f(x) (with 200 hidden units)' and an 'XGBoost model f(x)' but does not provide specific hyperparameters or detailed training configurations (e.g., learning rate, batch size, optimizer settings). |
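The experiments rely on SymPy to manipulate Meijer G-functions symbolically. The snippet below is a minimal, hedged illustration of that capability, not taken from the authors' code: it builds a simple Meijer G-function and uses SymPy's `hyperexpand` to recover its closed form.

```python
# Minimal illustration of symbolic Meijer G-function handling in SymPy.
# Not the authors' code; this particular G-function is chosen for simplicity.
from sympy import meijerg, hyperexpand, symbols

x = symbols("x", positive=True)
expr = meijerg([[], []], [[0], []], x)  # G^{1,0}_{0,1}(x | -; 0)
print(hyperexpand(expr))                # prints exp(-x)
```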
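The "Pseudocode" row quotes Algorithm 1, which fits the parameters θ of a Meijer G-function metamodel G(x; θ) to a black-box model f by gradient descent. The sketch below mirrors that loop for a one-dimensional black box; it is an illustrative approximation, not the paper's implementation. For simplicity only a single scale parameter is learned, the G-function is evaluated numerically with mpmath (which ships with SymPy), the gradient is taken by central finite differences, and all hyperparameter values (step size, number of samples, initialization) are assumptions.

```python
# A minimal numerical sketch of the gradient-descent loop in Algorithm 1,
# not the authors' implementation. A one-dimensional black box f is
# approximated by a Meijer G metamodel; for simplicity only a scale
# parameter theta[0] is learned, using G^{1,0}_{0,1}(theta*x | -; 0) = exp(-theta*x).
# All hyperparameter values below are illustrative assumptions.
import numpy as np
import mpmath

def G(x, theta):
    # Meijer G metamodel: G^{1,0}_{0,1}(theta[0] * x | -; 0), evaluated with mpmath.
    val = mpmath.meijerg([[], []], [[0], []], theta[0] * x)
    return float(mpmath.re(val))  # defensively keep the real part

def loss(theta, X, y):
    # Mean squared error between the metamodel and the black-box outputs.
    return np.mean([(G(x, theta) - yi) ** 2 for x, yi in zip(X, y)])

def fit_metamodel(f, steps=200, n=50, gamma=1.0, eps=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=n)        # X_i ~ Unif([0, 1])
    y = np.array([f(x) for x in X])          # query the black box once
    theta = np.array([0.5])                  # illustrative initialization
    for _ in range(steps):
        grad = np.zeros_like(theta)
        for j in range(len(theta)):          # central finite-difference gradient
            tp, tm = theta.copy(), theta.copy()
            tp[j] += eps
            tm[j] -= eps
            grad[j] = (loss(tp, X, y) - loss(tm, X, y)) / (2 * eps)
        theta -= gamma * grad                # theta_{k+1} := theta_k - gamma * grad
    return theta

# Example: for the black box exp(-2x) the fitted scale parameter should approach 2.
print(fit_metamodel(lambda x: np.exp(-2.0 * x)))
```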
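The "Dataset Splits" row describes a 5-fold cross-validated AUC-ROC comparison between the XGBoost model and the PREDICT risk calculator. The sketch below shows a generic version of that evaluation, assuming NumPy arrays `X` (the five listed covariates) and `y` (binary 5-year mortality) are already loaded; the registry data, the paper's XGBoost hyperparameters, and the PREDICT scores are not reproduced here.

```python
# Hedged sketch of a 5-fold AUC-ROC evaluation for an XGBoost classifier.
# X (n_samples x 5: age, nodes, tumor size, grade, ER status) and y are
# assumed to be NumPy arrays; hyperparameters are placeholders, not the
# paper's settings.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

def cv_auc(X, y, n_splits=5, seed=0):
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    aucs = []
    for train_idx, test_idx in skf.split(X, y):
        model = XGBClassifier(n_estimators=100, max_depth=3)
        model.fit(X[train_idx], y[train_idx])
        scores = model.predict_proba(X[test_idx])[:, 1]
        aucs.append(roc_auc_score(y[test_idx], scores))
    return np.mean(aucs), np.std(aucs)

# Usage (with hypothetical arrays): mean_auc, std_auc = cv_auc(X, y)
```

A comparison against the PREDICT calculator would score its risk estimates on the same held-out folds, e.g. `roc_auc_score(y[test_idx], predict_scores[test_idx])`, where `predict_scores` is a hypothetical array of PREDICT risks.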