Demystifying Black-box Models with Symbolic Metamodels

Authors: Ahmed M. Alaa, Mihaela van der Schaar

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Building on the discussions in Section 4, we demonstrate the use cases of symbolic metamodeling through experiments on synthetic and real data. In all experiments, we used Sympy [19] (a symbolic computation library in Python) to carry out computations involving Meijer G-functions.
Researcher Affiliation | Academia | Ahmed M. Alaa, ECE Department, UCLA (ahmedmalaa@ucla.edu); Mihaela van der Schaar, UCLA, University of Cambridge, and The Alan Turing Institute (mv472@cam.ac.uk, mihaela@ee.ucla.edu)
Pseudocode | Yes | Algorithm 1 (Symbolic Metamodeling):
    Input: model f(x), hyperparameters (m, n, p, q, r)
    Output: metamodel g(x) ∈ G
    Sample X_i ~ Unif([0, 1]^d), i ∈ {1, ..., n}
    Repeat until convergence:
        θ_{k+1} := θ_k − γ ∇_θ Σ_i ℓ(G(X_i; θ), f(X_i)) |_{θ = θ_k}
    g(x) ← G(x; θ_k)
    If g(x) ∉ G:
        g(x) = G̃(x; θ̃), where G̃(x; θ̃) ∈ G and ‖θ̃ − θ_k‖ < δ, or
        g(x) = Chebyshev(g(x))
Open Source Code | Yes | The code is provided at https://bitbucket.org/mvdschaar/mlforhealthlabpub.
Open Datasets | Yes | Using data for 2,000 breast cancer patients extracted from the UK cancer registry (data description is in Appendix B), we fit an XGBoost model f(x) to predict the patients' 5-year mortality risk based on 5 features: age, number of nodes, tumor size, tumor grade, and estrogen-receptor (ER) status.
Dataset Splits | Yes | Using 5-fold cross-validation, we compare the area under the receiver operating characteristic curve (AUC-ROC) of the XGBoost model with that of the PREDICT risk calculator (https://breast.predict.nhs.uk/), which is the risk equation most commonly used in current practice [41].
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or computational resources) used for running experiments were mentioned in the paper.
Software Dependencies | No | In all experiments, we used Sympy [19] (a symbolic computation library in Python) to carry out computations involving Meijer G-functions. We also used the gplearn library [40]. (No version numbers are provided for Sympy or gplearn.)
Experiment Setup | No | The paper mentions fitting a "2-layer neural network f(x) (with 200 hidden units)" and an "XGBoost model f(x)" but does not provide specific hyperparameters or detailed training configurations (e.g., learning rate, batch size, optimizer settings).
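The Sympy dependency noted in the table can be illustrated concretely. Below is a minimal sketch of symbolic Meijer G-function manipulation with SymPy's `meijerg` and `hyperexpand`; the particular example function is my own choice, not taken from the paper:

```python
# Minimal sketch: manipulating a Meijer G-function symbolically with SymPy,
# the library the paper reports using for these computations.
from sympy import symbols, meijerg, hyperexpand, exp

x = symbols('x', positive=True)

# G^{1,0}_{0,1}(x | -; 0), a Meijer G-function with the known closed form exp(-x).
g = meijerg([[], []], [[0], []], x)

# hyperexpand rewrites a Meijer G-function in terms of elementary or
# hypergeometric functions whenever such a representation exists.
closed_form = hyperexpand(g)
print(closed_form)  # -> exp(-x)
```

This closed-form recovery step is what makes a Meijer G-function parametrization attractive for metamodeling: a fitted G-function can often be read off as an ordinary algebraic expression.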
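The gradient-descent loop in the Algorithm 1 pseudocode can be sketched in a few lines. This is a simplified illustration, assuming a quadratic surrogate family in place of the paper's Meijer G-function parametrization G(x; θ) so that the gradient has a closed form; the black-box f here is a toy stand-in:

```python
# Hedged sketch of the Algorithm 1 loop: fit a parametric surrogate g(x; theta)
# to a black-box f by gradient descent on points sampled from Unif([0, 1]).
# A quadratic family stands in for the Meijer G-function parametrization.
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Black-box model to be metamodeled (toy stand-in for a trained model)."""
    return 3.0 * x ** 2 + 1.0

# Step 1: sample evaluation points X_i ~ Unif([0, 1]).
X = rng.uniform(0.0, 1.0, size=200)
y = f(X)

# Surrogate g(x; theta) = theta_0 + theta_1 * x + theta_2 * x^2.
features = np.stack([np.ones_like(X), X, X ** 2], axis=1)
theta = np.zeros(3)
gamma = 0.5  # learning rate

# Step 2: theta_{k+1} := theta_k - gamma * grad of sum_i (g(X_i) - f(X_i))^2.
for _ in range(20000):
    residual = features @ theta - y  # g(X_i; theta) - f(X_i)
    theta = theta - gamma * 2.0 * features.T @ residual / len(X)

# theta converges toward the true coefficients [1, 0, 3].
```

The final "project or approximate" step of the pseudocode (snapping to a nearby closed-form member of G, or falling back to a Chebyshev expansion) is omitted here, since for this toy family the surrogate is already a polynomial.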
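The 5-fold cross-validated AUC-ROC evaluation described under Dataset Splits follows a standard pattern, sketched below. Synthetic data and scikit-learn's `GradientBoostingClassifier` stand in for the UK registry data and the XGBoost model, since neither the data nor the exact model configuration is available:

```python
# Hedged sketch of the evaluation protocol: 5-fold cross-validated AUC-ROC
# for a boosted-tree risk model. Synthetic data replaces the registry data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Five features, mirroring the paper's age, nodes, size, grade, ER status.
X, y = make_classification(n_samples=2000, n_features=5, n_informative=4,
                           n_redundant=0, random_state=0)

model = GradientBoostingClassifier(random_state=0)
aucs = cross_val_score(model, X, y, cv=5, scoring='roc_auc')
print(f"AUC-ROC: {aucs.mean():.3f} +/- {aucs.std():.3f}")
```

Comparing against the PREDICT calculator would additionally require scoring its risk equation on the same held-out folds.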