Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Interpretable Mixture of Experts
Authors: Aya Abdelsalam Ismail, Sercan Ö. Arik, Jinsung Yoon, Ankur Taly, Soheil Feizi, Tomas Pfister
TMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on 15 tabular and time-series datasets, IME is demonstrated to be more accurate than single interpretable models and to perform comparably with existing state-of-the-art DNNs in accuracy. On most datasets, IME even outperforms DNNs, while providing faithful explanations. Lastly, IME's explanations are compared to commonly-used post-hoc explanation methods through a user study: participants are able to better predict the model behavior when given IME explanations, while finding IME's explanations more faithful and trustworthy. |
| Researcher Affiliation | Collaboration | Aya Abdelsalam Ismail1, Sercan Ö. Arik2, Jinsung Yoon2, Ankur Taly2, Soheil Feizi3, and Tomas Pfister2. 1Genentech, 2Google Cloud AI, 3University of Maryland |
| Pseudocode | Yes | Algorithm 1: Train Interpretable Mixture of Experts |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We evaluate the performance of IME on numerous real-world tabular and time-series datasets. As tabular data, we examine both the ones with time component, where the assignment module takes past errors as input, and the ones without past errors. Detailed descriptions of datasets and hyperparameter tuning are available in the Appendix C. [...] On standard tabular tasks without time component, we compare the performance of S-IMEi (interpretable assignment and interpretable experts) with other inherently-interpretable models. For classification, Telecom Churn Prediction (tel, 2018), Breast Cancer Prediction (Dua & Graff, 2017) and Credit Fraud Detection (Dal Pozzolo, 2015) datasets; while for regression, FICO Score Prediction (fic, 2018) datasets are used. [...] We conduct experiments on multiple real-world time-series datasets for forecasting tasks, including Electricity (Electricity), Climate (Climate) and ETT (Zhou et al., 2021). |
| Dataset Splits | Yes | We perform a 70/10/20 train/validation/test split for each dataset. |
| Hardware Specification | Yes | All experiments were run on an NVIDIA Tesla V100 GPU. |
| Software Dependencies | No | The paper mentions 'Adam optimizer' and 'Tune' for hyperparameter tuning but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Detailed descriptions of datasets and hyperparameter tuning are available in the Appendix C. [...] For all IME experiments we set k = 1 for Lutil. For Ldiv(f, X), we set τ = 0.2. All models were trained using Adam optimizer, and the batch size and learning rate values were modified at each experiment. Table 6 (IME hyperparameters for the Rossmann dataset) reports, per model, the number of experts, learning rates (τ, ρ), and hyperparameters (β, γ, δ, λ): Linear Assign. + LR Expert: 20 experts, τ=.0001, ρ=.001, β=10, γ=0, δ=.1, λ=1; Linear Assign. + SDT Expert: 20 experts, τ=.0001, ρ=.001, β=.1, γ=0, δ=.1, λ=1; MLP Assign. + LR Expert: 20 experts, τ=.0001, ρ=.001, β=.1, γ=0, δ=.1, λ=1; MLP Assign. + SDT Expert: 20 experts, τ=.0001, ρ=.001, β=.1, γ=0, δ=.1, λ=1. |
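Since the paper releases no code, the core computation referenced in Algorithm 1 (a softmax assignment module weighting predictions from interpretable linear experts) can be sketched as below. All function names, shapes, and the omission of the training losses (Lutil, Ldiv) are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ime_predict(X, gate_W, expert_Ws):
    """Mixture prediction: the assignment module (here a linear gate)
    produces per-sample expert probabilities, and the output is the
    probability-weighted sum of the linear experts' predictions.

    X:         (n, d) input features
    gate_W:    (d, k) assignment-module weights
    expert_Ws: list of k weight vectors of shape (d,), one per expert
    """
    weights = softmax(X @ gate_W)                          # (n, k) assignment probs
    preds = np.stack([X @ w for w in expert_Ws], axis=1)   # (n, k) per-expert outputs
    return (weights * preds).sum(axis=1), weights

# Toy usage with random weights (no training shown).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))
gate_W = rng.normal(size=(5, 3))
expert_Ws = [rng.normal(size=5) for _ in range(3)]
y_hat, w = ime_predict(X, gate_W, expert_Ws)
```

Because each expert is itself an interpretable model (e.g. linear regression or a soft decision tree), a prediction can be explained by reporting the dominant expert's coefficients together with its assignment weight, which is the interpretability mechanism the paper's user study evaluates.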