Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Distance-Based Network Recovery under Feature Correlation

Authors: David Adametz, Volker Roth

NeurIPS 2014 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "4 Experiments: We first look at synthetic data and compare how well the recovered network matches the true one. Hereby, the accuracy is measured by the f-score using the edges (positive/negative/zero)." and "4.2 Real-World Data: A Network of Biological Pathways. In order to demonstrate the scalability of TiMT, we apply it to the publicly available colon cancer dataset of Sheffer et al. [20]"
Researcher Affiliation | Academia | Department of Mathematics and Computer Science, University of Basel, Switzerland
Pseudocode | Yes | Algorithm 1: One loop of the MCMC sampler
Open Source Code | No | The paper does not provide a link to the source code or an explicit statement about its release.
Open Datasets | Yes | we apply it to the publicly available colon cancer dataset of Sheffer et al. [20]
Dataset Splits | No | The paper does not explicitly specify training/validation/test dataset splits, specific percentages, or absolute sample counts for each split needed to reproduce the experiment.
Hardware Specification | Yes | Runtime on a standard 3 GHz PC was 3:10 hours for TiMT
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | Hyperparameters α, β and d. At some point in every Bayesian analysis, all hyperparameters need to be specified in a sensible manner. Currently, the occurrence of d in Eq. (9) is particularly problematic, since (i) the number of latent features is unknown and (ii) it critically affects the balance between determinants. To resolve this issue, recall that α must satisfy $\alpha > \tfrac{1}{2}(d - 1)$, which can alternatively be expressed as $\alpha = \tfrac{1}{2}(vd - n + 1)$ with $v > 1 + \tfrac{n - 2}{d}$. Thereby, we arrive at $\ell(W; v, \beta, D, 1_n) = \tfrac{d}{2} \log|W| - \tfrac{d}{2} \log(1_n^\top W 1_n) - \tfrac{vd}{2} \log|I_n - \tfrac{\beta}{4} W Q D|$ (10), where d now influences the likelihood on a global level and can be used as temperature reminiscent of simulated annealing techniques for optimization. In more detail, we initialize the MCMC sampler with a small value of d and increase it slowly, until the acceptance ratio is below, say, 1 percent. After that event, all samples of W are averaged to obtain the final network.
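
To make the likelihood quoted in the Experiment Setup row concrete, here is a minimal NumPy sketch that evaluates Eq. (10) for a given W. It is not the authors' code. It reads the equation as ℓ = (d/2) log|W| - (d/2) log(1ᵀW1) - (vd/2) log|I - (β/4) W Q D|, where the minus signs are our reading of the quoted formula. The function name `timt_log_likelihood` is hypothetical, and the matrix Q is taken as an input because its definition is not given on this page.

```python
import numpy as np


def timt_log_likelihood(W, Q, D, v, beta, d):
    """Evaluate the log-likelihood of Eq. (10) as read above (illustrative only).

    W    : (n, n) candidate network matrix (assumed to have positive determinant)
    Q    : (n, n) matrix from the paper, supplied by the caller (not defined here)
    D    : (n, n) distance matrix
    v    : hyperparameter, must satisfy v > 1 + (n - 2) / d
    beta : hyperparameter beta
    d    : number of latent features, used as the annealing "temperature"
    """
    n = W.shape[0]

    # Constraint quoted in the text: v > 1 + (n - 2) / d.
    if v <= 1.0 + (n - 2) / d:
        raise ValueError("v must exceed 1 + (n - 2) / d")

    ones = np.ones(n)

    # log|W| via slogdet for numerical stability.
    sign_w, logdet_w = np.linalg.slogdet(W)

    # log|I_n - (beta / 4) W Q D| (minus sign is an assumption, see lead-in).
    M = np.eye(n) - (beta / 4.0) * (W @ Q @ D)
    sign_m, logdet_m = np.linalg.slogdet(M)

    # 1_n^T W 1_n, the sum of all entries of W.
    quad = ones @ W @ ones

    if sign_w <= 0 or sign_m <= 0 or quad <= 0:
        raise ValueError("log terms are undefined for these inputs")

    return 0.5 * d * logdet_w - 0.5 * d * np.log(quad) - 0.5 * v * d * logdet_m


if __name__ == "__main__":
    # Toy inputs chosen only so that every term is well defined.
    n = 3
    W = np.eye(n)
    Q = np.eye(n)
    D = np.zeros((n, n))
    print(timt_log_likelihood(W, Q, D, v=2.0, beta=1.0, d=2.0))
```

In the annealing scheme described in the row above, a sampler would call such a function repeatedly while slowly increasing d; the sampler itself is not sketched here.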