Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
General Table Completion using a Bayesian Nonparametric Model
Authors: Isabel Valera, Zoubin Ghahramani
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, our experiments over five real databases show that the proposed approach provides more robust and accurate estimates than the standard IBP and the Bayesian probabilistic matrix factorization with Gaussian observations. |
| Researcher Affiliation | Academia | Isabel Valera, Department of Signal Processing and Communications, University Carlos III in Madrid; Zoubin Ghahramani, Department of Engineering, University of Cambridge |
| Pseudocode | Yes | Algorithm 1 Inference Algorithm. |
| Open Source Code | Yes | An efficient C-code implementation for Matlab of the proposed table completion tool is also released on the authors' website. |
| Open Datasets | Yes | Statlog German credit dataset [5]... Dataset available on: http://archive.ics.uci.edu/ml/datasets.html |
| Dataset Splits | No | The paper discusses average test log-likelihood per missing datum but does not provide specific details on train/validation/test dataset splits, such as percentages, sample counts, or cross-validation setup. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | An efficient C-code implementation for Matlab of the proposed table completion tool is also released on the authors' website. |
| Experiment Setup | Yes | For the GIBP, we consider for the real positive and the count data the following transformation, that maps from the real numbers to the real positive numbers, f(x) = log(exp(wx) + 1), where w is a user hyper-parameter. For the BPMF model, we have used different numbers of latent features (in particular, 10, 20 and 50), although we only show the best results for each database, specifically, K = 10 for the NESARC and the wine databases, and K = 50 for the remainder. |
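The transformation quoted in the Experiment Setup row, f(x) = log(exp(wx) + 1), is a scaled softplus that maps the real line onto the positive reals. The following is a minimal Python sketch of that formula, not the authors' released C/Matlab code. The function names, the numerically stable evaluation, the inverse mapping (derived algebraically as x = log(exp(y) - 1) / w for y > 0), and the example value w = 2.0 are all illustrative assumptions; the paper itself only specifies f and that w is user-chosen.

```python
import math


def softplus_transform(x: float, w: float) -> float:
    """Hypothetical helper: f(x) = log(exp(w*x) + 1), mapping reals to positive reals.

    Evaluated in a numerically stable form so large |w*x| does not overflow.
    """
    z = w * x
    if z > 0:
        # log(exp(z) + 1) = z + log(1 + exp(-z)); exp(-z) <= 1, so no overflow
        return z + math.log1p(math.exp(-z))
    # for z <= 0, exp(z) <= 1 and log1p is accurate for small arguments
    return math.log1p(math.exp(z))


def inverse_softplus_transform(y: float, w: float) -> float:
    """Hypothetical helper: inverse of f for y > 0, x = log(exp(y) - 1) / w.

    Assumes w != 0 (the paper does not state the admissible range of w).
    """
    if y <= 0:
        raise ValueError("f maps onto the positive reals; y must be > 0")
    if w == 0:
        raise ValueError("w must be non-zero for f to be invertible")
    # log(exp(y) - 1) = y + log(1 - exp(-y)), and 1 - exp(-y) == -expm1(-y)
    return (y + math.log(-math.expm1(-y))) / w


if __name__ == "__main__":
    w = 2.0  # hypothetical value of the user hyper-parameter w
    for x in (-3.0, 0.0, 1.5):
        y = softplus_transform(x, w)
        x_back = inverse_softplus_transform(y, w)
        print(f"x={x:+.2f}  f(x)={y:.6f}  f^-1(f(x))={x_back:+.6f}")
```

Larger values of w make f approach max(0, x) more sharply, while smaller values give a smoother mapping near zero; how w was actually chosen for each database is not stated in the quoted text.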