Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Meimei: An Efficient Probabilistic Approach for Semantically Annotating Tables
Authors: Kunihiro Takeoka, Masafumi Oyamada, Shinji Nakadai, Takeshi Okadome
Pages: 281-288
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrated the superiority of the proposed approach over state-of-the-art approaches for semantic annotation of real data (183 human-annotated tables obtained from the UCI Machine Learning Repository). |
| Researcher Affiliation | Collaboration | Kunihiro Takeoka, NEC Corporation, EMAIL; Masafumi Oyamada, NEC Corporation, EMAIL; Shinji Nakadai, NEC Corporation, EMAIL; Takeshi Okadome, Kwansei Gakuin University, EMAIL |
| Pseudocode | Yes | Algorithm 1 Approximate prediction with Gibbs sampling |
| Open Source Code | No | The paper does not provide any explicit statement about releasing open-source code or a link to a code repository. |
| Open Datasets | Yes | The dataset we used consists of 183 human-annotated tables (with 781 NE-columns and 4,109 literal-columns) obtained from the UCI Machine Learning repository (Dua and Karra Taniskidou 2017). |
| Dataset Splits | No | The paper mentions using a 'training dataset' for optimizing parameters and for evaluation, but it does not specify any particular data splits (e.g., 80/10/10) for training, validation, and testing. It refers to 'human-annotated 183 tables'. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., CPU models, GPU types, memory). |
| Software Dependencies | No | The paper mentions software components like 'Poincaré embedding' and 'random forest classifiers' but does not specify their version numbers or other software dependencies with versions required for replication. |
| Experiment Setup | Yes | We set the number of iterations in Gibbs sampling to 300 because we observed the convergence at that point and further iterations did not affect the accuracy of the model. |
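The Experiment Setup row reports that Gibbs sampling was run for a fixed budget of 300 iterations. As an illustration only, the sketch below shows what approximate prediction with a fixed-iteration Gibbs sampler can look like on a hypothetical toy pairwise model: each variable (think of a table column) has label log-scores `unary`, any two variables' labels interact through a shared log-compatibility matrix `pairwise`, and the prediction is each variable's most frequently sampled label. This is not the Meimei model or the paper's Algorithm 1. The model, the names `gibbs_predict`, `unary`, and `pairwise`, the fully connected structure, and the majority-vote readout are all assumptions, and no burn-in samples are discarded, as a simplification.

```python
import math
import random
from collections import Counter


def gibbs_predict(unary, pairwise, n_iters=300, seed=0):
    """Approximate prediction on a hypothetical toy pairwise model via Gibbs sampling.

    unary[i][k]    -- log-score of label k for variable i (toy values, assumed)
    pairwise[k][l] -- log-compatibility of labels k and l on any two variables
    n_iters        -- fixed number of full sweeps (300 mirrors the reported setting)
    Returns, for each variable, the label it took most often across sweeps.
    """
    rng = random.Random(seed)
    n_vars = len(unary)
    n_labels = len(unary[0])
    # Start each variable at its highest-scoring label under the unary scores alone.
    state = [max(range(n_labels), key=lambda k: unary[i][k]) for i in range(n_vars)]
    counts = [Counter() for _ in range(n_vars)]
    for _ in range(n_iters):
        # One sweep: resample each variable conditioned on the current labels of all others.
        for i in range(n_vars):
            logits = [
                unary[i][k]
                + sum(pairwise[k][state[j]] for j in range(n_vars) if j != i)
                for k in range(n_labels)
            ]
            # Subtract the max before exponentiating for numerical stability.
            m = max(logits)
            weights = [math.exp(v - m) for v in logits]
            state[i] = rng.choices(range(n_labels), weights=weights)[0]
        # Record the state after each sweep (no burn-in is discarded in this sketch).
        for i in range(n_vars):
            counts[i][state[i]] += 1
    return [counts[i].most_common(1)[0][0] for i in range(n_vars)]
```

For example, `gibbs_predict([[2.0, 0.0], [0.0, 0.5], [1.5, 0.0]], [[1.0, -1.0], [-1.0, 1.0]])` returns one label per variable. In that toy setting, a positive diagonal in `pairwise` pushes the variables toward agreeing labels. Fixing the iteration count, instead of testing for convergence at run time, matches the reported practice of choosing 300 iterations after observing that accuracy no longer changed.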