Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Meimei: An Efficient Probabilistic Approach for Semantically Annotating Tables
Authors: Kunihiro Takeoka, Masafumi Oyamada, Shinji Nakadai, Takeshi Okadome
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrated the superiority of the proposed approach over state-of-the-art approaches for semantic annotation of real data (183 human-annotated tables obtained from the UCI Machine Learning Repository). |
| Researcher Affiliation | Collaboration | Kunihiro Takeoka NEC Corporation EMAIL Masafumi Oyamada NEC Corporation EMAIL Shinji Nakadai NEC Corporation EMAIL Takeshi Okadome Kwansei Gakuin University EMAIL |
| Pseudocode | Yes | Algorithm 1 Approximate prediction with Gibbs sampling |
| Open Source Code | No | The paper does not provide any explicit statement about releasing open-source code or a link to a code repository. |
| Open Datasets | Yes | The dataset we used consists of 183 human-annotated tables (with 781 NE-columns and 4,109 literal-columns) obtained from the UCI Machine Learning repository (Dua and Karra Taniskidou 2017). |
| Dataset Splits | No | The paper mentions using a 'training dataset' for optimizing parameters and for evaluation, but it does not specify any particular data splits (e.g., 80/10/10) for training, validation, and testing. It refers to 'human-annotated 183 tables'. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., CPU models, GPU types, memory). |
| Software Dependencies | No | The paper mentions software components like 'Poincaré embedding' and 'random forest classifiers' but does not specify their version numbers or other software dependencies with versions required for replication. |
| Experiment Setup | Yes | We set the number of iterations in Gibbs sampling to 300 because we observed the convergence at that point and further iterations did not affect the accuracy of the model. |
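The experiment-setup row above quotes a fixed 300-iteration Gibbs sampling budget chosen at convergence. As context, a minimal generic sketch of such a fixed-budget Gibbs sampler is shown below; it is an illustration of the sampling scheme, not the paper's model, and the variable names and toy conditionals are assumptions for the example.

```python
import random

def gibbs_sample(init_state, conditional_samplers, n_iter=300):
    """Generic Gibbs sampling loop: in each iteration, resample every
    variable from its conditional given the current values of the rest,
    for a fixed number of iterations (here 300, as in the paper's setup)."""
    state = dict(init_state)
    samples = []
    for _ in range(n_iter):
        for var, sampler in conditional_samplers.items():
            state[var] = sampler(state)  # draw from p(var | all other vars)
        samples.append(dict(state))
    return samples

# Toy usage: two coupled binary variables with hand-written conditionals.
samplers = {
    "a": lambda s: 1 if random.random() < (0.8 if s["b"] else 0.2) else 0,
    "b": lambda s: 1 if random.random() < (0.8 if s["a"] else 0.2) else 0,
}
draws = gibbs_sample({"a": 0, "b": 0}, samplers, n_iter=300)
```

A prediction would typically be read off the collected samples, e.g. by taking the most frequent value of each variable after discarding an initial burn-in.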