Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Bootstrapping Self-Improvement of Language Model Programs for Zero-Shot Schema Matching
Authors: Nabeel Seedat, Mihaela Van Der Schaar
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches, highlighting its potential to accelerate data integration and interoperability of ML-ready data. We conduct experiments on the MIMIC-OMOP and Synthea-OMOP datasets, which are the standard benchmark datasets used in prior schema matching works (Sheetrit et al., 2024; Zhang et al., 2023b; Narayan et al., 2022; Zhang et al., 2023a; 2021). These datasets are real-world healthcare schema matching datasets and have been widely adopted due to their complexity and their reflection of real-world schema matching challenges. |
| Researcher Affiliation | Collaboration | Nabeel Seedat 1 2 Mihaela van der Schaar 1 1Department of Applied Mathematics and Theoretical Physics, University of Cambridge 2Foundational Machine Learning Research, Thomson Reuters. Correspondence to: Nabeel Seedat <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Optimize LM program L 0: Input: Set of evaluation queries Deval = e1, e2, . . . , en 0: Output: Set of top n demonstrations Ddemo ... Algorithm 3 Matchmaker: Schema Matching with Self-Improving Compositional Language Model Programs Require: Source schema Ss, Target schema St Ensure: Schema matches M |
| Open Source Code | Yes | 2https://github.com/seedatnabeel/Matchmaker or https://github.com/vanderschaarlab/Matchmaker |
| Open Datasets | Yes | We conduct experiments on the MIMIC-OMOP and Synthea-OMOP datasets, which are the standard benchmark datasets used in prior schema matching works (Sheetrit et al., 2024; Zhang et al., 2023b; Narayan et al., 2022; Zhang et al., 2023a; 2021). ... Open-source data: https://github.com/meni Data1/MIMIC_2_OMOP ... Open-source data: https://github.com/JZCS2018/SMAT/tree/main/datasets/omap/ |
| Dataset Splits | Yes | Note there is no specific train-test sets used as in supervised learning. As we perform the schema matching task in a zero-shot manner. ... In our experiments, we assess two variants given that labeled training data for schema matching is hard to access: (i) 20-80: 20% train and 80% test and (ii) 50-50: 50% train and 50% test. |
| Hardware Specification | Yes | All experiments are run on a single Nvidia A4000 GPU with 20 GB of vram. |
| Software Dependencies | Yes | The model version used as the LLM was GPT-4-1106, with the following settings: ... We use Colbert-V2 (Santhanam et al., 2022) as the embedding model ... All LLM baselines use GPT-4 (0613) (Open AI, 2023) as the backbone for fair comparison to the original works and to isolate the gains of the system not tied to the LLM. |
| Experiment Setup | Yes | GPT-4 Hyper-parameters. The model version used as the LLM was GPT-4-1106, with the following settings: { temperature : 0.5, max_tokens : 1024, top_p : 1, frequency_penalty : 0, presence_penalty : 0, n : 1, } |