Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
An Iterative Approach to Synthesize Data Transformation Programs
Authors: Bo Wu, Craig A. Knoblock
IJCAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated the approach with a variety of transformation scenarios. The results show that the approach significantly reduces the time used to generate the transformation programs, especially in complicated scenarios. |
| Researcher Affiliation | Academia | Bo Wu Computer Science Department University of Southern California Los Angeles, California EMAIL Craig A. Knoblock Information Science Institute University of Southern California Los Angeles, California EMAIL |
| Pseudocode | Yes | Algorithm 1: Program Adaptation |
| Open Source Code | Yes | Data and code are available at http://bit.ly/1GtZ4Gc. The code is also available as the data transformation tool of Karma (http://www.isi.edu/integration/karma). |
| Open Datasets | Yes | Data and code are available at http://bit.ly/1GtZ4Gc. |
| Dataset Splits | No | The paper mentions providing examples iteratively until programs transform all records correctly but does not specify any training/validation/test splits. |
| Hardware Specification | Yes | We performed the experiments on a laptop with 8G RAM and 2.66GHz CPU. |
| Software Dependencies | No | The paper mentions using Karma, Gulwani's approach, MetagolDF, and FlashFill, but does not specify version numbers for any key software dependencies. |
| Experiment Setup | No | The paper describes the comparison methodology and how program-generation time was measured, but it does not provide specific hyperparameters or training settings, such as learning rates, batch sizes, or optimizer configurations, of the kind found in typical machine learning setups. |
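The Dataset Splits row notes that the evaluation provides examples iteratively until the generated program transforms all records correctly, rather than using fixed training/validation/test splits. The sketch below illustrates that kind of protocol under stated assumptions. The synthesizer is a hypothetical stand-in: it picks the first consistent program from a tiny fixed list of candidates and does not reproduce the paper's synthesis or Algorithm 1 (Program Adaptation). The names `CANDIDATES`, `synthesize`, and `iterate_until_correct` are illustrative only, and the simulated user is assumed to label the first incorrectly transformed record each round.

```python
from typing import Callable, List, Optional, Tuple

Record = Tuple[str, str]  # (raw input, desired output)
Program = Callable[[str], str]

# Hypothetical toy "DSL": a fixed list of candidate transformations.
# This is NOT the paper's synthesis method; it only stands in for
# "some synthesizer" so the iterative evaluation loop can run.
CANDIDATES: List[Tuple[str, Program]] = [
    ("identity", lambda s: s),
    ("upper", str.upper),
    ("last_token", lambda s: s.split()[-1] if s.split() else s),
    ("first_token_upper", lambda s: s.split()[0].upper() if s.split() else s),
]


def synthesize(examples: List[Record]) -> Optional[Tuple[str, Program]]:
    """Return the first candidate consistent with every example, or None."""
    for name, prog in CANDIDATES:
        if all(prog(x) == y for x, y in examples):
            return name, prog
    return None


def iterate_until_correct(records: List[Record]) -> Tuple[Optional[str], int]:
    """Add one failing record as an example per round until all records pass.

    Returns the chosen program name (None if no candidate fits) and the
    number of examples the simulated user had to provide.
    """
    examples: List[Record] = []
    while True:
        result = synthesize(examples)
        if result is None:
            return None, len(examples)
        name, prog = result
        failures = [r for r in records if prog(r[0]) != r[1]]
        if not failures:
            return name, len(examples)
        # Assumed simulated user: label the first incorrectly transformed record.
        examples.append(failures[0])


print(iterate_until_correct([("Bo Wu", "Wu"), ("Craig Knoblock", "Knoblock")]))
```

Because each new example is a record the current program gets wrong, every round adds a distinct record, so the loop terminates after at most as many rounds as there are records. The returned example count is the kind of quantity such an iterative protocol measures in place of a held-out test split.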