Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

An Iterative Approach to Synthesize Data Transformation Programs

Authors: Bo Wu, Craig A. Knoblock

IJCAI 2015

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We evaluated the approach with a variety of transformation scenarios. The results show that the approach significantly reduces the time used to generate the transformation programs, especially in complicated scenarios. |
| Researcher Affiliation | Academia | Bo Wu Computer Science Department University of Southern California Los Angeles, California EMAIL Craig A. Knoblock Information Science Institute University of Southern California Los Angeles, California EMAIL |
| Pseudocode | Yes | Algorithm 1: Program Adaptation |
| Open Source Code | Yes | Data and code are available at http://bit.ly/1GtZ4Gc. The code is also available as the data transformation tool of Karma (http://www.isi.edu/integration/karma). |
| Open Datasets | Yes | Data and code are available at http://bit.ly/1GtZ4Gc. |
| Dataset Splits | No | The paper mentions providing examples iteratively until programs transform all records correctly but does not specify any training/validation/test splits. |
| Hardware Specification | Yes | We performed the experiments on a laptop with 8G RAM and 2.66GHz CPU. |
| Software Dependencies | No | The paper mentions using Karma, Gulwani's approach, Metagol DF, and Flashfill, but does not specify version numbers for any key software dependencies. |
| Experiment Setup | No | The paper describes the comparison methodology and how time was measured for program generation, but it does not provide specific hyperparameters or system-level training settings like learning rates, batch sizes, or optimizer configurations, as might be found in typical machine learning setups. |