Relational Programming with Foundation Models

Authors: Ziyang Li, Jiani Huang, Jason Liu, Felix Zhu, Eric Zhao, William Dodds, Neelay Velingker, Rajeev Alur, Mayur Naik

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate VIEIRA on 9 challenging tasks that span language, vision, and structured and vector databases. Our evaluation shows that programs in VIEIRA are concise, can incorporate modern foundation models, and have comparable or better accuracy than competitive baselines.
Researcher Affiliation | Academia | University of Pennsylvania liby99@seas.upenn.edu, jianih@seas.upenn.edu, jasonhl@seas.upenn.edu, zhufelix@seas.upenn.edu, zhaoer@seas.upenn.edu, wdodds@sas.upenn.edu, neelay@seas.upenn.edu, alur@seas.upenn.edu, mhnaik@seas.upenn.edu
Pseudocode | No | The paper provides code snippets demonstrating the VIEIRA language and its foreign interface, but it does not include formal pseudocode blocks or algorithms for its internal workings or experimental procedures.
Open Source Code | Yes | Our framework, plugin library, and evaluations are open-source and available at https://github.com/scalloplang/scallop.
Open Datasets | Yes | Table 1 lists the datasets used, many of which are well-known public benchmarks with cited sources: Hotpot QA (Yang et al. 2018), CLUTRR (Sinha et al. 2019), GSM8K (Cobbe et al. 2021), Amazon ESCI (Reddy et al. 2022), GQA (Hudson and Manning 2019), CLEVR (Johnson et al. 2016).
Dataset Splits | No | The paper does not explicitly provide specific train/validation/test dataset splits (e.g., percentages, sample counts, or references to predefined splits) to reproduce the partitioning of the data.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU/GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions software components and models (e.g., Python, GPT, CLIP) but does not provide specific version numbers for these software dependencies required for reproducibility.
Experiment Setup | Yes | Our solution leverages GPT-4 (5-shot) for extracting 3 relations: mentioned dates, duration between date labels, and the target date label. [...] Our solution for tracking shuffled objects relies on GPT-4 (1-shot) to extract 3 relations: initial possessions, swaps, and the target person whose final possessed object is expected as the answer. [...] Our solution to this task prompts GPT-4 (2-shot) to produce step-by-step expressions, which can contain constants, variables, and simple arithmetic operations.
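
To make the tracking-shuffled-objects setup concrete, the following is a minimal sketch of how the three extracted relations (initial possessions, swaps, and the target person) could determine the answer once extraction is done. It is plain Python rather than VIEIRA, and it is not the authors' implementation. The function name `final_possessions`, the data layout, and the sample facts are all hypothetical, and the sketch assumes the swaps arrive in the order they occur.

```python
from typing import Dict, List, Tuple


def final_possessions(
    initial: Dict[str, str], swaps: List[Tuple[str, str]]
) -> Dict[str, str]:
    """Apply each swap in order and return who holds what at the end.

    `initial` maps a person to the object they start with.
    `swaps` is assumed to be ordered chronologically (an assumption of this sketch).
    """
    holding = dict(initial)  # copy so the input relation is left unchanged
    for a, b in swaps:
        holding[a], holding[b] = holding[b], holding[a]
    return holding


# Hypothetical facts standing in for what a language model might extract.
initial = {"Alice": "red ball", "Bob": "blue ball", "Claire": "green ball"}
swaps = [("Alice", "Bob"), ("Bob", "Claire"), ("Alice", "Claire")]
target = "Alice"

answer = final_possessions(initial, swaps)[target]
print(answer)
```

Separating extraction from the deterministic swap logic means the model only has to read facts out of the text, while the bookkeeping over swaps is done by ordinary code.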