Relational Programming with Foundational Models
Authors: Ziyang Li, Jiani Huang, Jason Liu, Felix Zhu, Eric Zhao, William Dodds, Neelay Velingker, Rajeev Alur, Mayur Naik
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate VIEIRA on 9 challenging tasks that span language, vision, and structured and vector databases. Our evaluation shows that programs in VIEIRA are concise, can incorporate modern foundation models, and have comparable or better accuracy than competitive baselines. |
| Researcher Affiliation | Academia | University of Pennsylvania liby99@seas.upenn.edu, jianih@seas.upenn.edu, jasonhl@seas.upenn.edu, zhufelix@seas.upenn.edu, zhaoer@seas.upenn.edu, wdodds@sas.upenn.edu, neelay@seas.upenn.edu, alur@seas.upenn.edu, mhnaik@seas.upenn.edu |
| Pseudocode | No | The paper provides code snippets demonstrating the VIEIRA language and its foreign interface, but it does not include formal pseudocode blocks or algorithms for its internal workings or experimental procedures. |
| Open Source Code | Yes | Our framework, plugin library, and evaluations are open-source and available at https://github.com/scalloplang/scallop. |
| Open Datasets | Yes | Table 1 lists the datasets used, many of which are well-known public benchmarks with cited sources: Hotpot QA (Yang et al. 2018), CLUTRR (Sinha et al. 2019), GSM8K (Cobbe et al. 2021), Amazon ESCI (Reddy et al. 2022), GQA (Hudson and Manning 2019), CLEVR (Johnson et al. 2016). |
| Dataset Splits | No | The paper does not explicitly provide specific train/validation/test dataset splits (e.g., percentages, sample counts, or references to predefined splits) to reproduce the partitioning of the data. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU/GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions software components and models (e.g., Python, GPT, CLIP) but does not provide specific version numbers for these software dependencies required for reproducibility. |
| Experiment Setup | Yes | Our solution leverages GPT-4 (5-shot) for extracting 3 relations: mentioned dates, duration between date labels, and the target date label. [...] Our solution for tracking shuffled objects relies on GPT-4 (1-shot) to extract 3 relations: initial possessions, swaps, and the target person whose final possessed object is expected as the answer. [...] Our solution to this task prompts GPT-4 (2-shot) to produce step-by-step expressions, which can contain constants, variables, and simple arithmetic operations. |
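
The tracking-shuffled-objects setup quoted above splits the work in two: GPT-4 extracts three relations (initial possessions, swaps, and the target person), and a program reasons over them. The reasoning half might look roughly like the sketch below. It is plain Python, not the paper's VIEIRA program, and the function name `track_possessions`, the data layout, and the example facts are all hypothetical stand-ins for what the model would extract.

```python
from typing import Dict, List, Tuple


def track_possessions(
    initial: Dict[str, str],
    swaps: List[Tuple[str, str]],
    target: str,
) -> str:
    """Apply the swaps in order and return what the target person ends up holding."""
    holding = dict(initial)  # copy so the caller's relation is not mutated
    for person_a, person_b in swaps:
        # Each swap exchanges the objects currently held by the two people.
        holding[person_a], holding[person_b] = holding[person_b], holding[person_a]
    return holding[target]


# Hypothetical relations, written by hand in place of GPT-4 output.
initial = {"Alice": "red ball", "Bob": "blue ball", "Claire": "green ball"}
swaps = [("Alice", "Bob"), ("Bob", "Claire")]

answer = track_possessions(initial, swaps, "Bob")
print(answer)
```

Keeping the extracted relations separate from the swap logic mirrors the division of labor described in the quoted setup: the language model only has to read the problem text, while the ordered state update is handled deterministically.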