reproducibilityindex.ai

Relational Programming with Foundational Models

Authors: Ziyang Li, Jiani Huang, Jason Liu, Felix Zhu, Eric Zhao, William Dodds, Neelay Velingker, Rajeev Alur, Mayur Naik

AAAI 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate VIEIRA on 9 challenging tasks that span language, vision, and structured and vector databases. Our evaluation shows that programs in VIEIRA are concise, can incorporate modern foundation models, and have comparable or better accuracy than competitive baselines.
Researcher Affiliation	Academia	University of Pennsylvania liby99@seas.upenn.edu, jianih@seas.upenn.edu, jasonhl@seas.upenn.edu, zhufelix@seas.upenn.edu, zhaoer@seas.upenn.edu, wdodds@sas.upenn.edu, neelay@seas.upenn.edu, alur@seas.upenn.edu, mhnaik@seas.upenn.edu
Pseudocode	No	The paper provides code snippets demonstrating the VIEIRA language and its foreign interface, but it does not include formal pseudocode blocks or algorithms for its internal workings or experimental procedures.
Open Source Code	Yes	Our framework, plugin library, and evaluations are open-source and available at https://github.com/scalloplang/scallop.
Open Datasets	Yes	Table 1 lists the datasets used, many of which are well-known public benchmarks with cited sources: Hotpot QA (Yang et al. 2018), CLUTRR (Sinha et al. 2019), GSM8K (Cobbe et al. 2021), Amazon ESCI (Reddy et al. 2022), GQA (Hudson and Manning 2019), CLEVR (Johnson et al. 2016).
Dataset Splits	No	The paper does not explicitly provide specific train/validation/test dataset splits (e.g., percentages, sample counts, or references to predefined splits) to reproduce the partitioning of the data.
Hardware Specification	No	The paper does not provide specific details about the hardware (e.g., CPU/GPU models, memory) used to run the experiments.
Software Dependencies	No	The paper mentions software components and models (e.g., Python, GPT, CLIP) but does not provide specific version numbers for these software dependencies required for reproducibility.
Experiment Setup	Yes	Our solution leverages GPT-4 (5-shot) for extracting 3 relations: mentioned dates, duration between date labels, and the target date label. [...] Our solution for tracking shuffled objects relies on GPT-4 (1-shot) to extract 3 relations: initial possessions, swaps, and the target person whose final possessed object is expected as the answer. [...] Our solution to this task prompts GPT-4 (2-shot) to produce step-by-step expressions, which can contain constants, variables, and simple arithmetic operations.
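
The tracking-shuffled-objects setup quoted above splits the task into relation extraction (done by GPT-4) and symbolic reasoning over the extracted relations. The following is a minimal Python sketch of the reasoning half only, not the authors' VIEIRA program. The function name `final_possession`, the data layout, and the example facts are all hypothetical, and the hard-coded relations stand in for what the model would extract.

```python
from typing import Dict, List, Tuple


def final_possession(
    initial: Dict[str, str],
    swaps: List[Tuple[str, str]],
    target: str,
) -> str:
    """Apply the swaps in order and return the object the target person ends up holding.

    initial: hypothetical "initial possessions" relation (person -> object)
    swaps:   hypothetical "swaps" relation, as an ordered list of (person, person) pairs
    target:  hypothetical "target person" relation
    """
    holding = dict(initial)  # copy so the caller's relation is left unchanged
    for a, b in swaps:
        # Each swap exchanges the objects currently held by the two people.
        holding[a], holding[b] = holding[b], holding[a]
    return holding[target]


# Hard-coded facts standing in for the relations GPT-4 would extract (assumed, not from the paper).
initial = {"Alice": "red ball", "Bob": "blue ball", "Claire": "green ball"}
swaps = [("Alice", "Bob"), ("Bob", "Claire")]
answer = final_possession(initial, swaps, "Bob")
print(answer)
```

Keeping the extracted relations as plain data means the model is only asked to read the text, while the order-dependent bookkeeping is done deterministically.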