GENOME: Generative Neuro-Symbolic Visual Reasoning by Growing and Reusing Modules
Authors: Zhenfang Chen, Rui Sun, Wenjun Liu, Yining Hong, Chuang Gan
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 EXPERIMENTS: In this section, we present a comprehensive series of experiments to evaluate the performance of our models. Initially, we demonstrate our model's effectiveness in learning neural modules on two established benchmarks: GQA (Hudson & Manning, 2019), focusing on compositional visual question answering, and RefCOCO (Kazemzadeh et al., 2014), which assesses referring expression comprehension. |
| Researcher Affiliation | Collaboration | Zhenfang Chen (MIT-IBM Watson AI Lab); Rui Sun (Columbia University); Wenjun Liu (Tsinghua University); Yining Hong (University of California, Los Angeles); Chuang Gan (MIT-IBM Watson AI Lab and UMass Amherst) |
| Pseudocode | No | The paper presents several Python code snippets and program structures throughout the document (e.g., Figure 1, Figure 3, Figures 11-14), but none of these are explicitly labeled as "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | No | The paper mentions a "Project page: https://vis-www.cs.umass.edu/genome" in a footnote. However, this is a general project overview page, not an explicit statement within the paper that the source code for the described methodology is being released or a direct link to a code repository. |
| Open Datasets | Yes | We show experiments of our GENOME on standard vision-language benchmarks, GQA (Hudson & Manning, 2019) and RefCOCO (Kazemzadeh et al., 2014)... Additionally, with minimal training examples, GENOME demonstrates the capability to manage new visual reasoning tasks (Burke, 1985; Jiang et al., 2023a) by repurposing modules. |
| Dataset Splits | Yes | We extracted 300 examples from GQA, 100 from RefCOCO, 10 from Raven, and 10 from MEWL. ... We randomly chose 100 samples each from the choose, compare, and verify types. Altogether, these three types comprise 300 samples, all sourced from the GQA train split. |
| Hardware Specification | No | The paper includes a general acknowledgment of computational support: "We would also like to thank the computation support from AiMOS, a server cluster for the IBM Research AI Hardware Center." However, it does not specify any particular hardware components such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper lists various models and APIs used, such as "gpt-3.5-turbo-instruct from OpenAI and WizardCoder-Python-34B-V1.0 from WizardLM," and mentions general frameworks like "GLIP," "BLIP," "CLIP," "X-VLM," "MiDaS," and "Stable Diffusion" along with their associated publications. However, it does not provide specific version numbers for ancillary software dependencies like Python, PyTorch, TensorFlow, or CUDA. |
| Experiment Setup | Yes | We only accept program candidates that achieve a pass rate surpassing a predefined threshold (η). This procedure bears resemblance to the code translation of LLMs discussed in (Chen et al., 2023a), but we extend it to accommodate more intricate multi-modal input types and instructions from natural language and raw images. ... We heuristically set the maximal number of debug iterations as 5. |
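
The acceptance criterion quoted in the Experiment Setup row can be pictured as a short test-and-debug loop: keep a generated module only if its pass rate exceeds the threshold η, otherwise ask the LLM to revise it, for at most five rounds. The Python sketch below is illustrative only; the helper names (`propose_module`, `run_test_cases`, `debug_with_llm`) and the concrete value of η are assumptions, not code or settings published by the paper.

```python
# Minimal sketch of the module-acceptance loop described above.
# propose_module, run_test_cases, and debug_with_llm are hypothetical
# callables standing in for the LLM-driven steps; they are not part of
# any released GENOME codebase.

PASS_RATE_THRESHOLD = 0.8   # the paper's threshold eta; value assumed for illustration
MAX_DEBUG_ITERATIONS = 5    # maximal number of debug iterations stated in the paper


def accept_module(task_examples, propose_module, run_test_cases, debug_with_llm):
    """Return an accepted module, or None if no candidate clears the threshold."""
    module = propose_module(task_examples)
    for _ in range(MAX_DEBUG_ITERATIONS):
        results = run_test_cases(module, task_examples)   # list of booleans, one per example
        pass_rate = sum(results) / len(results)
        if pass_rate > PASS_RATE_THRESHOLD:
            return module                                 # candidate surpasses eta: keep it
        module = debug_with_llm(module, results)          # otherwise, ask the LLM to revise
    return None                                           # discard after the iteration budget
```

In this reading, the fixed iteration budget bounds the cost of repeatedly querying the LLM, while the pass-rate threshold filters out modules that only fit a handful of examples.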