Bongard-LOGO: A New Benchmark for Human-Level Concept Learning and Reasoning

Authors: Weili Nie, Zhiding Yu, Lei Mao, Ankit B. Patel, Yuke Zhu, Anima Anandkumar

NeurIPS 2020

Reproducibility Assessment

Research Type: Experimental
LLM Response: "In experiments, we show that the state-of-the-art deep learning methods perform substantially worse than human subjects, implying that they fail to capture core human cognition properties. We split the 12,000 problems in BONGARD-LOGO into the disjoint train/validation/test sets, consisting of 9300, 900, and 1800 problems respectively. We report the test accuracy (Acc) of different methods on each of the four test sets respectively, and compare the results to the human performance in Table 1."

Researcher Affiliation: Collaboration
LLM Response: Weili Nie (Rice University, wn8@rice.edu); Zhiding Yu (NVIDIA, zhidingy@nvidia.com); Lei Mao (NVIDIA, lmao@nvidia.com); Ankit B. Patel (Rice University and Baylor College of Medicine, abp4@rice.edu); Yuke Zhu (UT Austin and NVIDIA, yukez@cs.utexas.edu); Animashree Anandkumar (Caltech and NVIDIA, anima@caltech.edu)

Pseudocode: No
LLM Response: The paper does not contain any pseudocode or clearly labeled algorithm blocks.

Open Source Code: Yes
LLM Response: "We have open-sourced the procedural generation code and data of BONGARD-LOGO in the following GitHub repository: https://github.com/NVlabs/Bongard-LOGO."

Open Datasets: Yes
LLM Response: "We developed the BONGARD-LOGO benchmark that shares the same purposes as the original BPs for human-level visual concept learning and reasoning. Meanwhile, it contains a large quantity of 12,000 problems and transforms concept learning into a few-shot binary classification problem." The data is open-sourced in the GitHub repository cited above.

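For concreteness, here is a minimal sketch of how a single BONGARD-LOGO problem could be represented as a few-shot binary classification episode. The 7-positive/7-negative layout with one held-out query image per side follows the paper's description of the benchmark; the class name, field names, and episode() helper are illustrative assumptions, not the repository's actual API.

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class BongardProblem:
        # Hypothetical container; the real repository may store problems differently.
        positives: List[str]  # paths to the 7 images that contain the concept
        negatives: List[str]  # paths to the 7 images that violate it

        def episode(self) -> Tuple[list, list]:
            # 6-shot support set per class, plus 1 held-out query image per class.
            support = ([(img, 1) for img in self.positives[:6]]
                       + [(img, 0) for img in self.negatives[:6]])
            query = [(self.positives[6], 1), (self.negatives[6], 0)]
            return support, query
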
Dataset Splits: Yes
LLM Response: "We split the 12,000 problems in BONGARD-LOGO into the disjoint train/validation/test sets, consisting of 9300, 900, and 1800 problems respectively."

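As a quick arithmetic check, the reported split sizes sum to the 12,000 total problems. The snippet below verifies this and sketches a hypothetical random partition of problem IDs; the paper's actual test split is constructed deliberately (it comprises four distinct test sets), so the random shuffle here is an assumption for illustration only.

    import random

    TRAIN, VAL, TEST = 9300, 900, 1800  # split sizes reported in the paper
    assert TRAIN + VAL + TEST == 12000  # matches the 12,000 total problems

    problem_ids = list(range(12000))
    random.seed(0)                       # hypothetical seed, not from the paper
    random.shuffle(problem_ids)          # the paper's test split is curated, not random
    train_ids = problem_ids[:TRAIN]
    val_ids = problem_ids[TRAIN:TRAIN + VAL]
    test_ids = problem_ids[TRAIN + VAL:]
    assert len(test_ids) == TEST
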
Hardware Specification: No
LLM Response: The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments.

Software Dependencies: No
LLM Response: The paper does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).

Experiment Setup: No
LLM Response: The paper states: "We put the experiment setup for training these methods to Appendix C." The detailed training setup is therefore deferred to the appendix rather than given in the main text.