Bongard-LOGO: A New Benchmark for Human-Level Concept Learning and Reasoning
Authors: Weili Nie, Zhiding Yu, Lei Mao, Ankit B. Patel, Yuke Zhu, Anima Anandkumar
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we show that the state-of-the-art deep learning methods perform substantially worse than human subjects, implying that they fail to capture core human cognition properties. We split the 12,000 problems in BONGARD-LOGO into the disjoint train/validation/test sets, consisting of 9300, 900, and 1800 problems respectively. We report the test accuracy (Acc) of different methods on each of the four test sets respectively, and compare the results to the human performance in Table 1. |
| Researcher Affiliation | Collaboration | Weili Nie (Rice University, wn8@rice.edu); Zhiding Yu (NVIDIA, zhidingy@nvidia.com); Lei Mao (NVIDIA, lmao@nvidia.com); Ankit B. Patel (Rice University and Baylor College of Medicine, abp4@rice.edu); Yuke Zhu (UT Austin and NVIDIA, yukez@cs.utexas.edu); Animashree Anandkumar (Caltech and NVIDIA, anima@caltech.edu) |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | We have open-sourced the procedural generation code and data of BONGARD-LOGO in the following GitHub repository: https://github.com/NVlabs/Bongard-LOGO. |
| Open Datasets | Yes | We developed the BONGARD-LOGO benchmark that shares the same purposes as the original BPs for human-level visual concept learning and reasoning. Meanwhile, it contains a large quantity of 12,000 problems and transforms concept learning into a few-shot binary classification problem. We have open-sourced the procedural generation code and data of BONGARD-LOGO in the following GitHub repository: https://github.com/NVlabs/Bongard-LOGO. (An illustrative sketch of this few-shot formulation follows the table.) |
| Dataset Splits | Yes | We split the 12,000 problems in BONGARD-LOGO into the disjoint train/validation/test sets, consisting of 9300, 900, and 1800 problems respectively. (A split-size sanity check appears in the second sketch after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | No | The paper defers these details: 'We put the experiment setup for training these methods to Appendix C', so the training configuration is not described in the main text. |
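
The few-shot binary classification framing mentioned above can be made concrete with a small sketch. It assumes the Bongard-LOGO layout of six positive and six negative support images plus one held-out query image per class in each problem; the `BongardProblem` container and `predict_query` nearest-prototype scorer are illustrative names chosen here, not the authors' released code.

```python
from dataclasses import dataclass
from typing import Callable, List

import numpy as np


@dataclass
class BongardProblem:
    """One few-shot binary classification task (illustrative container).

    Assumed layout: six support images per class and one query per class,
    following the two-sets-of-seven structure used in Bongard-LOGO.
    """
    pos_support: List[np.ndarray]  # images obeying the latent concept
    neg_support: List[np.ndarray]  # images violating the concept
    pos_query: np.ndarray          # held-out positive test image
    neg_query: np.ndarray          # held-out negative test image


def predict_query(problem: BongardProblem,
                  embed: Callable[[np.ndarray], np.ndarray]) -> bool:
    """Nearest-prototype baseline: embed the support images, average each
    class into a prototype, and assign the positive query to the closer
    prototype. Returns True if the positive query is classified correctly."""
    pos_proto = np.mean([embed(x) for x in problem.pos_support], axis=0)
    neg_proto = np.mean([embed(x) for x in problem.neg_support], axis=0)
    q = embed(problem.pos_query)
    return np.linalg.norm(q - pos_proto) < np.linalg.norm(q - neg_proto)
```

Any few-shot method evaluated on the benchmark (e.g., the meta-learning baselines in the paper) ultimately produces such a per-query binary decision, so per-problem accuracy is the fraction of queries assigned to the correct side.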
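The reported split sizes (9300 / 900 / 1800 out of 12,000 problems) are easy to sanity-check. The snippet below is a minimal sketch assuming problems are identified by integer indices; the released repository defines its own split files, so this is not the authors' splitting code.

```python
import random

TOTAL, TRAIN, VAL, TEST = 12_000, 9_300, 900, 1_800
assert TRAIN + VAL + TEST == TOTAL  # 9300 + 900 + 1800 = 12000

# Hypothetical problem identifiers stand in for the benchmark's problem files.
problem_ids = list(range(TOTAL))
random.seed(0)
random.shuffle(problem_ids)

train_ids = set(problem_ids[:TRAIN])
val_ids = set(problem_ids[TRAIN:TRAIN + VAL])
test_ids = set(problem_ids[TRAIN + VAL:])

# The three sets must be pairwise disjoint and jointly cover every problem.
assert not (train_ids & val_ids)
assert not (train_ids & test_ids)
assert not (val_ids & test_ids)
assert len(train_ids | val_ids | test_ids) == TOTAL
```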