Assessing SATNet's Ability to Solve the Symbol Grounding Problem

Authors: Oscar Chang, Lampros Flokas, Hod Lipson, Michael Spranger

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We re-ran the Sudoku experiments using the SATNet authors' open-sourced implementation with identical experimental settings, but over 10 different random seeds to get standard error confidence intervals. Table 1 shows clearly that output masking does not affect the results in the non-visual case, but causes SATNet to fail completely for visual Sudoku, which is what we expect from the discussion in the previous section. Once the intermediate labels are gone, the CNN never learns to classify the digits better than chance. SATNet's failure at symbol grounding directly leads to its failure at the overall visual Sudoku task. (A sketch of this kind of output masking follows the table.)
Researcher Affiliation | Collaboration | Oscar Chang, Lampros Flokas, Hod Lipson (Data Science Institute, Columbia University; {oscar.chang,lampros.flokas,hod.lipson}@columbia.edu) and Michael Spranger (Sony AI; michael.spranger@sony.com)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions using 'the SATNet authors' open-sourced implementation' but does not provide concrete access to source code for the methodology described in *this* paper.
Open Datasets | Yes | In visual Sudoku, the inputs are now 81 images of digits (taken from the MNIST dataset). (A sketch of assembling such inputs follows the table.)
Dataset Splits | No | The paper mentions '9000 training and 1000 test examples' but does not specify a validation split or percentages for training/validation/test.
Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments.
Software Dependencies | No | The paper mentions 'Adam [42]' as an optimizer but does not specify version numbers for it or for any other software dependencies.
Experiment Setup | Yes | We present four empirical findings using experiments on the MNIST mapping problem. All experiments were run for 50 training epochs over 10 random seeds to get standard error confidence intervals. The Sudoku CNN, which was the backbone architecture used in the SATNet authors' visual Sudoku implementation, is used throughout unless stated otherwise. We evaluate the results by presenting test accuracies with their confidence intervals and the number of complete failures in parentheses. (A seed-aggregation sketch follows the table.)
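
The output masking referenced under Research Type can be illustrated with a minimal sketch. Nothing below is taken from the SATNet codebase: the function name masked_bce_loss, the tensor shapes, and the use of a binary cross-entropy over one-hot cell encodings are assumptions made for illustration. The only point is that the loss is zeroed on cells whose digits are given as puzzle inputs, so the perception network no longer receives leaked intermediate digit labels.

```python
import torch

def masked_bce_loss(pred, target, is_input_mask):
    """Hypothetical sketch of output masking for visual Sudoku.

    pred, target:  (batch, 81, 9) predicted probabilities / one-hot labels
    is_input_mask: (batch, 81) bool, True where the cell was given as input

    The loss is suppressed on input cells, so the digit classifier gets no
    direct supervision on what digit each input image contains.
    """
    per_cell = torch.nn.functional.binary_cross_entropy(
        pred, target, reduction="none"
    ).sum(dim=-1)                       # (batch, 81) per-cell loss
    keep = (~is_input_mask).float()     # supervise only the cells to be solved
    return (per_cell * keep).sum() / keep.sum().clamp(min=1.0)
```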
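
The Open Datasets row quotes the visual-Sudoku input construction, in which each of the 81 cells becomes an MNIST digit image. A minimal assembly sketch follows; the function name, array shapes, and the convention of rendering empty cells as all-zero images are assumptions, not details taken from the paper.

```python
import numpy as np

def to_visual_sudoku(board, mnist_images, mnist_labels, rng):
    """Sketch: replace each of the 81 Sudoku cells with a 28x28 MNIST image.

    board:        (81,) ints in 0..9, with 0 meaning an empty cell (assumed)
    mnist_images: (N, 28, 28) float array of MNIST digit images
    mnist_labels: (N,) int array of the corresponding digit labels
    rng:          numpy Generator, e.g. np.random.default_rng(0)
    """
    cells = np.zeros((81, 28, 28), dtype=np.float32)
    for i, digit in enumerate(board):
        if digit == 0:
            continue  # leave empty cells as blank images (assumed convention)
        candidates = np.flatnonzero(mnist_labels == digit)
        cells[i] = mnist_images[rng.choice(candidates)]
    return cells
```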
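
The Experiment Setup row describes reporting test accuracies with standard-error confidence intervals over 10 seeds, plus a count of complete failures. The aggregation sketch below shows one way to compute those summaries; defining a "complete failure" as a run at or below a threshold accuracy is our assumption, and the accuracy values in the usage example are placeholders, not results from the paper.

```python
import math

def summarize_runs(test_accuracies, failure_threshold=0.0):
    """Aggregate per-seed test accuracies into mean, standard error, and a
    count of runs treated as complete failures (assumed: accuracy <= threshold)."""
    n = len(test_accuracies)
    mean = sum(test_accuracies) / n
    var = sum((a - mean) ** 2 for a in test_accuracies) / (n - 1) if n > 1 else 0.0
    std_err = math.sqrt(var / n)
    failures = sum(a <= failure_threshold for a in test_accuracies)
    return mean, std_err, failures

# Placeholder accuracies for 10 seeds (not values from the paper):
mean, se, fails = summarize_runs([0.72, 0.70, 0.0, 0.71, 0.69, 0.0, 0.73, 0.68, 0.70, 0.71])
print(f"{mean:.3f} +/- {se:.3f} ({fails} complete failures)")
```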