Assessing SATNet's Ability to Solve the Symbol Grounding Problem
Authors: Oscar Chang, Lampros Flokas, Hod Lipson, Michael Spranger
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We re-ran the Sudoku experiments using the SATNet authors' open-sourced implementation with identical experimental settings, but over 10 different random seeds to get standard error confidence intervals. Table 1 shows clearly that output masking does not affect the results in the non-visual case, but causes SATNet to fail completely for visual Sudoku, which is what we expect from the discussion in the previous section. Once the intermediate labels are gone, the CNN never learns to classify the digits better than chance. SATNet's failure at symbol grounding directly leads to its failure at the overall visual Sudoku task. |
| Researcher Affiliation | Collaboration | Oscar Chang, Lampros Flokas, Hod Lipson (Data Science Institute, Columbia University; {oscar.chang,lampros.flokas,hod.lipson}@columbia.edu); Michael Spranger (Sony AI; michael.spranger@sony.com) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using 'the SATNet authors' open-sourced implementation' but does not provide concrete access to source code for the methodology described in *this* paper. |
| Open Datasets | Yes | In visual Sudoku, the inputs are now 81 images of digits (taken from the MNIST dataset) |
| Dataset Splits | No | The paper mentions '9000 training and 1000 test examples' but does not specify a validation dataset split or percentages for training/validation/test. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper mentions 'Adam [42]' as an optimizer but does not specify version numbers for Adam or any other software dependencies. |
| Experiment Setup | Yes | We present four empirical findings using experiments on the MNIST mapping problem. All experiments were run for 50 training epochs over 10 random seeds to get standard error confidence intervals. The Sudoku CNN, which was the backbone architecture used in the SATNet authors' visual Sudoku implementation, is used throughout unless stated otherwise. We evaluate the results by presenting test accuracies with their confidence intervals and the number of complete failures in parentheses. |
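The standard-error confidence intervals reported over 10 random seeds can be reproduced with a short sketch. This is a minimal illustration of the statistic, not the authors' code; the per-seed accuracy values below are hypothetical placeholders.

```python
import math

# Hypothetical per-seed test accuracies (fractions); in the actual study
# these come from re-running each experiment with 10 different random seeds.
accuracies = [0.952, 0.948, 0.961, 0.955, 0.949,
              0.957, 0.950, 0.953, 0.946, 0.958]

n = len(accuracies)
mean = sum(accuracies) / n

# Sample standard deviation (Bessel's correction), then standard error
# of the mean: SEM = s / sqrt(n).
std = math.sqrt(sum((a - mean) ** 2 for a in accuracies) / (n - 1))
sem = std / math.sqrt(n)

print(f"test accuracy: {mean:.3f} +/- {sem:.3f} (standard error, n={n})")
```

Reporting the mean plus/minus the standard error over seeds, rather than a single run, makes it possible to distinguish genuine effects (such as the complete failure under output masking) from seed-to-seed noise.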