Learning to Solve Constraint Satisfaction Problems with Recurrent Transformer
Authors: Zhun Yang, Adam Ishay, Joohyung Lee
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 EXPERIMENTS WITH RECURRENT TRANSFORMER, Table 1: Whole board accuracy on different Sudoku datasets., 4.1.1 ABLATION STUDY ON MODEL DESIGN (LxRyHz) WITH TEXTUAL SUDOKU, 5.2 EXPERIMENTS ON INJECTING LOGICAL CONSTRAINTS IN RECURRENT TRANSFORMER TRAINING |
| Researcher Affiliation | Collaboration | Zhun Yang (1), Adam Ishay (1) & Joohyung Lee (1,2); (1) School of Computing and AI, Arizona State University, AZ, USA; (2) Global AI Center, Samsung Research, S. Korea |
| Pseudocode | No | The paper gives detailed mathematical formulations for the Recurrent Transformer architecture and describes its components, but it does not include a distinct block labeled 'Pseudocode' or 'Algorithm'. (An illustrative sketch of the recurrence appears below the table.) |
| Open Source Code | Yes | The code is available at https://github.com/azreasoners/recurrent_transformer. |
| Open Datasets | Yes | For textual Sudoku, we use the SATNet dataset from (Wang et al., 2019) and the RRN dataset from (Palm et al., 2018). For visual Sudoku, we use the ungrounded SATNet-V dataset from (Topan et al., 2021). In addition to SATNet-V, we created a new ungrounded dataset, RRN-V, following the same procedure based on the RRN dataset. We use MNIST images (LeCun et al., 1998) (http://yann.lecun.com/exdb/mnist/) |
| Dataset Splits | Yes | We use the shortest path dataset SP4 from (Xu et al., 2018)...we split the dataset into 60%/20%/20% training/test/validation examples. (A minimal split sketch appears below the table.) |
| Hardware Specification | Yes | All of our experiments were done on Ubuntu 18.04.2 LTS with two 10-core Intel(R) Xeon(R) E5-2640 v4 CPUs @ 2.40GHz and four GP104 [GeForce GTX 1080] GPUs. |
| Software Dependencies | No | The paper mentions the operating system ('Ubuntu 18.04.2 LTS') and that the implementation is 'based on Andrej Karpathy's minGPT repository'. However, it does not specify version numbers for other key software dependencies such as the programming language (e.g., Python), the deep learning framework (e.g., PyTorch, TensorFlow), or specific libraries. |
| Experiment Setup | Yes | F.2 TRAINING DETAILS: The values of the weights α and β of the constraint losses Lsudoku and Lattention are selected from {0, 0.1, 0.5, 1} to achieve the highest training accuracy. Table 8: Model Structure and Hyperparameters for Textual Sudoku Experiments (batch size, learning rate, dropout, number of attention heads, number of layers, number of recurrences, embedding dimension, token embedder, sequence length). (A hedged sketch of the weighted loss appears below the table.) |
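
Since the paper contains no pseudocode block, the following is a minimal, illustrative PyTorch sketch of the recurrence pattern it describes: a single stack of transformer layers applied repeatedly, with digit logits read out after every recurrence. The class names (`Block`, `RecurrentTransformer`) and the default hyperparameters are assumptions for illustration only; the authors' actual implementation (based on minGPT) is in the linked repository.

```python
# Illustrative sketch of a weight-tied recurrent transformer for Sudoku.
# Names and defaults are assumptions, NOT the authors' implementation.
import torch
import torch.nn as nn


class Block(nn.Module):
    """One pre-LN transformer block (minGPT-style), reused across recurrences."""

    def __init__(self, d_model: int, n_heads: int, dropout: float = 0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model), nn.Dropout(dropout))

    def forward(self, x):
        h = self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # self-attention
        return x + self.mlp(self.ln2(x))


class RecurrentTransformer(nn.Module):
    """Embed the 81 Sudoku cells, apply the same n_layers blocks n_recur
    times, and emit digit logits after every recurrence so that each
    recurrence step can be supervised."""

    def __init__(self, vocab=10, d_model=128, n_heads=4, n_layers=1,
                 n_recur=32, seq_len=81):
        super().__init__()
        self.n_recur = n_recur
        self.tok_emb = nn.Embedding(vocab, d_model)  # digits 0-9; 0 = blank
        self.pos_emb = nn.Parameter(torch.zeros(1, seq_len, d_model))
        self.blocks = nn.ModuleList(
            [Block(d_model, n_heads) for _ in range(n_layers)])
        self.head = nn.Linear(d_model, vocab)

    def forward(self, tokens):  # tokens: (batch, 81) integer board
        x = self.tok_emb(tokens) + self.pos_emb
        logits_per_step = []
        for _ in range(self.n_recur):  # the SAME blocks, reused R times
            for block in self.blocks:
                x = block(x)
            logits_per_step.append(self.head(x))  # (batch, 81, vocab)
        return logits_per_step


# Example: a batch of four unsolved boards, 0 marking empty cells.
boards = torch.randint(0, 10, (4, 81))
steps = RecurrentTransformer()(boards)  # list of 32 (4, 81, 10) logit tensors
```

Weight tying across recurrences is what distinguishes this from simply stacking more layers: the parameter count stays fixed while the effective depth grows with `n_recur`.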
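
The 'Dataset Splits' row quotes a 60%/20%/20% training/test/validation split for the SP4 shortest-path data. Below is a minimal sketch of such a split using `torch.utils.data.random_split`; the dataset object is a placeholder, and the real SP4 loading code is in the repository.

```python
# Minimal 60/20/20 split sketch; the TensorDataset stands in for SP4.
import torch
from torch.utils.data import TensorDataset, random_split

dataset = TensorDataset(torch.randn(1000, 4))  # placeholder for SP4
n = len(dataset)
n_train, n_test = int(0.6 * n), int(0.2 * n)
train, test, val = random_split(
    dataset, [n_train, n_test, n - n_train - n_test],
    generator=torch.Generator().manual_seed(0))  # fixed seed for repeatability
```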
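
The 'Experiment Setup' row quotes weights α and β on the constraint losses Lsudoku and Lattention. Below is a hedged sketch of how such weighted penalties can be attached to a base cross-entropy objective. The two penalty functions (`row_uniqueness_penalty`, `attention_penalty`) are simple stand-ins written for illustration, a quadratic once-per-row penalty and an 'attend only to related cells' penalty; they are not the paper's exact definitions of Lsudoku and Lattention.

```python
# Hedged sketch of weighted constraint penalties on top of cross-entropy.
# The two penalties are illustrative stand-ins, NOT the paper's L_sudoku
# and L_attention; the exact losses are defined in the paper/repository.
import torch
import torch.nn.functional as F

# 81x81 mask: RELATED[i, j] = 1 iff cells i, j share a row, column, or box.
idx = torch.arange(81)
r, c = idx // 9, idx % 9
RELATED = ((r[:, None] == r[None, :]) |
           (c[:, None] == c[None, :]) |
           ((r[:, None] // 3 == r[None, :] // 3) &
            (c[:, None] // 3 == c[None, :] // 3))).float()


def row_uniqueness_penalty(probs):
    """probs: (batch, 9, 9, 9) cell-digit probabilities. Penalize rows whose
    total mass for any digit deviates from 1 (each digit appears exactly once
    per row); columns and boxes would get analogous terms."""
    digit_mass = probs.sum(dim=2)  # sum over columns -> (batch, 9, 9)
    return ((digit_mass - 1.0) ** 2).mean()


def attention_penalty(attn):
    """attn: (batch, 81, 81) attention weights. Penalize attention mass
    falling on cells unrelated to the query cell."""
    return (attn * (1.0 - RELATED)).sum(dim=-1).mean()


def total_loss(logits, targets, attn, alpha, beta):
    """logits: (batch, 81, 10); targets: (batch, 81) gold digits 1-9."""
    base = F.cross_entropy(logits.reshape(-1, 10), targets.reshape(-1))
    probs = logits[..., 1:].softmax(dim=-1).view(-1, 9, 9, 9)
    return (base
            + alpha * row_uniqueness_penalty(probs)
            + beta * attention_penalty(attn))


# Example with random tensors, just to show the shapes involved.
logits = torch.randn(4, 81, 10)
targets = torch.randint(1, 10, (4, 81))
attn = torch.rand(4, 81, 81).softmax(dim=-1)
loss = total_loss(logits, targets, attn, alpha=0.5, beta=0.1)
```

As F.2 states, α and β would then be selected from {0, 0.1, 0.5, 1} by training accuracy.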