Learning to Solve Constraint Satisfaction Problems with Recurrent Transformer

Authors: Zhun Yang, Adam Ishay, Joohyung Lee

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section headings cited as evidence: "4 EXPERIMENTS WITH RECURRENT TRANSFORMER"; "Table 1: Whole board accuracy on different Sudoku datasets."; "4.1.1 ABLATION STUDY ON MODEL DESIGN (LXRYHZ) WITH TEXTUAL SUDOKU"; "5.2 EXPERIMENTS ON INJECTING LOGICAL CONSTRAINTS IN RECURRENT TRANSFORMER TRAINING"
Researcher Affiliation | Collaboration | Zhun Yang (1), Adam Ishay (1) & Joohyung Lee (1,2); (1) School of Computing and AI, Arizona State University, AZ, USA; (2) Global AI Center, Samsung Research, S. Korea
Pseudocode | No | The paper provides detailed mathematical formulations for the Recurrent Transformer architecture and describes its components, but it does not include a distinct block labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | "The code is available at https://github.com/azreasoners/recurrent_transformer."
Open Datasets | Yes | "For textual Sudoku, we use the SATNet dataset from (Wang et al., 2019) and the RRN dataset from (Palm et al., 2018). For visual Sudoku, we use the ungrounded SATNet-V dataset from (Topan et al., 2021). In addition to SATNet-V, we created a new ungrounded dataset, RRN-V, following the same procedure based on the RRN dataset."; "MNIST. We use MNIST images (LeCun et al., 1998) (http://yann.lecun.com/exdb/mnist/)"
Dataset Splits | Yes | "We use the shortest path dataset SP4 from (Xu et al., 2018)...we split the dataset into 60%/20%/20% training/test/validation examples." (A dataset-splitting sketch follows this table.)
Hardware Specification | Yes | "All of our experiments were done on Ubuntu 18.04.2 LTS with two 10-core Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz and four GP104 [GeForce GTX 1080]."
Software Dependencies | No | The paper mentions the operating system ('Ubuntu 18.04.2 LTS') and that the implementation is 'based on Andrej Karpathy's minGPT repository'. However, it does not specify version numbers for other key software dependencies such as the programming language (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or specific libraries.
Experiment Setup | Yes | "F.2 TRAINING DETAILS"; "The values of the weights α and β of the constraint losses L_sudoku and L_attention are selected from {0, 0.1, 0.5, 1} to achieve the highest training accuracy."; "Table 8: Model Structure and Hyperparameters for Textual Sudoku Experiments" (batch size, learning rate, dropout, number of attention heads, number of layers, number of recurrences, embedding dimension, token embedder, sequence length). (A weight-selection sketch follows this table.)
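
The Dataset Splits row quotes a 60%/20%/20% training/test/validation split of the SP4 shortest-path dataset, but the excerpt does not include the splitting code. A minimal sketch of such a split, assuming a NumPy-based pipeline (the function name, seed, and example count are illustrative, not from the paper):

```python
import numpy as np

def split_indices(n_examples, seed=0):
    """Shuffle example indices and split them 60%/20%/20% into
    training/test/validation, matching the quoted split ratios.
    The seed and helper name are assumptions for illustration."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_examples)
    n_train = int(0.6 * n_examples)
    n_test = int(0.2 * n_examples)
    train = idx[:n_train]
    test = idx[n_train:n_train + n_test]
    val = idx[n_train + n_test:]
    return train, test, val

# Example usage with a hypothetical dataset size.
train_idx, test_idx, val_idx = split_indices(10000)
```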
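
Likewise, the Experiment Setup row states that the constraint-loss weights α and β are chosen from {0, 0.1, 0.5, 1} by highest training accuracy. The following is only a hedged sketch of that selection loop; `train_and_evaluate` is a hypothetical callback standing in for the authors' actual training code in the linked repository:

```python
from itertools import product

# Candidate weights for the constraint losses L_sudoku and L_attention,
# as quoted in the Experiment Setup row.
WEIGHTS = [0, 0.1, 0.5, 1]

def select_constraint_weights(train_and_evaluate):
    """Grid-search (alpha, beta) over {0, 0.1, 0.5, 1}^2 and keep the
    pair that yields the highest training accuracy.
    `train_and_evaluate(alpha, beta) -> float` is a hypothetical
    callback that trains the model and returns training accuracy."""
    best = None
    for alpha, beta in product(WEIGHTS, WEIGHTS):
        acc = train_and_evaluate(alpha, beta)
        if best is None or acc > best[0]:
            best = (acc, alpha, beta)
    return best  # (accuracy, alpha, beta)

# Example with a stand-in evaluation function (purely illustrative).
dummy_eval = lambda a, b: 1.0 - abs(a - 0.5) - abs(b - 0.1)
print(select_constraint_weights(dummy_eval))  # -> (1.0, 0.5, 0.1)
```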