Neural Models for Output-Space Invariance in Combinatorial Problems
Authors: Yatin Nandwani, Vidit Jain, Mausam, Parag Singla
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive experimental evaluation on puzzles generated from three different structured CSPs: Graph Coloring (GCP), Futoshiki, and Sudoku. We compare two of our models with an NLM (Dong et al., 2019) baseline, a generic neural reasoner, which either fails to scale or performs significantly worse for most test sizes used in our experiments. We also compare our two models along the axes of performance and scalability and discuss their strengths and weaknesses. |
| Researcher Affiliation | Academia | Yatin Nandwani, Vidit Jain, Mausam & Parag Singla, Department of Computer Science, Indian Institute of Technology Delhi, INDIA {yatin.nandwani, vidit.jain.cs117, mausam, parags}@cse.iitd.ac.in |
| Pseudocode | No | The paper describes the message passing rules and state update functions in text and refers to Figure 3 for an illustration, but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We also make our code publicly available at https://github.com/dair-iitd/output-space-invariance. |
| Open Datasets | Yes | The training data consists of 10 thousand 9 x 9 puzzles randomly selected from the dataset introduced in (Palm et al., 2018). Our test set has k x k puzzles, with k in {10, 12, 15, 16}. Data generation process is similar to Futoshiki, with the distribution of missing cells varying between 30-68% depending on the board size. Instead of backtracking, solution validity is checked through the GSS library (Pieters, 2019). Please see appendix for more details on data generation process for all three tasks. (Footnote 4) Available at: https://data.dgl.ai/dataset/sudoku-hard.zip |
| Dataset Splits | No | The paper mentions a "devset" for hyperparameter selection but does not specify the size, percentages, or method of creating validation splits from the main datasets. |
| Hardware Specification | Yes | All models are trained on K40 GPU nodes with 12GB memory. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and an LSTM cell, but does not specify version numbers for these or any other software libraries, frameworks (like PyTorch or TensorFlow), or programming languages used. |
| Experiment Setup | Yes | Batch Size: For each task, we selected the maximum batch size that can be accommodated in 12GB GPU memory. Refer to Table 8 for details. Optimizer: To minimize the loss, we use Adam optimizer with learning rate 0.0002. As in the original RRN paper, we chose a weight decay factor of 1E-4. Orthogonality Loss Factor: To ensure that the multi-valued model learns different embeddings for each value in the value-set, we add an auxiliary loss term, corresponding to the total pairwise dot product of any two embeddings, before and after message passing on the Relation Message Passing Graph (RMPG), G. Its weight, α, was chosen amongst {0.01, 0.1, 0.5} by cross-validating on a devset for Futoshiki, and then fixed afterwards for all our experiments. Edge Dropout: While collating the messages from the edges of the same type, we drop 10% of the messages, as done in RRN. Dropout is used in the Message Passing Graph (MPG) of the binarized model, and the Constraint Message Passing Graph (CMPG) of the multi-valued model. (A hedged sketch of this setup appears below the table.) |
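
The Experiment Setup row quotes the optimizer settings and the auxiliary orthogonality loss on value embeddings. The following is a minimal PyTorch sketch of that setup, not the authors' released code: the embedding matrix size, placeholder model, and dummy task loss are illustrative assumptions; only the Adam hyperparameters (lr 2e-4, weight decay 1e-4), the α choice, and the pairwise dot-product penalty mirror the quoted description (the paper applies the penalty to embeddings both before and after message passing on the RMPG, which this sketch does not reproduce).

```python
import torch

def orthogonality_loss(value_emb: torch.Tensor) -> torch.Tensor:
    # Sum of pairwise dot products between distinct value embeddings.
    # value_emb: (num_values, dim). A small value pushes embeddings apart.
    gram = value_emb @ value_emb.t()                      # (num_values, num_values)
    off_diag = gram * (1.0 - torch.eye(gram.size(0)))     # zero out self dot products
    return off_diag.abs().sum()

# Hypothetical tiny setup: embeddings for a value-set of size 9 (e.g. Sudoku digits),
# a placeholder "model" standing in for the GNN, and the quoted optimizer settings.
value_emb = torch.nn.Parameter(torch.randn(9, 16))
model = torch.nn.Linear(16, 9)
optimizer = torch.optim.Adam(
    list(model.parameters()) + [value_emb], lr=2e-4, weight_decay=1e-4
)
alpha = 0.1  # weight chosen from {0.01, 0.1, 0.5} on a devset, per the paper

# One illustrative step: a dummy task loss plus the weighted orthogonality term.
task_loss = model(value_emb).pow(2).mean()  # stand-in for the real supervised loss
loss = task_loss + alpha * orthogonality_loss(value_emb)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```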