Deep Reasoning Networks for Unsupervised Pattern De-mixing with Constraint Reasoning

Authors: Di Chen, Yiwei Bai, Wenting Zhao, Sebastian Ament, John Gregoire, Carla Gomes

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate the power of DRNets on two pattern de-mixing tasks: disentangling two overlapping hand-written Sudokus (Multi-MNIST-Sudoku) and inferring crystal structures of materials from X-ray diffraction data (Crystal-Structure Phase-Mapping). All the experiments are performed on one NVIDIA Tesla V100 GPU with 16GB memory. We demonstrate the potential of DRNets on two de-mixing tasks with detailed experimental results.
Researcher Affiliation | Academia | (1) Department of Computer Science, Cornell University, Ithaca, New York, USA; (2) California Institute of Technology, Pasadena, California, USA. Correspondence to: Di Chen <di@cs.cornell.edu>.
Pseudocode | Yes | Algorithm 1: Constraint-aware stochastic gradient descent optimization of deep reasoning networks (an illustrative training-loop sketch follows the table).
Input: (i) data points {x_i}, i = 1..N; (ii) constraint graph; (iii) penalty functions ψ_l(·) and ψ_j^g(·) for the local and the global constraints; (iv) pre-trained or parametric generative decoder G(·).
1: Initialize the penalty weights λ_l, λ_j^g and the thresholds for all constraints.
2: for number of optimization iterations do
3:   Batch data points {x_1, ..., x_m} from the randomly sampled (maximal) connected components.
4:   Collect the global penalty functions {ψ_j^g(·)}, j = 1..M, concerning those data points.
5:   Compute the latent space {φ_θ(x_1), ..., φ_θ(x_m)} from the encoder.
6:   Adjust the penalty weights λ_l, λ_j^g and the thresholds accordingly.
7:   Minimize (1/m) Σ_{i=1}^m [ L(G(φ_θ(x_i)), x_i) + λ_l ψ_l(φ_θ(x_i)) + Σ_{j=1}^M λ_j^g ψ_j^g({φ_θ(x_k) | k ∈ S_j}) ] using any standard gradient-based optimization method and update the parameters θ.
8: end for
Open Source Code | No | The paper does not provide an explicit statement or a link indicating the availability of open-source code for the described methodology.
Open Datasets | Yes | Multi-MNIST-Sudoku: We generated 160,000 input data points for each of the training, validation, and test sets, where each data point corresponds to a 32x32 image of overlapping digits coming from MNIST (LeCun et al., 1998) and every 16 data points form a pair of 4-by-4 overlapping Sudokus (an illustrative data-generation sketch follows the table).
Dataset Splits | Yes | We generated 160,000 input data points for each of the training, validation, and test sets, where each data point corresponds to a 32x32 image of overlapping digits coming from MNIST (LeCun et al., 1998) and every 16 data points form a pair of 4-by-4 overlapping Sudokus.
Hardware Specification | Yes | All the experiments are performed on one NVIDIA Tesla V100 GPU with 16GB memory.
Software Dependencies | No | The paper mentions using the Adam optimizer but does not provide specific version numbers for software dependencies such as deep learning frameworks or libraries.
Experiment Setup | Yes | For the training process of our DRNets, we select a learning rate in {0.0001, 0.0005, 0.001} with the Adam optimizer (Kingma & Ba, 2014) for all the experiments. [...] The reasoning loss enforces the Sudoku rules and includes the continuous relaxation of the cardinality (2 × 16 cells) and All-Different (2 × (4 rows + 4 columns + 4 boxes)) constraints for every 16 data points, with initial weights of 0.01 and 1.0, respectively. [...] In this task, we used the Jensen-Shannon distance (JS distance) with a weight of 20.0 plus the L2 distance with a weight of 0.05 as the reconstruction loss (a sketch of this combined loss follows the table). We use the JS distance since the locations of peaks are the most important characteristics of a phase pattern and mismatched peaks would cause a large JS distance. [...] Due to the different noise levels, we use different weights for the Gibbs Rule (1.0 and 30.0) and Phase Field Connectivity (0.01 and 3.0) for the Al-Li-Fe oxide system and the Bi-Cu-V oxide system, respectively.
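
Read as code, Algorithm 1 is a penalty-method training loop: the encoder maps each entangled input to a structured latent code, the (possibly pre-trained) generative decoder reconstructs the input from that code, and local and global constraint penalties are added to the reconstruction loss before a standard optimizer step. The sketch below is a minimal PyTorch rendering of that loop, not the authors' released code; `encoder`, `decoder`, `local_penalty`, `global_penalties`, and the fixed penalty weights are placeholder assumptions.

```python
import torch

def train_drnet(encoder, decoder, batches, local_penalty, global_penalties,
                lam_local=1.0, lam_global=1.0, lr=1e-4, iterations=1000):
    """Constraint-aware SGD sketch of Algorithm 1 (placeholder components).

    encoder, decoder     -- torch.nn.Module; the decoder may be pre-trained/frozen.
    batches              -- iterable yielding input tensors x of shape [m, ...],
                            drawn from randomly sampled connected components.
    local_penalty(z)     -- differentiable penalty psi_l on each latent code.
    global_penalties     -- list of differentiable penalties psi_j^g, each coupling
                            the latent codes of the data points it links.
    """
    params = list(encoder.parameters()) + list(decoder.parameters())
    opt = torch.optim.Adam(params, lr=lr)   # paper selects lr in {1e-4, 5e-4, 1e-3} with Adam
    recon = torch.nn.MSELoss()              # stand-in for the reconstruction loss L

    for _, x in zip(range(iterations), batches):
        z = encoder(x)                      # latent codes phi_theta(x_i), shape [m, d]
        x_hat = decoder(z)                  # generative decoder G(.)

        loss = recon(x_hat, x)              # (1/m) sum_i L(G(phi_theta(x_i)), x_i)
        loss = loss + lam_local * local_penalty(z).mean()
        for psi_g in global_penalties:      # global constraints couple several data points
            loss = loss + lam_global * psi_g(z)

        opt.zero_grad()
        loss.backward()
        opt.step()
        # Steps 1 and 6 of Algorithm 1 adjust the penalty weights and thresholds
        # during training; they are left fixed here for brevity.
    return encoder, decoder
```

In the paper the penalty weights are constraint-specific and adapted over the course of optimization; the fixed `lam_local`/`lam_global` values above are only a simplification of that schedule.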
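
For intuition about the Multi-MNIST-Sudoku inputs quoted in the Open Datasets and Dataset Splits rows, the snippet below shows one way such 32x32 overlapping-digit images could be produced with torchvision. The padding to 32x32 and the pixel-wise-maximum compositing are assumptions of this sketch; the quote only states that each input is a 32x32 image of two overlapping hand-written MNIST digits.

```python
import torch
import torch.nn.functional as F
from torchvision import datasets

# Load raw MNIST digits (28x28, uint8); assumes torchvision is installed.
mnist = datasets.MNIST(root="./data", train=True, download=True)
images = mnist.data.float() / 255.0   # [60000, 28, 28], scaled to [0, 1]
labels = mnist.targets                # [60000]

def overlap_pair(img_a, img_b):
    """Pad two 28x28 digits to 32x32 and overlay them (assumed pixel-wise max)."""
    a = F.pad(img_a, (2, 2, 2, 2))    # [32, 32]
    b = F.pad(img_b, (2, 2, 2, 2))
    return torch.maximum(a, b)

# Example: overlap a '1' with an '8' (the digit choice here is arbitrary).
idx_a = (labels == 1).nonzero()[0].item()
idx_b = (labels == 8).nonzero()[0].item()
mixed = overlap_pair(images[idx_a], images[idx_b])   # one 32x32 de-mixing input
print(mixed.shape)                                   # torch.Size([32, 32])
```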
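
The phase-mapping reconstruction loss quoted in the Experiment Setup row combines a Jensen-Shannon (JS) distance with weight 20.0 and an L2 distance with weight 0.05, so that mismatched peak locations dominate the loss. Below is a minimal sketch of such a combined loss, assuming each XRD pattern is a non-negative vector that can be normalized into a distribution over diffraction angles; the function name `js_l2_recon_loss` and the normalization details are illustrative, not taken from the paper.

```python
import torch

def js_l2_recon_loss(x_hat, x, w_js=20.0, w_l2=0.05, eps=1e-10):
    """Combined reconstruction loss: weighted JS distance + weighted L2 distance.

    x_hat, x -- non-negative XRD patterns of shape [batch, n_angles].
    The JS term compares the patterns as probability distributions, so mismatched
    peak positions are penalized heavily; the L2 term keeps overall intensities
    close. The weights follow the values quoted above.
    """
    p = x_hat / (x_hat.sum(dim=-1, keepdim=True) + eps)  # normalize to distributions
    q = x / (x.sum(dim=-1, keepdim=True) + eps)
    m = 0.5 * (p + q)

    def kl(a, b):  # KL divergence per pattern, with eps for numerical stability
        return (a * ((a + eps) / (b + eps)).log()).sum(dim=-1)

    js_div = 0.5 * kl(p, m) + 0.5 * kl(q, m)              # Jensen-Shannon divergence
    js_dist = torch.sqrt(torch.clamp(js_div, min=0.0))    # JS distance = sqrt of JS divergence
    l2 = ((x_hat - x) ** 2).sum(dim=-1).sqrt()            # Euclidean (L2) distance

    return (w_js * js_dist + w_l2 * l2).mean()
```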