Deep Reasoning Networks for Unsupervised Pattern De-mixing with Constraint Reasoning
Authors: Di Chen, Yiwei Bai, Wenting Zhao, Sebastian Ament, John Gregoire, Carla Gomes
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate the power of DRNets on two pattern de-mixing tasks: disentangling two overlapping hand-written Sudokus (Multi-MNIST-Sudoku) and inferring crystal structures of materials from X-ray diffraction data (Crystal-Structure Phase-Mapping). All the experiments are performed on one NVIDIA Tesla V100 GPU with 16GB memory. We demonstrate the potential of DRNets on two de-mixing tasks with detailed experimental results. |
| Researcher Affiliation | Academia | ¹Department of Computer Science, Cornell University, Ithaca, New York, USA; ²California Institute of Technology, Pasadena, California, USA. Correspondence to: Di Chen <di@cs.cornell.edu>. |
| Pseudocode | Yes | Algorithm 1: Constraint-aware stochastic gradient descent optimization of deep reasoning networks. Input: (i) data points {x_i}, i = 1..N; (ii) constraint graph; (iii) penalty functions ψ_l(·) and ψ_j^g(·) for the local and the global constraints; (iv) pre-trained or parametric generative decoder G(·). 1: Initialize the penalty weights λ_l, λ_j^g and thresholds for all constraints. 2: for number of optimization iterations do 3: Batch data points {x_1, ..., x_m} from the randomly sampled (maximal) connected components. 4: Collect the global penalty functions {ψ_j^g(·)}, j = 1..M, concerning those data points. 5: Compute the latent space {φ_θ(x_1), ..., φ_θ(x_m)} from the encoder. 6: Adjust the penalty weights λ_l, λ_j^g and thresholds accordingly. 7: Minimize (1/m) Σ_{i=1..m} [ L(G(φ_θ(x_i)), x_i) + λ_l ψ_l(φ_θ(x_i)) + Σ_{j=1..M} λ_j^g ψ_j^g({φ_θ(x_k) | k ∈ S_j}) ] using any standard gradient-based optimization method and update the parameters θ. 8: end for |
| Open Source Code | No | The paper does not provide an explicit statement or a link indicating the availability of open-source code for the described methodology. |
| Open Datasets | Yes | Multi-MNIST-Sudoku: We generated 160,000 input data points for each training set, validation set and test set, where each data point corresponds to a 32x32 image of overlapping digits coming from MNIST (LeCun et al., 1998) and every 16 data points form a pair of overlapping 4-by-4 Sudokus. |
| Dataset Splits | Yes | We generated 160,000 input data points for each training set, validation set and test set, where each data point corresponds to a 32x32 image of overlapping digits coming from MNIST (LeCun et al., 1998) and every 16 data points form a pair of overlapping 4-by-4 Sudokus. |
| Hardware Specification | Yes | All the experiments are performed on one NVIDIA Tesla V100 GPU with 16GB memory. |
| Software Dependencies | No | The paper mentions using the Adam optimizer but does not provide specific version numbers for software dependencies such as deep learning frameworks or libraries. |
| Experiment Setup | Yes | For the training process of our DRNets, we select a learning rate in {0.0001, 0.0005, 0.001} with the Adam optimizer (Kingma & Ba, 2014) for all the experiments. [...] The reasoning loss enforces the Sudoku rules and includes the continuous relaxation of the cardinality (2 × 16 cells) and All-Different (2 × (4 rows + 4 columns + 4 boxes)) constraints for every 16 data points, with initial weights of 0.01 and 1.0, respectively. [...] In this task, we used the Jensen-Shannon distance (JS distance) with a weight of 20.0 plus the L2-distance with a weight of 0.05 as the reconstruction loss. We use the JS distance since the locations of peaks are the most important characteristics of a phase pattern and mismatching peaks would cause a large JS distance. [...] Due to the different noise levels, we use different weights for the Gibbs Rule (1.0 and 30.0) and Phase Field Connectivity (0.01 and 3.0) for the Al-Li-Fe oxide system and the Bi-Cu-V oxide system, respectively. |
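
To make the Pseudocode row above easier to connect to a working training loop, here is a minimal sketch of the constraint-aware SGD procedure (Algorithm 1) in PyTorch. Since no reference implementation is released (see the Open Source Code row), every name here is an illustrative assumption: `encoder`, `decoder`, `local_penalty`, `global_penalties`, and `data_components` are placeholders, and the MSE term stands in for the task-specific reconstruction loss L.

```python
# Minimal sketch of Algorithm 1 (constraint-aware SGD), NOT the authors' code.
import torch
import torch.nn.functional as F

def train_drnet(encoder, decoder, data_components, local_penalty,
                global_penalties, lambda_local=0.01, lambda_global=1.0,
                iterations=10_000, lr=1e-4):
    # Only the encoder parameters are updated here; the decoder may be
    # pre-trained and frozen, or trained jointly (not shown).
    opt = torch.optim.Adam(encoder.parameters(), lr=lr)
    for _ in range(iterations):
        # Step 3: batch the data points of one randomly sampled (maximal)
        # connected component of the constraint graph, e.g. the 16 cells
        # of a pair of overlapping Sudokus.
        idx = torch.randint(len(data_components), (1,)).item()
        x = data_components[idx]                       # shape: (m, ...)
        # Step 5: latent space from the encoder.
        z = encoder(x)
        # Step 7: reconstruction loss plus weighted local and global
        # constraint penalties. MSE is a placeholder for the task loss L.
        loss = F.mse_loss(decoder(z), x)
        loss = loss + lambda_local * local_penalty(z)
        for psi_g in global_penalties:  # Step 4: penalties touching this batch
            loss = loss + lambda_global * psi_g(z)
        opt.zero_grad()
        loss.backward()
        opt.step()
        # Step 6: penalty weights and thresholds would be adjusted here
        # according to a schedule (omitted in this sketch).
    return encoder
```

Because each batch is a maximal connected component of the constraint graph, every global penalty collected in step 4 only involves data points inside the current batch, which is what makes the joint minimization in step 7 well defined.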
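The Experiment Setup row also quotes a composite reconstruction loss for the phase-mapping task: Jensen-Shannon (JS) distance weighted 20.0 plus L2 distance weighted 0.05. The sketch below shows one plausible reading of that combination; the normalization of XRD patterns into probability distributions and all function names are assumptions, not the authors' code.

```python
# Hedged sketch of the quoted reconstruction loss: 20.0 * JS + 0.05 * L2.
import torch

def js_distance(p, q, eps=1e-12):
    # Treat each (non-negative) diffraction pattern as a probability
    # distribution over diffraction angles before comparing.
    p = p / (p.sum(dim=-1, keepdim=True) + eps)
    q = q / (q.sum(dim=-1, keepdim=True) + eps)
    m = 0.5 * (p + q)
    kl_pm = (p * ((p + eps) / (m + eps)).log()).sum(dim=-1)
    kl_qm = (q * ((q + eps) / (m + eps)).log()).sum(dim=-1)
    js_div = 0.5 * kl_pm + 0.5 * kl_qm
    return torch.sqrt(torch.clamp(js_div, min=0.0))

def reconstruction_loss(recon, target, w_js=20.0, w_l2=0.05):
    # Weighted sum as quoted in the Experiment Setup row.
    l2 = ((recon - target) ** 2).sum(dim=-1)
    return (w_js * js_distance(recon, target) + w_l2 * l2).mean()
```

The heavy JS weight matches the quoted rationale that peak locations are the most important characteristic of a phase pattern, while the small L2 weight presumably keeps the reconstructed intensities in a reasonable range.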