Learning Mixtures of MLNs
Authors: Mohammad Islam, Somdeb Sarkhel, Deepak Venugopal
AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results on several benchmarks show that our approach is much more scalable and accurate as compared to existing state-of-the-art MLN learning methods. Our experiments on several benchmarks taken from Alchemy (Kok et al. 2006) show that our approach is more scalable and accurate than Tuffy (Niu et al. 2011) and other state-of-the-art systems for MLNs. We evaluated our approach on three key aspects: (i) Solution Quality, (ii) Stability, and (iii) Scalability. For evaluating solution quality, we have reported the cross-validated test CLL score. For stability, we have reported the variance in weights as well as the variance in CLL, and finally, to measure scalability, we have reported the running time of competing approaches. |
| Researcher Affiliation | Collaboration | Mohammad Maminur Islam, Department of Computer Science, The University of Memphis (mislam3@memphis.edu); Somdeb Sarkhel, Adobe Research (sarkhel@adobe.com); Deepak Venugopal, Department of Computer Science, The University of Memphis (dvngopal@memphis.edu) |
| Pseudocode | Yes | Algorithm 1: Learning the MLN Mixture |
| Open Source Code | No | The paper mentions that 'Both these systems [Tuffy and Magician] are available as open-source.' and provides a link for Magician: 'https://github.com/dvngp/CD-Learn'. However, it does not state that the code for their proposed 'Mixture MLN' approach is open-source or provide a link for it. |
| Open Datasets | Yes | We used three benchmarks from Alchemy, namely Web KB, Protein, and ER, to evaluate our approach. We compared our approach with Tuffy (Niu et al. 2011), the current state-of-the-art MLN system, and also Magician (Venugopal, S.Sarkhel, and Gogate 2016), which implements scalable versions of contrastive divergence (CD), voted perceptron (VP) and pseudo-log-likelihood maximisation (PLL), using approximate counting oracles (Sarkhel et al. 2016). Both these systems are available as open-source. We also tried to use Alchemy, an older MLN learning and inference system, but it did not work with any of our datasets in our experiments since it ran out of memory during the grounding process. |
| Dataset Splits | Yes | For evaluating solution quality, we have reported the cross-validated test CLL score. Table 1 shows our results where we compute CLL through 5-fold cross-validation. Specifically, we divide the input data into v folds, ω1, ..., ωv. We learn all the weight-vectors Θ1, ..., Θk from v − 1 folds and estimate the CLL on the remaining fold, and repeat this over all folds. To compute the W matrix, we divided the training data into 5 folds. |
| Hardware Specification | No | We implemented the mixture model by learning the components of the mixture in parallel. Specifically, we used a cluster of k 8GB quad-core machines for k components of the mixture, where we performed the learning using Tuffy, and computed the out-of-sample CLL using MCSAT. The description 'k 8GB quad-core machines' lacks the specific processor models or types (e.g., Intel Xeon, specific generation) needed for the hardware setup to be considered reproducible. |
| Software Dependencies | No | The paper mentions using Tuffy and Magician but does not specify their version numbers or any other software dependencies with version details. |
| Experiment Setup | Yes | For the mixture model, we set the number of clusters as 5% of the original domain-size and used the KMeans algorithm for clustering. We used five components in our mixture model. Magician (with the lowest possible ibound for the approximate counting oracle) is very fast for certain benchmarks but extremely slow for others (such as webkb). |
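
The rows above quote the paper's setup at a high level; the sketches below illustrate, under stated assumptions, how the quoted steps could look in code. First, the "Experiment Setup" row states that the number of clusters is set to 5% of the original domain size and that KMeans is used for clustering. The following is a minimal Python sketch of that step; the feature matrix for the domain constants is a toy stand-in, since the quoted text does not say which object features the authors cluster on.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
domain_size = 200                             # toy number of constants in one domain
features = rng.normal(size=(domain_size, 8))  # hypothetical per-constant feature vectors

# Number of clusters is set to 5% of the original domain size (as quoted above).
n_clusters = max(1, int(0.05 * domain_size))
labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(features)

print(f"{n_clusters} clusters over {domain_size} constants")
```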
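The "Hardware Specification" row notes that the k mixture components are learned in parallel, one per machine, using Tuffy. As a rough illustration only, the sketch below runs a hypothetical per-component learner (`learn_component`, a stand-in for the Tuffy weight-learning call) over the five components reported in the paper, using a local process pool rather than a cluster of machines.

```python
from concurrent.futures import ProcessPoolExecutor
import random

K_COMPONENTS = 5  # the paper reports using five components in its mixture model

def learn_component(component_id):
    """Hypothetical stand-in for learning one component's weight vector Theta_i
    (the paper performs this step with Tuffy on a clustered/reduced database)."""
    random.seed(component_id)
    return [random.gauss(0.0, 1.0) for _ in range(10)]  # placeholder weights

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=K_COMPONENTS) as pool:
        thetas = list(pool.map(learn_component, range(K_COMPONENTS)))
    print(f"learned {len(thetas)} weight vectors in parallel")
```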
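Finally, the "Dataset Splits" row describes the 5-fold cross-validated CLL evaluation: weights are learned on v − 1 folds and CLL is estimated on the remaining fold, repeated over all folds. The sketch below mirrors that loop with hypothetical stand-ins (`learn_weights`, `conditional_log_likelihood`) for the Tuffy learning and MC-SAT scoring steps.

```python
import random

def learn_weights(training_folds):
    """Hypothetical stand-in for learning MLN weights from v - 1 folds (e.g., via Tuffy)."""
    return [random.gauss(0.0, 1.0) for _ in range(10)]  # placeholder weight vector

def conditional_log_likelihood(weights, held_out_fold):
    """Hypothetical stand-in for estimating CLL on the held-out fold (e.g., via MC-SAT)."""
    return -random.random()  # placeholder CLL value

def cross_validated_cll(folds):
    """Learn on v - 1 folds, score CLL on the remaining fold, and average over all folds."""
    scores = []
    for i, held_out in enumerate(folds):
        training = [f for j, f in enumerate(folds) if j != i]
        weights = learn_weights(training)
        scores.append(conditional_log_likelihood(weights, held_out))
    return sum(scores) / len(scores)

folds = [f"fold_{i}.db" for i in range(5)]  # five folds of (toy) evidence databases
print("mean test CLL:", cross_validated_cll(folds))
```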