On Parameter Tying by Quantization
Authors: Li Chou, Somdeb Sarkhel, Nicholas Ruozzi, Vibhav Gogate
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide and prove error bounds for our new technique and demonstrate experimentally that it often yields models having higher test-set log-likelihood than the ones learned using the MLE. We also propose a new importance sampling algorithm for fast approximate inference in models having several tied parameters. Our experiments show that our new inference algorithm is superior to existing approaches such as Gibbs sampling and MC-SAT on models having tied parameters, learned using our quantization-based approach. From the Experiments section: We evaluated the performance of our quantized approach on both learning and inference tasks using several publicly available benchmark datasets from the UAI 2008 probabilistic inference competition repository (http://graphmod.ics.uci.edu/uai08). |
| Researcher Affiliation | Academia | Li Chou, Somdeb Sarkhel, Department of Computer Science, The University of Texas at Dallas, {lkc130030,sxs104721}@utdallas.edu; Nicholas Ruozzi, Department of Computer Science, The University of Texas at Dallas, nicholas.ruozzi@utdallas.edu; Vibhav Gogate, Department of Computer Science, The University of Texas at Dallas, vgogate@hlt.utdallas.edu |
| Pseudocode | Yes | Algorithm 1 Tied Weight Importance Sampling. Input: A log-linear model M = ⟨X, F, μ⟩ with k unique weights; number of samples N. Output: importance-weighted samples. 1: Create one super-feature Gi for each parameter μi. 2: Construct a proposal distribution Q(G) over the super-features. 3: for s = 1 to N do 4: S = ∅; w(s) = 1. 5: for i = 1 to k do 6: ji ∼ Q(Gi \| G1, ..., Gi−1). 7: Add ji randomly selected features from Gi to S. 8: Add the negation of the features from Gi not selected in the previous step to S. 9: w(s) = w(s) × C(\|Gi\|, ji) × exp(ji·μi) / Q(Gi \| G1, ..., Gi−1). 10: end for. 11: Sample x(s) ∼ USAT(S). 12: w(s) = w(s) × #S. 13: end for. 14: return (x(s), w(s)) for s = 1 to N. (A runnable sketch of this algorithm appears below the table.) |
| Open Source Code | No | The paper does not provide any specific links or statements about making their code open-source. It only references the "Alchemy system (Kok et al. 2006)" which is a third-party tool. |
| Open Datasets | Yes | We evaluated the performance of our quantized approach on both learning and inference tasks using several publicly available benchmark datasets from the UAI 2008 probabilistic inference competition repository (http://graphmod.ics.uci.edu/uai08). |
| Dataset Splits | Yes | For each selected Bayesian network, we used forward sampling to generate 100 sets of 6,000 training, 2,000 validation and 2,000 test data points. (A forward-sampling sketch appears below the table.) |
| Hardware Specification | Yes | All experiments were performed on quad-core Intel i7 based machines with 16GB of RAM running Ubuntu. |
| Software Dependencies | No | The paper mentions "running Ubuntu" but does not specify version numbers for any other key software components, libraries, or frameworks. |
| Experiment Setup | No | The paper mentions running algorithms for "500 seconds" but does not specify hyperparameters (e.g., learning rate, batch size, number of epochs, optimizer settings) or other specific training configurations. |
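The Pseudocode row quotes Algorithm 1 verbatim. As a sanity check on its control flow, here is a minimal Python sketch of the same sampler for the special case where every feature is a single positive binary literal; in that case the constraint set S pins down the assignment exactly, so the uniform-SAT sampling step (line 11) and the model-count factor #S (line 12) become trivial. The uniform proposal over counts, the function names, and the toy model are assumptions for illustration, not the authors' implementation.

```python
import math
import random

# Sketch of Algorithm 1 (Tied Weight Importance Sampling), restricted to
# single-variable features so the SAT machinery drops out (#S = 1).

def twis(super_features, weights, num_samples):
    """super_features[i]: list of variable indices tied to weights[i] (mu_i)."""
    samples = []
    for _ in range(num_samples):
        x, w = {}, 1.0
        for G, mu in zip(super_features, weights):
            n = len(G)
            # Step 6: draw j_i from the proposal; here Q is uniform on {0..n}.
            j = random.randint(0, n)
            q = 1.0 / (n + 1)
            # Steps 7-8: satisfy j_i randomly chosen features, negate the rest.
            chosen = set(random.sample(G, j))
            for v in G:
                x[v] = 1 if v in chosen else 0
            # Step 9: w *= C(|G_i|, j_i) * exp(j_i * mu_i) / Q(j_i).
            w *= math.comb(n, j) * math.exp(j * mu) / q
        # Steps 11-12 are trivial here: S has exactly one solution, #S = 1.
        samples.append((x, w))
    return samples

# Two tied groups sharing weights 0.5 and -1.0; the mean weight estimates the
# partition function, here Z = (1 + e^0.5)^3 * (1 + e^-1.0)^2 ~= 34.8.
samples = twis([[0, 1, 2], [3, 4]], [0.5, -1.0], num_samples=100000)
print(sum(w for _, w in samples) / len(samples))
```

Averaging the returned weights gives an unbiased estimate of the partition function, which is why the C(\|Gi\|, ji)/Q correction in step 9 is needed: it converts the proposal's probability of a count into the probability of the specific feature assignment.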
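The Dataset Splits row notes that the data were produced by forward sampling from the selected Bayesian networks. For readers reproducing the splits, here is a minimal sketch of forward (ancestral) sampling under an assumed topologically ordered CPT encoding over binary variables; the encoding, names, and toy network are illustrative assumptions.

```python
import random

# Forward (ancestral) sampling from a Bayesian network over binary variables,
# the mechanism the paper reports for generating its 6,000/2,000/2,000
# train/validation/test splits.

def forward_sample(cpts):
    """cpts: list of (parent_indices, table) in topological order, where
    table maps a tuple of parent values to P(X_i = 1 | parents)."""
    x = []
    for parents, table in cpts:
        p_true = table[tuple(x[p] for p in parents)]
        x.append(1 if random.random() < p_true else 0)
    return x

def make_splits(cpts, sizes=(6000, 2000, 2000)):
    """One train/validation/test split; the paper repeats this 100 times."""
    return [[forward_sample(cpts) for _ in range(n)] for n in sizes]

# Toy network X0 -> X1: P(X0=1)=0.3, P(X1=1|X0=0)=0.2, P(X1=1|X0=1)=0.9.
cpts = [([], {(): 0.3}), ([0], {(0,): 0.2, (1,): 0.9})]
train, valid, test = make_splits(cpts)
print(len(train), len(valid), len(test))
```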