On Parameter Tying by Quantization

Authors: Li Chou, Somdeb Sarkhel, Nicholas Ruozzi, Vibhav Gogate

AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We provide and prove error bounds for our new technique and demonstrate experimentally that it often yields models having higher test-set log-likelihood than the ones learned using the MLE. We also propose a new importance sampling algorithm for fast approximate inference in models having several tied parameters. Our experiments show that our new inference algorithm is superior to existing approaches such as Gibbs sampling and MC-SAT on models having tied parameters, learned using our quantization-based approach." From the Experiments section: "We evaluated the performance of our quantized approach on both learning and inference tasks using several publicly available benchmark datasets from the UAI 2008 probabilistic inference competition repository (http://graphmod.ics.uci.edu/uai08)." (An illustrative quantization sketch follows after the table.)
Researcher Affiliation | Academia | Li Chou, Somdeb Sarkhel, Nicholas Ruozzi, and Vibhav Gogate are all with the Department of Computer Science, The University of Texas at Dallas ({lkc130030,sxs104721}@utdallas.edu, nicholas.ruozzi@utdallas.edu, vgogate@hlt.utdallas.edu).
Pseudocode | Yes | The paper presents Algorithm 1, Tied Weight Importance Sampling (an illustrative sketch follows after the table):
Input: A log-linear model M = ⟨X, F, μ⟩ with k unique weights; number of samples N
Output: Importance-weighted samples
 1: Create one super-feature G_i for each unique parameter μ_i
 2: Construct a proposal distribution Q(G) over the super-features
 3: for s = 1 to N do
 4:   S = ∅; w(s) = 1
 5:   for i = 1 to k do
 6:     Sample j_i ~ Q(G_i | G_1, ..., G_{i-1})
 7:     Add j_i randomly selected features from G_i to S
 8:     Add the negation of the features from G_i not selected in the previous step to S
 9:     w(s) = w(s) * (|G_i| choose j_i) * exp(j_i * μ_i) / Q(G_i | G_1, ..., G_{i-1})
10:   end for
11:   Sample x(s) ~ USAT(S)
12:   w(s) = w(s) * #S
13: end for
14: return (x(s), w(s)) for s = 1 to N
Open Source Code | No | The paper does not provide any links to, or statements about, an open-source release of its code. It only references the Alchemy system (Kok et al. 2006), which is a third-party tool.
Open Datasets | Yes | "We evaluated the performance of our quantized approach on both learning and inference tasks using several publicly available benchmark datasets from the UAI 2008 probabilistic inference competition repository (http://graphmod.ics.uci.edu/uai08)."
Dataset Splits | Yes | "For each selected Bayesian network, we used forward sampling to generate 100 sets of 6,000 training, 2,000 validation and 2,000 test data points." (A forward-sampling sketch follows after the table.)
Hardware Specification | Yes | "All experiments were performed on quad-core Intel i7 based machines with 16GB of RAM running Ubuntu."
Software Dependencies | No | The paper mentions "running Ubuntu" but does not give version numbers for it or for any other key software components, libraries, or frameworks.
Experiment Setup | No | The paper mentions running algorithms for "500 seconds" but does not specify hyperparameters (e.g., learning rate, batch size, number of epochs, optimizer settings) or other specific training configurations.
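
The quantization-based parameter tying summarized in the Research Type row can be illustrated with a short sketch. The snippet below is a minimal, assumed implementation rather than the authors' code: it ties MLE-learned log-linear weights by clustering them into k shared values with 1-D k-means and replacing each weight by its cluster centroid. The function name tie_parameters_by_quantization and the k-means choice are hypothetical.

import numpy as np

def tie_parameters_by_quantization(weights, k, iters=50):
    # Illustrative sketch (assumption, not the paper's code): quantize learned
    # weights into k tied values via 1-D k-means; each weight is replaced by
    # the centroid of its cluster, so all weights in a cluster are tied.
    w = np.asarray(weights, dtype=float)
    # Initialize centroids with k evenly spaced quantiles of the weights.
    centroids = np.quantile(w, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        assign = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
        # Update each centroid to the mean of the weights assigned to it.
        for c in range(k):
            if np.any(assign == c):
                centroids[c] = w[assign == c].mean()
    return centroids[assign], assign

# Example: 10 learned weights tied to k = 3 shared values.
mle_weights = np.array([0.1, 0.12, 0.9, 1.1, -0.5, -0.48, 0.95, 0.11, -0.52, 1.05])
tied, groups = tie_parameters_by_quantization(mle_weights, k=3)
print(tied)    # quantized (tied) weights
print(groups)  # which shared parameter each feature uses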
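
To make the weight update in step 9 of Algorithm 1 concrete, here is a minimal Python sketch of steps 5-10: it samples a count j_i for each super-feature from an independent binomial proposal (an assumption; the paper constructs its own proposal Q) and accumulates the importance weight (|G_i| choose j_i) * exp(j_i * mu_i) / Q(j_i). The uniform-SAT sampling of a full assignment and the #S correction (steps 11-12) are omitted, and all function and parameter names are hypothetical.

import math
import random

def sample_tied_weight_counts(group_sizes, mus, proposal_p):
    # Sketch of steps 5-10 of Algorithm 1: for each super-feature G_i with
    # |G_i| features sharing the tied weight mu_i, sample how many of its
    # features are true from an (assumed) binomial proposal, and accumulate
    # the importance weight (|G_i| choose j_i) * exp(j_i * mu_i) / Q(j_i).
    counts, weight = [], 1.0
    for n_i, mu_i, p_i in zip(group_sizes, mus, proposal_p):
        j_i = sum(random.random() < p_i for _ in range(n_i))  # j_i ~ Binomial(n_i, p_i)
        q_ji = math.comb(n_i, j_i) * p_i**j_i * (1 - p_i)**(n_i - j_i)
        weight *= math.comb(n_i, j_i) * math.exp(j_i * mu_i) / q_ji
        counts.append(j_i)
    return counts, weight

# Example: three super-features with tied weights mu = (0.5, -1.0, 2.0).
counts, w = sample_tied_weight_counts(group_sizes=[4, 6, 3],
                                      mus=[0.5, -1.0, 2.0],
                                      proposal_p=[0.6, 0.3, 0.8])
print(counts, w)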
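
The data-generation protocol quoted in the Dataset Splits row relies on forward (ancestral) sampling: each variable is drawn from its conditional probability table once its parents have been sampled, proceeding in topological order. A minimal sketch for binary networks follows; the dictionary-based CPT representation and the tiny example network are assumptions for illustration only, not the authors' setup.

import random

def forward_sample(order, parents, cpts):
    # Ancestral sampling from a discrete (binary) Bayesian network.
    #   order:   variables in topological order
    #   parents: var -> tuple of parent variables
    #   cpts:    var -> {parent_assignment_tuple: P(var = 1 | parents)}
    x = {}
    for v in order:
        pa = tuple(x[p] for p in parents[v])
        x[v] = int(random.random() < cpts[v][pa])
    return x

# Tiny example network A -> B.
order = ["A", "B"]
parents = {"A": (), "B": ("A",)}
cpts = {"A": {(): 0.3}, "B": {(0,): 0.2, (1,): 0.9}}
dataset = [forward_sample(order, parents, cpts) for _ in range(6000)]  # e.g. one set of 6,000 training points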