Learning Maximum-A-Posteriori Perturbation Models for Structured Prediction in Polynomial Time

Authors: Asish Ghoshal, Jean Honorio

Venue: ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate our proposed method (CRF_RAND) on synthetic data against three other methods: CRF_ALL, SVM_RAND, and SVM. ... Figure 1 shows the training and test set errors and the training time of the four different algorithms.
Researcher Affiliation | Academia | Department of Computer Science, Purdue University, West Lafayette, IN 47906. Correspondence to: Asish Ghoshal <aghoshal@purdue.edu>, Jean Honorio <jhonorio@purdue.edu>.
Pseudocode | Yes | Algorithm 1: An example algorithm implementing a proposal distribution that depends on y_i ∈ S. 1: Input: Weight vector w ∈ R^d, (x_i, y_i) ∈ S, parameter ε ∈ [0, 1] and k ≥ 1. 2: Output: A structured output y ∈ Y(x). 3: With probability ε pick y′ uniformly at random from Y(x_i), and with probability 1 − ε set y′ to y_i. 4: y ← y′. 5: for y′ ∈ neighbors_k(y) do 6: if ⟨φ(x, y′), w⟩ ≥ ⟨φ(x, y), w⟩ then 7: y ← y′. 8: end if 9: end for 10: Return y. [A Python sketch of this algorithm appears below the table.]
Open Source Code | No | The paper does not provide any explicit statement about making its source code publicly available or a link to a code repository.
Open Datasets | No | We generate a ground truth parameter w ∈ R^d with random entries sampled independently from a zero-mean Gaussian distribution with variance 100. We then randomly set all but s = √d entries to be zero. We then generate a training set S of 100 samples. [A data-generation sketch appears below the table.]
Dataset Splits | No | The paper describes generating a training set and a test set but does not explicitly mention a separate validation set or specify training/validation/test splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or solvers used in the experiments.
Experiment Setup | Yes | We set the ℓ1 regularization parameter to be 0.01 for all methods. We used 20 iterations of gradient descent with step size of 1/√t for all algorithms, where t is the iteration, to learn the parameter w for both the exact method and our randomized algorithm. In order to simplify gradient calculations, we simply set β = 1/log((r − 1)(√m − 1)) during training. For CRF_RAND, we used Algorithm 1 with ε = ‖w‖_1/√m and invoke the algorithm √m number of times to generate the set T_i for each i ∈ [m] and w. This results in n = |T_i| ≤ √m. [A training-loop sketch appears below the table.]
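
The Algorithm 1 pseudocode quoted in the Pseudocode row reads as a randomized greedy proposal. Below is a minimal Python sketch, assuming NumPy and toy stand-ins for the output space Y(x), the feature map φ, and the k-neighborhood; the names propose, sample_uniform, and neighbors_1 are illustrative, not from the paper.

import numpy as np

def propose(w, x, y_obs, phi, sample_uniform, neighbors, eps, rng):
    # Step 3: with probability eps start from a uniform draw over Y(x);
    # otherwise start from the observed output y_obs.
    y = sample_uniform(x, rng) if rng.random() < eps else y_obs
    # Steps 5-9: scan the neighborhood of the starting point and move to
    # any neighbor scoring at least as high under <phi(x, .), w>.
    for y_prime in neighbors(y):
        if phi(x, y_prime) @ w >= phi(x, y) @ w:
            y = y_prime
    return y

# Toy instantiation (illustrative, not from the paper): outputs are bit
# vectors of length m, features are the outer product of x and y, and
# neighbors differ in exactly one bit (k = 1).
m, d_x = 4, 3
rng = np.random.default_rng(0)
phi = lambda x, y: np.outer(x, y).ravel()
sample_uniform = lambda x, rng: rng.integers(0, 2, size=m)

def neighbors_1(y):
    for j in range(len(y)):
        y2 = y.copy()
        y2[j] ^= 1  # flip one bit
        yield y2

x = rng.normal(size=d_x)
w = rng.normal(size=d_x * m)
y = propose(w, x, rng.integers(0, 2, size=m), phi,
            sample_uniform, neighbors_1, eps=0.5, rng=rng)

The quoted pseudocode does not say whether the neighborhood is recomputed around the updated y; the sketch fixes it at the starting point, which is one plausible reading.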
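The synthetic ground truth described in the Open Datasets row can be sketched as follows. This is a minimal sketch assuming the "s = √d" reading noted above and a hypothetical d = 64; how each of the 100 training pairs (x_i, y_i) is then drawn depends on the feature map and output space, which the quoted text does not specify, so that step is omitted.

import numpy as np

def generate_ground_truth(d, s, rng):
    # Entries sampled i.i.d. from a zero-mean Gaussian with variance 100
    # (i.e. standard deviation 10), as quoted above.
    w = rng.normal(loc=0.0, scale=10.0, size=d)
    # Randomly keep s entries and zero out the rest.
    keep = rng.choice(d, size=s, replace=False)
    mask = np.zeros(d, dtype=bool)
    mask[keep] = True
    w[~mask] = 0.0
    return w

d = 64               # hypothetical dimension, not from the paper
s = int(np.sqrt(d))  # assumes the "s = sqrt(d)" reading of the quote
rng = np.random.default_rng(0)
w_true = generate_ground_truth(d, s, rng)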
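The Experiment Setup row pins down the optimization schedule: 20 iterations of gradient descent with step size 1/√t and an ℓ1 penalty of 0.01. Below is a minimal sketch, assuming a caller-supplied grad_loss that stands in for the gradient of the randomized structured loss (in the paper this depends on the sets T_i built from √m calls to Algorithm 1 with ε = ‖w‖_1/√m, and on the β setting quoted above, none of which is reproduced here), and assuming a plain subgradient treatment of the ℓ1 term.

import numpy as np

def train(grad_loss, d, lam=0.01, n_iters=20):
    # Quoted schedule: 20 iterations, step size 1/sqrt(t) at iteration t,
    # l1 regularization parameter 0.01 for all methods.
    w = np.zeros(d)
    for t in range(1, n_iters + 1):
        step = 1.0 / np.sqrt(t)
        # Subgradient of lam * ||w||_1; the paper does not say how the
        # non-smooth term is handled, so this is one plausible choice.
        g = grad_loss(w) + lam * np.sign(w)
        w = w - step * g
    return w

# Toy usage with a quadratic stand-in loss 0.5 * ||w - w_true||^2.
d = 12
w_true = np.ones(d)
w_hat = train(lambda w: w - w_true, d)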