Learning Maximum-A-Posteriori Perturbation Models for Structured Prediction in Polynomial Time
Authors: Asish Ghoshal, Jean Honorio
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate our proposed method (CRF_RAND) on synthetic data against three other methods: CRF_ALL, SVM_RAND, and SVM. ... Figure 1 shows the training and test set errors and the training time of the four different algorithms. |
| Researcher Affiliation | Academia | Department of Computer Science, Purdue University, West Lafayette, IN 47906. Correspondence to: Asish Ghoshal <aghoshal@purdue.edu>, Jean Honorio <jhonorio@purdue.edu>. |
| Pseudocode | Yes | Algorithm 1: An example algorithm implementing a proposal distribution that depends on y_i ∈ S. 1: Input: Weight vector w ∈ R^d, (x_i, y_i) ∈ S, a parameter in [0, 1] and k ≥ 1. 2: Output: A structured output y ∈ Y(x). 3: With probability equal to this parameter, pick y′ uniformly at random from Y(x_i); otherwise set y′ to y_i. 4: y ← y′. 5: for y′ ∈ neighbors_k(y) do 6: if ⟨φ(x, y′), w⟩ ≥ ⟨φ(x, y), w⟩ then 7: y ← y′. 8: end if 9: end for 10: Return y. (See the Python sketch of this routine after the table.) |
| Open Source Code | No | The paper does not provide any explicit statement about making its source code publicly available or a link to a code repository. |
| Open Datasets | No | We generate a ground truth parameter w ∈ R^d with random entries sampled independently from a zero-mean Gaussian distribution with variance 100. We then randomly set all but s = √d entries to be zero. We then generate a training set S of 100 samples. |
| Dataset Splits | No | The paper describes generating a training set and a test set but does not explicitly mention a separate validation set or specify training/validation/test splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or solvers used in the experiments. |
| Experiment Setup | Yes | We set the ℓ1 regularization parameter to be 0.01 for all methods. We used 20 iterations of gradient descent with step size of 1/√t for all algorithms, where t is the iteration, to learn the parameter w for both the exact method and our randomized algorithm. In order to simplify gradient calculations, we simply set β = 1/log((r − 1)(√m − 1)) during training. For CRF_RAND, we used Algorithm 1 with the parameter set to ‖w‖₁/√m and invoke the algorithm √m number of times to generate the set T_i for each i ∈ [m] and w. This results in n = |T_i| ≤ √m. (See the training-loop sketch after the table.) |
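
To make the quoted pseudocode easier to follow, here is a minimal Python sketch of Algorithm 1 as we read it. The feature map `phi`, the output space `outputs`, the k-neighborhood generator `neighbors_k`, and the mixing probability `prob` are hypothetical placeholders for quantities the paper defines symbolically; this is one plausible reading of the imperfectly extracted pseudocode, not the authors' implementation.

```python
import numpy as np

def propose(w, x, y_i, outputs, phi, neighbors_k, prob, rng):
    """Greedy proposal: start from y_i (or a uniform draw with probability
    `prob`), then accept any k-neighbor whose linear score <phi(x, y), w>
    is at least as large as the current one."""
    # Step 3: with the given probability pick y' uniformly from Y(x_i),
    # otherwise start from the observed output y_i.
    if rng.random() < prob:
        y = outputs[rng.integers(len(outputs))]
    else:
        y = y_i
    # Steps 5-9: a single greedy pass over the k-neighborhood of the start point.
    for y_prime in neighbors_k(y):
        if phi(x, y_prime) @ w >= phi(x, y) @ w:
            y = y_prime
    return y
```

A single pass keeps the proposal cheap; invoking it repeatedly with fresh randomness is what produces the candidate sets T_i mentioned in the Experiment Setup row.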
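
Similarly, the synthetic-data generation and training schedule quoted in the Open Datasets and Experiment Setup rows amount to a sparse Gaussian ground truth plus a short (sub)gradient loop. Only the variance 100, the sparsification of w, the 20 iterations, the 1/√t step size, and the ℓ1 weight 0.01 come from the quotes; `make_ground_truth`, `train`, and `grad_fn` are hypothetical names, and the gradient of the randomized CRF objective itself is left abstract.

```python
import numpy as np

def make_ground_truth(d, s, rng):
    # Dense zero-mean Gaussian draw with variance 100 (std 10), then keep
    # only s randomly chosen entries nonzero, as described in the quote.
    w_star = rng.normal(0.0, 10.0, size=d)
    keep = rng.choice(d, size=s, replace=False)
    mask = np.zeros(d, dtype=bool)
    mask[keep] = True
    w_star[~mask] = 0.0
    return w_star

def train(grad_fn, d, n_iters=20, lam=0.01):
    # Plain (sub)gradient descent with the quoted schedule: 20 iterations,
    # step size 1/sqrt(t), and an l1 penalty with weight 0.01.
    w = np.zeros(d)
    for t in range(1, n_iters + 1):
        g = grad_fn(w)                            # model-specific gradient
        w -= (1.0 / np.sqrt(t)) * (g + lam * np.sign(w))
    return w
```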