Learning Fast-Mixing Models for Structured Prediction
Authors: Jacob Steinhardt, Percy Liang
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirical improvements on two inference tasks. We evaluated our methods on two tasks: (i) inferring words from finger gestures on a touch screen and (ii) inferring DNF formulas for program verification. |
| Researcher Affiliation | Academia | Jacob Steinhardt (jsteinhardt@cs.stanford.edu), Percy Liang (pliang@stanford.edu), Stanford University, 353 Serra Street, Stanford, CA 94305 USA |
| Pseudocode | Yes | Algorithm 1: Algorithm for computing an estimate of ∇θ log pθ(y \| x). (See the gradient-estimator sketch after this table.) |
| Open Source Code | Yes | Code, data, and experiments for this paper are available on the CodaLab platform at https://www.codalab.org/worksheets/0xc6edf0c9bec643ac9e74418bd6ad4136/. |
| Open Datasets | Yes | We generated the data by sampling words from the New York Times corpus (Sandhaus, 2008). |
| Dataset Splits | No | The paper uses the New York Times corpus but does not explicitly state the training, validation, and test splits with percentages or counts, nor does it refer to predefined standard splits for this corpus. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments (e.g., GPU/CPU models, memory, or specific computing environments). |
| Software Dependencies | No | The paper mentions using 'AdaGrad (Duchi et al., 2010)' as an online learning algorithm but does not provide specific software dependencies like programming languages, libraries, or solvers with version numbers. |
| Experiment Setup | Yes | All algorithms are trained with AdaGrad (Duchi et al., 2010) with 16 independent chains for each example. To provide a fair comparison of the methods, we set ϵ in the Doeblin sampler to the inverse of the number of transitions T, so that the expected number of transitions of all algorithms is the same. We also devoted the first half of each chain to burn-in. For the staged method, we initialize f uniformly at random, take Geometric(0.04) transitions based on a simplified cost function, then take Geometric(0.0002) steps with the full cost. (See the sampler sketch after this table.) |
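
Below is a minimal sketch of the kind of gradient estimator the quoted Algorithm 1 refers to, assuming an exponential-family model pθ(y | x) ∝ exp(θ · φ(x, y)) as in the paper's structured-prediction setting. The names `phi` and `sample_chain` are hypothetical placeholders for the feature map and an MCMC sampler, not identifiers from the paper's released code.

```python
import numpy as np

def grad_log_likelihood(theta, phi, x, y, sample_chain,
                        num_chains=16, num_steps=100):
    """Monte Carlo estimate of grad_theta log p_theta(y | x).

    For p_theta(y | x) proportional to exp(theta . phi(x, y)), the gradient
    is phi(x, y) minus the model expectation of phi; here that expectation
    is approximated with one sample from each of `num_chains` independent
    MCMC chains (the experiments quoted above use 16 chains per example).
    """
    samples = [sample_chain(theta, x, num_steps) for _ in range(num_chains)]
    expected_phi = np.mean([phi(x, s) for s in samples], axis=0)
    return phi(x, y) - expected_phi
```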
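
The quoted setup also pins down the sampler schedule: a Doeblin-style chain that restarts with probability ϵ = 1/T, burn-in over the first half of each chain, and geometric stage lengths for the staged method. The sketch below illustrates that schedule under the same assumptions; `transition` and `restart` are hypothetical user-supplied functions, not the paper's code.

```python
import numpy as np

def doeblin_sample(x, init_state, transition, restart, T, rng=None):
    """Run T steps of a Doeblin-style chain; return the post-burn-in half.

    With restart probability eps = 1/T, the expected number of steps
    between restarts is T, matching the fairness convention in the
    quoted experiment setup.
    """
    rng = rng or np.random.default_rng()
    eps = 1.0 / T
    state = init_state
    trace = []
    for _ in range(T):
        if rng.random() < eps:
            state = restart(x)            # restart from the reference distribution
        else:
            state = transition(x, state)  # one ordinary MCMC transition
        trace.append(state)
    return trace[len(trace) // 2:]        # first half of the chain is burn-in

def staged_stage_lengths(rng=None):
    """Stage lengths for the staged method as quoted: Geometric(0.04)
    transitions with a simplified cost, then Geometric(0.0002) steps
    with the full cost."""
    rng = rng or np.random.default_rng()
    return rng.geometric(0.04), rng.geometric(0.0002)
```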