Learning Fast-Mixing Models for Structured Prediction

Authors: Jacob Steinhardt, Percy Liang

ICML 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show empirical improvements on two inference tasks. We evaluated our methods on two tasks: (i) inferring words from finger gestures on a touch screen and (ii) inferring DNF formulas for program verification.
Researcher Affiliation | Academia | Jacob Steinhardt (JSTEINHARDT@CS.STANFORD.EDU), Percy Liang (PLIANG@STANFORD.EDU), Stanford University, 353 Serra Street, Stanford, CA 94305 USA
Pseudocode | Yes | Algorithm 1: Algorithm for computing an estimate of ∇θ log pθ(y | x).
Open Source Code | Yes | Code, data, and experiments for this paper are available on the CodaLab platform at https://www.codalab.org/worksheets/0xc6edf0c9bec643ac9e74418bd6ad4136/.
Open Datasets | Yes | We generated the data by sampling words from the New York Times corpus (Sandhaus, 2008).
Dataset Splits | No | The paper uses the New York Times corpus but does not explicitly state training/validation/test splits (as percentages or counts), nor does it refer to predefined standard splits for this corpus.
Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., CPU/GPU models, memory, or computing environment).
Software Dependencies | No | The paper mentions using AdaGrad (Duchi et al., 2010) as the online learning algorithm but does not list software dependencies (programming languages, libraries, or solvers) with version numbers.
Experiment Setup | Yes | All algorithms are trained with AdaGrad (Duchi et al., 2010) with 16 independent chains for each example. To provide a fair comparison of the methods, we set ϵ in the Doeblin sampler to the inverse of the number of transitions T, so that the expected number of transitions of all algorithms is the same. We also devoted the first half of each chain to burn-in. For the staged method, we initialize f uniformly at random, take Geometric(0.04) transitions based on a simplified cost function, then take Geometric(0.0002) steps with the full cost.
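The staged schedule quoted above can be sketched as follows. This is a minimal illustration, not the paper's code: `simplified_step` and `full_step` are hypothetical placeholder transition kernels standing in for the simplified-cost and full-cost samplers, and `geometric(p)` draws the number of transitions with mean 1/p (so the stages average 25 and 5000 steps, respectively).

```python
import random

def geometric(p, rng=random):
    """Sample from Geometric(p): trials until first success
    (support {1, 2, ...}, mean 1/p)."""
    n = 1
    while rng.random() >= p:
        n += 1
    return n

def staged_chain(init_state, simplified_step, full_step, rng=random):
    """Run the staged schedule from the experiment setup:
    Geometric(0.04) transitions under a simplified cost function,
    then Geometric(0.0002) transitions under the full cost.
    The two step functions are placeholder MCMC kernels."""
    state = init_state
    for _ in range(geometric(0.04, rng)):
        state = simplified_step(state)
    for _ in range(geometric(0.0002, rng)):
        state = full_step(state)
    return state
```

In the paper's setting the Doeblin sampler additionally restarts from a reference distribution with probability ϵ = 1/T at each step; the sketch above only shows how the two-stage transition budget is drawn.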