A Discriminative Latent Variable Model for Online Clustering
Authors: Rajhans Samdani, Kai-Wei Chang, Dan Roth
ICML 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments on coreference resolution and document clustering, L3M outperforms several existing online as well as batch supervised clustering techniques. |
| Researcher Affiliation | Collaboration | Rajhans Samdani, Google Research RAJHANS@GOOGLE.COM Kai-Wei Chang, University of Illinois KCHANG10@ILLINOIS.EDU Dan Roth, University of Illinois DANR@ILLINOIS.EDU |
| Pseudocode | No | The paper describes algorithms but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. |
| Open Datasets | Yes | We show experimental results on two benchmark English coreference datasets ACE 2004 (NIST, 2004) and Ontonotes-5.0 (Pradhan et al., 2012). |
| Dataset Splits | Yes | ACE 2004 data contains 442 documents, split into 268 training, 68 development, and 106 testing documents... OntoNotes-5.0 (Pradhan et al., 2012) is the largest annotated corpus on coreference with a total of 3,145 training documents and 348 testing documents. We use 343 documents from the training set for validation. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using structural SVMs and an ILP solver, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | For all the algorithms, we tune the regularization parameters (and also γ for L3M) to optimize the targeted evaluation metric on the development set. For all the online clustering techniques (Sum-Link, Bin.-Left-Link, L3M), we present results with a single pass over the data as well as with multiple passes tuned on a validation set... For L3M (tuned γ), the best value of γ for ACE 2004 for one pass was 0; with multiple passes, the best γ was 0.2. ... it took five passes to achieve top performance on the development set for both the datasets and for all the online algorithms. (See the tuning sketch below the table.) |
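
The quoted setup amounts to a small grid search: choose the regularization strength, γ, and the number of online passes that maximize the target metric on the development/validation data. The sketch below is a minimal illustration of that loop, not the authors' code: `train_fn` and `eval_fn` are hypothetical callables standing in for the paper's L3M trainer and its coreference/clustering metric, and the grids are assumptions except for the γ values (0 and 0.2) and the five passes reported in the paper.

```python
# Hedged sketch of the tuning protocol described in the Experiment Setup row.
# train_fn and eval_fn are placeholders supplied by the caller; the actual L3M
# training (latent left-linking model) is not reproduced here.
from itertools import product

def tune_online_clusterer(train_fn, eval_fn, train_docs, dev_docs):
    """Pick gamma, regularization, and number of online passes by dev-set score.

    train_fn(train_docs, gamma=..., reg=..., passes=...) -> model
    eval_fn(model, dev_docs) -> float (higher is better)
    """
    gamma_grid = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]  # assumed grid; paper reports best gamma = 0 (one pass) and 0.2 (multi-pass) on ACE 2004
    reg_grid = [0.01, 0.1, 1.0, 10.0]            # assumed regularization values
    pass_grid = [1, 2, 3, 4, 5]                  # paper reports five passes sufficed on both datasets

    best_score, best_config = float("-inf"), None
    for gamma, reg, passes in product(gamma_grid, reg_grid, pass_grid):
        model = train_fn(train_docs, gamma=gamma, reg=reg, passes=passes)
        score = eval_fn(model, dev_docs)
        if score > best_score:
            best_score, best_config = score, (gamma, reg, passes)
    return best_config, best_score
```

Under this framing, the single-pass and multi-pass configurations reported in the paper are just the `passes=1` grid point versus the best multi-pass grid point.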