Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Leveraging Online User Feedback to Improve Statistical Machine Translation
Authors: Lluís Formiga, Alberto Barrón-Cedeño, Lluís Màrquez, Carlos A. Henríquez, José B. Mariño
JAIR 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a thorough evaluation on a real-world dataset collected from the Reverso.net translation service and show that every step in our methodology contributes significantly to improve a general purpose SMT system. Interestingly, the quality improvement is not only due to the increase of lexical coverage, but to a better lexical selection, reordering, and morphology. Finally, we show the robustness of the methodology by applying it to a different scenario, in which the new examples come from an automatically Web-crawled parallel corpus. Using exactly the same architecture and models provides again a significant improvement of the translation quality of a general purpose baseline SMT system. |
| Researcher Affiliation | Collaboration | Lluís Formiga EMAIL Verbio Technologies, S.L., Loreto, 44, 08029 Barcelona Alberto Barrón-Cedeño EMAIL Lluís Màrquez EMAIL Qatar Computing Research Institute Hamad Bin Khalifa University, Tornado Tower, Floor 10, P.O. Box 5825, Doha, Qatar Carlos A. Henríquez EMAIL José B. Mariño EMAIL TALP Research Center Universitat Politècnica de Catalunya, Jordi Girona, 1-3, 08034 Barcelona |
| Pseudocode | Yes | Algorithm 1 SimTer. A pivot-based algorithm to align SRC and UE through TGT |
| Open Source Code | No | The paper references a third-party tool's repository: "Matecat (2015). Matecat official repository. https://github.com/matecat/MateCat. Accessed: 2015-07-24." However, there is no explicit statement or link indicating that the authors have made their *own* code for the described methodology publicly available. |
| Open Datasets | Yes | As training material we used the English Spanish Faust Feedback Filtering (FFF+) 2 corpus, developed within the FAUST EU project. It contains 550 examples of real translation requests and user-edits from the Reverso.net translation Web service. Available at ftp://mi.eng.cam.ac.uk/data/faust/UPC-Mar2013-FAUST-feedback-annotation.tgz. We selected different datasets for these experiments. In order to optimize the β parameters of the similarity function in Equation (1), we used the Europarl v6 corpus, EPPS (Koehn, 2005), to build a base phrase-based SMT system. In order to tune the α and λ parameters, and to validate the proposed methodology, we used the corpora from the WMT 12 campaign (Callison-Burch, Koehn, Monz, Post, Soricut, & Specia, 2012). In the second scenario, new material is selected (cf. Section 5.2) from Common Crawl (Smith et al., 2013). |
| Dataset Splits | Yes | We used SVMlight (Joachims, 1999) with linear, polynomial, and RBF kernels and we tuned the classifiers with 90% of the FFF+ corpus. The remaining 10% was left aside for testing purposes. Additionally, we used the WMT 08-11 test material for tuning the α and the TM s λs (dev), and WMT 12/13 tests for testing the methodology (test12 and test13). In our experiments we considered the FAUST dev Clean version for tuning (less error prone), and the real FAUST test Raw for testing. |
| Hardware Specification | Yes | These figures were computed on a Linux server with 96 GB of RAM and 24-core CPU Xeon processors 1.6 GHz (134064 Bogomips in total). |
| Software Dependencies | No | The paper mentions several software tools and algorithms, such as "SVMlight (Joachims, 1999)", "Moses training with EPPS (Koehn & Hoang, 2007)", and the "Freeling suite of NLP analyzers (Padró, Collado, Reese, Lloberes, & Castellón, 2010)". However, specific version numbers for these tools or any other critical software libraries used for the implementation are not provided. |
| Experiment Setup | Yes | We trained support vector machines (SVM) with the previously described features to learn the classifiers. We used SVMlight (Joachims, 1999) with linear, polynomial, and RBF kernels and we tuned the classifiers with 90% of the FFF+ corpus. The remaining 10% was left aside for testing purposes. Feature values were clipped to fit into the range µ ± 3σ² to decrease the impact of outliers. Normalization was then applied by means of z-score: x = (x − µ)/σ. Our training strategy aimed at optimizing F1 and consisted of two iterative steps: (a) parameter tuning: a grid search for the most appropriate SVM parameters (Hsu, Chang, & Lin, 2003), and (b) feature selection: a wrapper strategy, implementing backward elimination to discard redundant or irrelevant features (Witten & Frank, 2005, p. 294). We built the baseline SMT system following the standard pipeline of a Moses phrase-based system (Koehn & Hoang, 2007) from words into words and POS tags (Formiga et al., 2012). When combining the translation models, the BLEU improved from 27.86 to 28.75, achieving its highest value with α = 0.6 (i.e., a 60/40% distribution of the weight for the base and edited translation models, respectively). |
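The 90/10 split of the 550-example FFF+ corpus described in the Dataset Splits row can be sketched as follows. The helper name `split_corpus`, the shuffle, and the seed are illustrative assumptions; the authors' actual partition procedure is not published:

```python
import random


def split_corpus(examples, train_frac=0.9, seed=0):
    """Shuffle a corpus and split it into train/test portions.

    Mirrors the 90%/10% FFF+ split reported in the paper; the seeded
    shuffle is an assumption for reproducibility of this sketch only.
    """
    rng = random.Random(seed)
    indices = list(range(len(examples)))
    rng.shuffle(indices)
    cut = int(len(examples) * train_frac)
    train = [examples[i] for i in indices[:cut]]
    test = [examples[i] for i in indices[cut:]]
    return train, test
```

With 550 examples this yields 495 training and 55 test items, matching the proportions the paper reports for classifier tuning and held-out evaluation.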
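The feature preprocessing quoted in the Experiment Setup row (clip outliers to a range around the mean, then z-score normalize) can be sketched as below. The clipping bound is parameterized because the exact range in the paper is ambiguous after PDF extraction; treat `n_sigmas=3` as an assumption:

```python
import numpy as np


def clip_and_zscore(values, n_sigmas=3.0):
    """Clip feature values around the mean, then z-score normalize.

    Sketch of the paper's preprocessing: outliers are clipped to
    [mu - n_sigmas*sigma, mu + n_sigmas*sigma] (bound assumed), then
    the quoted z-score x -> (x - mu) / sigma is applied.
    """
    x = np.asarray(values, dtype=float)
    mu, sigma = x.mean(), x.std()
    clipped = np.clip(x, mu - n_sigmas * sigma, mu + n_sigmas * sigma)
    return (clipped - mu) / sigma
```

When no value falls outside the clipping range, this reduces to plain z-score normalization, producing zero mean and unit standard deviation.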
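The α = 0.6 weighting between the base and feedback-derived translation models could, under a simple linear-interpolation reading, look like the sketch below. This is a hypothetical illustration: Moses supports several model-combination modes (e.g. multiple decoding paths), and the paper's exact mechanism is not reproduced here. The function name and dict-based phrase tables are assumptions:

```python
def combine_phrase_tables(base, edited, alpha=0.6):
    """Linearly interpolate two phrase tables' translation probabilities.

    Illustrative only: p(e|f) = alpha * p_base + (1 - alpha) * p_edited,
    with alpha = 0.6 matching the paper's best-scoring 60/40 weighting
    of base vs. user-edit-derived models.
    """
    phrases = set(base) | set(edited)
    return {
        ph: alpha * base.get(ph, 0.0) + (1 - alpha) * edited.get(ph, 0.0)
        for ph in phrases
    }
```

Phrases absent from one table receive probability 0.0 from that side, so coverage gains from the edited model survive the interpolation with weight 1 − α.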