Improving Domain-independent Cloud-Based Speech Recognition with Domain-Dependent Phonetic Post-Processing
Authors: Johannes Twiefel, Timo Baumann, Stefan Heinrich, Stefan Wermter
AAAI 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present results for a variety of corpora (mainly from human-robot interaction) where our combined approach significantly outperforms Google ASR as well as a plain open-source ASR solution. We present an experiment in which we use our post-processing technique in Section 5 and discuss the results in Section 6. |
| Researcher Affiliation | Academia | Johannes Twiefel, Timo Baumann, Stefan Heinrich, and Stefan Wermter University of Hamburg, Department of Informatics Vogt-Kölln-Straße 30, D-22527 Hamburg, Germany |
| Pseudocode | No | The paper includes system diagrams (Figure 1 and Figure 2) but does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | To foster such research, our implemented framework DOCKS (DOmain- and Cloud-based Knowledge for Speech recognition) is available as open-source software at www.informatik.uni-hamburg.de/WTM/software/. |
| Open Datasets | No | The paper uses the TIMIT corpus (Garofolo et al. 1993) which is publicly available. However, it also uses a 'Scripted HRI data set' which was 'previously recorded by Heinrich and Wermter (2011)' and a 'Spontaneous HRI data set' which they 'collected', neither of which is stated to be publicly available with access information. |
| Dataset Splits | No | The paper does not explicitly provide training, validation, or test dataset splits with percentages, sample counts, or references to predefined splits for all datasets used. While it mentions the 'Core Test Set' for TIMIT, it doesn't specify other splits for TIMIT or any splits for the SCRIPTED or SPONT corpora. |
| Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., specific CPU/GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions software like 'Sphinx-4' and 'Sequitur G2P' but does not provide specific version numbers for these or any other software dependencies used in their experiments. |
| Experiment Setup | Yes | We therefore use a cost of 0.1 for matches and 0.9 for all other edit operations in the implementation described in Subsection 4.3 below. We also experiment with variable costs for phoneme substitution as detailed next. |
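
The costs quoted under Experiment Setup (0.1 for a match, 0.9 for every other edit operation) can be illustrated with a minimal sketch of a weighted Levenshtein distance over phoneme sequences. This is not the DOCKS implementation: the function names `weighted_levenshtein` and `best_match`, the phoneme symbols, and the candidate list are assumptions made for illustration, and the variable substitution costs mentioned in the quote are not modeled.

```python
from typing import List, Sequence

# Costs taken from the quoted setup; everything else here is illustrative.
MATCH_COST = 0.1
EDIT_COST = 0.9  # insertion, deletion, and substitution all cost the same


def weighted_levenshtein(a: Sequence[str], b: Sequence[str],
                         match_cost: float = MATCH_COST,
                         edit_cost: float = EDIT_COST) -> float:
    """Edit distance between two phoneme sequences with a non-zero match cost."""
    # prev[j] holds the cost of turning the empty prefix of a into b[:j].
    prev: List[float] = [j * edit_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * edit_cost]  # cost of deleting the first i symbols of a
        for j, y in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match_cost if x == y else edit_cost)
            deletion = prev[j] + edit_cost
            insertion = curr[j - 1] + edit_cost
            curr.append(min(diagonal, deletion, insertion))
        prev = curr
    return prev[-1]


def best_match(hypothesis: Sequence[str],
               candidates: Sequence[Sequence[str]]) -> Sequence[str]:
    """Return the candidate phoneme sequence closest to the hypothesis."""
    return min(candidates, key=lambda c: weighted_levenshtein(hypothesis, c))


# Hypothetical phoneme sequences (ARPAbet-like symbols, for illustration only).
hypothesis = ["k", "ae", "t"]
candidates = [["b", "ae", "t"], ["k", "ae", "t", "s"], ["d", "ao", "g"]]
closest = best_match(hypothesis, candidates)
```

Because matches carry a small cost of their own, two identical sequences no longer have distance zero, and longer alignments accumulate cost even when every symbol agrees.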