Improving Domain-independent Cloud-Based Speech Recognition with Domain-Dependent Phonetic Post-Processing

Authors: Johannes Twiefel, Timo Baumann, Stefan Heinrich, Stefan Wermter

AAAI 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present results for a variety of corpora (mainly from human-robot interaction) where our combined approach significantly outperforms Google ASR as well as a plain open-source ASR solution. We present an experiment in which we use our post-processing technique in Section 5 and discuss the results in Section 6.
Researcher Affiliation | Academia | Johannes Twiefel, Timo Baumann, Stefan Heinrich, and Stefan Wermter; University of Hamburg, Department of Informatics, Vogt-Kölln-Straße 30, D-22527 Hamburg, Germany
Pseudocode | No | The paper includes system diagrams (Figure 1 and Figure 2) but does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | To foster such research, our implemented framework DOCKS (DOmain- and Cloud-based Knowledge for Speech recognition) is available as open-source software at www.informatik.uni-hamburg.de/WTM/software/.
Open Datasets | No | The paper uses the TIMIT corpus (Garofolo et al. 1993), which is publicly available. However, it also uses a 'Scripted HRI data set' which was 'previously recorded by Heinrich and Wermter (2011)' and a 'Spontaneous HRI data set' which the authors 'collected'; neither is stated to be publicly available, and no access information is given.
Dataset Splits | No | The paper does not explicitly provide training, validation, or test dataset splits with percentages, sample counts, or references to predefined splits for all datasets used. While it mentions the 'Core Test Set' for TIMIT, it does not specify other splits for TIMIT or any splits for the SCRIPTED or SPONT corpora.
Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., specific CPU/GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions software such as 'Sphinx-4' and 'Sequitur G2P' but does not provide specific version numbers for these or any other software dependencies used in the experiments.
Experiment Setup | Yes | We therefore use a cost of 0.1 for matches and 0.9 for all other edit operations in the implementation described in Subsection 4.3 below. We also experiment with variable costs for phoneme substitution as detailed next.
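
To make the quoted experiment setup concrete, the following is a minimal sketch of a weighted Levenshtein distance over phoneme sequences, using the stated costs of 0.1 for a match and 0.9 for every other edit operation. It is not the DOCKS implementation: the function names `weighted_levenshtein` and `best_match`, the example phoneme symbols, and the absence of any length normalization or variable substitution costs are all assumptions made for illustration.

```python
from typing import List, Sequence

MATCH_COST = 0.1  # cost for a matching phoneme (value quoted from the paper)
EDIT_COST = 0.9   # cost for insertion, deletion, or substitution (quoted)


def weighted_levenshtein(hyp: Sequence[str], ref: Sequence[str],
                         match_cost: float = MATCH_COST,
                         edit_cost: float = EDIT_COST) -> float:
    """Edit distance between two phoneme sequences with a non-zero match cost.

    Hypothetical helper for illustration; not taken from DOCKS.
    """
    # prev[j] holds the cost of aligning the processed prefix of hyp with ref[:j].
    prev = [j * edit_cost for j in range(len(ref) + 1)]
    for i, h in enumerate(hyp, start=1):
        curr = [i * edit_cost]  # aligning hyp[:i] with an empty reference
        for j, r in enumerate(ref, start=1):
            diagonal = prev[j - 1] + (match_cost if h == r else edit_cost)
            deletion = prev[j] + edit_cost       # drop h from the hypothesis
            insertion = curr[j - 1] + edit_cost  # add r to the hypothesis
            curr.append(min(diagonal, deletion, insertion))
        prev = curr
    return prev[-1]


def best_match(hyp: Sequence[str],
               candidates: List[Sequence[str]]) -> Sequence[str]:
    """Return the candidate phoneme sequence closest to the hypothesis."""
    return min(candidates, key=lambda c: weighted_levenshtein(hyp, c))


if __name__ == "__main__":
    # Illustrative phoneme strings (assumed, not from the paper's corpora).
    hypothesis = ["k", "ae", "t"]
    in_domain = [["k", "ae", "p"], ["d", "ao", "g"]]
    print(best_match(hypothesis, in_domain))
```

Because matches carry a small non-zero cost, longer alignments accumulate cost even when every phoneme agrees; whether and how the paper compensates for sequence length is not stated in the excerpt above, so the sketch leaves the raw sum unnormalized.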