Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM

Authors: Eliya Nachmani, Alon Levkovitch, Roy Hirsch, Julian Salazar, Chulayuth Asawaroengchai, Soroosh Mariooryad, Ehud Rivlin, RJ Skerry-Ryan, Michelle Tadmor Ramanovich

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "4 EXPERIMENTS AND RESULTS: To empirically evaluate the performance of the proposed approach, we conducted experiments on the Libri-Light dataset (Kahn et al., 2020)." |
| Researcher Affiliation | Collaboration | Eliya Nachmani1, Alon Levkovitch1,3, Roy Hirsch2, Julian Salazar1, Chulayuth Asawaroengchai1, Soroosh Mariooryad1, Ehud Rivlin2, RJ Skerry-Ryan1, Michelle Tadmor Ramanovich1 (1Google Research, 2Verily AI, 3Tel-Aviv University) {eliyn, alevkovitch, royhirsch}@google.com |
| Pseudocode | No | The paper describes the architecture and training objective in text and diagrams (Figure 1), but does not include any explicit pseudocode blocks or algorithms. |
| Open Source Code | No | The paper states: "We release our audio samples and spoken QA dataset via our website." and "Audio samples and our spoken QA dataset can be found on the project website." but does not state that the source code for the Spectron model is released. |
| Open Datasets | Yes | "To empirically evaluate the performance of the proposed approach, we conducted experiments on the Libri-Light dataset (Kahn et al., 2020)." |
| Dataset Splits | No | The paper states that Libri-Light is used for training and the test-clean split of LibriSpeech for testing, but does not provide explicit training/validation/test splits or mention a validation set. |
| Hardware Specification | Yes | "Our proposed model was trained using 64 TPUv4 chips (Jouppi et al., 2023), over a duration of 48 hours." |
| Software Dependencies | No | The paper mentions software such as the WaveFit vocoder and the open-source transformers library, but does not provide version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | "Table 6: Model hyper-parameters used in the experiments." (n: n layers) Details cover Input & Output, SpecAugment, Speech Encoder, Language Model, WaveFit vocoder, and Training parameters (Optimizer, Learning rate, Warm-up steps, Batch size, Continuation loss weight, Derivative loss order). |
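The Experiment Setup row lists a "Derivative loss order" and a "Continuation loss weight" among the training hyper-parameters. As a minimal sketch of what a spectrogram regression loss with temporal-derivative terms of configurable order could look like (function and argument names are illustrative, not taken from the paper or its code):

```python
import numpy as np

def spectrogram_loss(pred, target, deriv_order=2, deriv_weight=1.0):
    """Illustrative sketch: L1 loss between predicted and target
    spectrograms (time x frequency arrays), plus L1 terms on their
    first `deriv_order` finite differences along the time axis.
    This only mirrors the hyper-parameter names in Table 6; it is
    not the paper's actual loss implementation."""
    loss = np.abs(pred - target).mean()
    p, t = pred, target
    for _ in range(deriv_order):
        p = np.diff(p, axis=0)  # finite difference over time frames
        t = np.diff(t, axis=0)
        loss += deriv_weight * np.abs(p - t).mean()
    return loss
```

Under this sketch, the "Derivative loss order" hyper-parameter would control how many finite-difference terms are added, and a weight analogous to the "Continuation loss weight" would scale the regression loss against other training objectives.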