Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM
Authors: Eliya Nachmani, Alon Levkovitch, Roy Hirsch, Julian Salazar, Chulayuth Asawaroengchai, Soroosh Mariooryad, Ehud Rivlin, RJ Skerry-Ryan, Michelle Tadmor Ramanovich
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 EXPERIMENTS AND RESULTS To empirically evaluate the performance of the proposed approach, we conducted experiments on the Libri-Light dataset (Kahn et al., 2020). |
| Researcher Affiliation | Collaboration | Eliya Nachmani1, Alon Levkovitch1,3, Roy Hirsch2, Julian Salazar1, Chulayuth Asawaroengchai1, Soroosh Mariooryad1, Ehud Rivlin2, RJ Skerry-Ryan1, Michelle Tadmor Ramanovich1 — 1Google Research, 2Verily AI, 3Tel-Aviv University {eliyn, alevkovitch, royhirsch}@google.com |
| Pseudocode | No | The paper describes the architecture and training objective in text and diagrams (Figure 1), but does not include any explicit pseudocode blocks or algorithms. |
| Open Source Code | No | The paper states: 'We release our audio samples and spoken QA dataset via our website.' and 'Audio samples and our spoken QA dataset can be found on the project website.' but does not explicitly state that the source code for the Spectron model or its methodology is released. |
| Open Datasets | Yes | To empirically evaluate the performance of the proposed approach, we conducted experiments on the Libri-Light dataset (Kahn et al., 2020). |
| Dataset Splits | No | The paper states that it trains on Libri-Light and evaluates on the test-clean split of LibriSpeech, but it does not explicitly provide training/validation/test dataset splits or mention a specific validation set. |
| Hardware Specification | Yes | Our proposed model was trained using 64 TPUv4 chips (Jouppi et al., 2023), over a duration of 48 hours. |
| Software Dependencies | No | The paper mentions software such as the 'WaveFit vocoder' and the 'open-source transformers library' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | Table 6: Model hyper-parameters used in the experiments (×n: n layers). The table details Input & Output, SpecAugment, Speech Encoder, Language Model, and WaveFit vocoder settings, as well as the training parameters (optimizer, learning rate, warm-up steps, batch size, continuation loss weight, derivative loss order). |
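The training-parameter fields listed from Table 6 can be organized as a single configuration object for a reproduction attempt. The sketch below uses the field names reported above; all values are hypothetical placeholders, not the paper's actual settings, and the `validate` helper is an illustrative addition:

```python
# Hypothetical sketch of the Table 6 training-parameter fields.
# The keys mirror the categories reported in the paper; the VALUES
# are placeholders for illustration, NOT the authors' settings.
training_params = {
    "optimizer": "adam",              # placeholder
    "learning_rate": 1e-4,            # placeholder
    "warmup_steps": 10_000,           # placeholder
    "batch_size": 256,                # placeholder
    "continuation_loss_weight": 1.0,  # placeholder
    "derivative_loss_order": 1,       # placeholder
}

# Model components enumerated in Table 6 alongside the training block.
model_components = [
    "Input & Output",
    "SpecAugment",
    "Speech Encoder",
    "Language Model",
    "WaveFit vocoder",
]

def validate(params: dict) -> bool:
    """Check that every Table 6 training field is present in a config."""
    required = {
        "optimizer", "learning_rate", "warmup_steps",
        "batch_size", "continuation_loss_weight", "derivative_loss_order",
    }
    missing = required - params.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return True

print(validate(training_params))  # True
```

A validator like this makes a reproduction script fail fast when a reported hyper-parameter is left unset, rather than silently falling back to a library default.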