Unified Segment-to-Segment Framework for Simultaneous Sequence Generation
Authors: Shaolei Zhang, Yang Feng
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on multiple simultaneous generation tasks demonstrate that Seg2Seg achieves state-of-the-art performance and exhibits better generality across various tasks. |
| Researcher Affiliation | Academia | Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (ICT/CAS); University of Chinese Academy of Sciences |
| Pseudocode | Yes | Algorithm 1 illustrates the specific inference process of Seg2Seg. |
| Open Source Code | Yes | Code is available at: https://github.com/ictnlp/Seg2Seg. |
| Open Datasets | Yes | We apply the LibriSpeech benchmark [59], which consists of 960 hours of English audio. |
| Dataset Splits | Yes | We use dev-clean (5.4 hours) and dev-other (5.3 hours) as validation sets, and test-clean (5.4 hours) and test-other (5.1 hours) as test sets, where the test-other set contains more noisy audio. For speech, we use the raw 16-bit 16 kHz mono-channel audio wave. For text, we use SentencePiece [60] to generate a unigram vocabulary of size 10000 (a hedged SentencePiece sketch follows the table). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory) used for running the experiments. It only mentions using a Transformer-Base model and a pre-trained Wav2Vec2.0. |
| Software Dependencies | No | The paper mentions software such as the Fairseq library [66], Wav2Vec2.0 [67], and SimulEval [68], but it does not specify exact version numbers for these software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | In Seg2Seg, we use the standard Transformer-Base (6 encoder and 6 decoder layers) [55] for SimulMT. For streaming ASR and SimulST, we replace the word embedding layer in Transformer-Base with a pre-trained Wav2Vec2.0 [67] to extract the acoustic embedding, and the rest remains the same as SimulMT (a hedged acoustic-embedding sketch follows the table). |
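As a reference for the text preprocessing quoted in the Dataset Splits row, the sketch below builds a unigram vocabulary of size 10000 with the SentencePiece Python API. The input file name and the model prefix are illustrative placeholders, not names taken from the paper.

```python
import sentencepiece as spm

# Train a unigram subword model with vocabulary size 10000, matching the
# setup quoted in the Dataset Splits row. "train.txt" is a placeholder
# path for the text side of the training data (one sentence per line).
spm.SentencePieceTrainer.train(
    input="train.txt",
    model_prefix="unigram10k",  # hypothetical output prefix
    vocab_size=10000,
    model_type="unigram",
)

# Load the trained model and segment a sentence into subword pieces.
sp = spm.SentencePieceProcessor(model_file="unigram10k.model")
print(sp.encode("simultaneous sequence generation", out_type=str))
```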
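The Experiment Setup row states that, for streaming ASR and SimulST, the word embedding layer of Transformer-Base is replaced with a pre-trained Wav2Vec2.0 that extracts acoustic embeddings. The paper's implementation is built on Fairseq; the minimal sketch below instead uses the Hugging Face `transformers` checkpoint `facebook/wav2vec2-base-960h` as a stand-in to illustrate the idea, so the checkpoint name and the projection layer are assumptions, not the authors' exact pipeline.

```python
import torch
from transformers import Wav2Vec2Model

# Hedged sketch: extract acoustic embeddings from raw 16 kHz mono audio
# with a pre-trained Wav2Vec2.0 encoder, then project them to the
# Transformer-Base model dimension (512). The checkpoint and the linear
# projection are illustrative assumptions.
wav2vec = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
proj = torch.nn.Linear(wav2vec.config.hidden_size, 512)

waveform = torch.randn(1, 16000)  # one second of dummy 16 kHz audio
with torch.no_grad():
    acoustic = wav2vec(waveform).last_hidden_state  # (1, frames, 768)
embeddings = proj(acoustic)  # (1, frames, 512): replaces word embeddings
print(embeddings.shape)
```

These frame-level embeddings would then be fed to the Transformer encoder in place of the word embedding layer, with the rest of the model unchanged, as the Experiment Setup row describes.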