Representation Learning for Sequence Data with Deep Autoencoding Predictive Components
Authors: Junwen Bai, Weiran Wang, Yingbo Zhou, Caiming Xiong
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that our method recovers the latent space of noisy dynamical systems, extracts predictive features for forecasting tasks, and improves automatic speech recognition when used to pretrain the encoder on large amounts of unlabeled data. Experimental results show that DAPC can recover meaningful low-dimensional dynamics from high-dimensional noisy and nonlinear systems, extract predictive features for forecasting tasks, and obtain state-of-the-art accuracies for Automatic Speech Recognition (ASR) with a much lower cost, by pretraining encoders that are later finetuned with a limited amount of labeled data. (An illustrative sketch of the DAPC-style objective appears below the table.) |
| Researcher Affiliation | Collaboration | Junwen Bai (Cornell University, junwen@cs.cornell.edu); Weiran Wang (Google, weiranwang@google.com); Yingbo Zhou & Caiming Xiong (Salesforce Research, {yingbo.zhou, cxiong}@salesforce.com) |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Code is available at https://github.com/JunwenBai/DAPC. |
| Open Datasets | Yes | 3 real-world datasets used by Clark et al. (2019), involving multi-city temperature time series data (Temp, Beniaguev (2017)), a dorsal hippocampus study (HC, Glaser et al. (2020)), and motor cortex recordings (M1, O'Doherty et al. (2018)). Wall Street Journal (Paul & Baker, 1992) and LibriSpeech (Panayotov et al., 2015). |
| Dataset Splits | Yes | We use 250, 25 and 25 segments for training, validation, and test splits respectively. For WSJ, we pretrain on the si284 partition (81 hours), and finetune on the si84 partition (15 hours) or the si284 partition itself. For LibriSpeech, we pretrain on the train-960 partition (960 hours) and finetune on the train-clean-100 partition (100 hours). Standard dev and test splits for each corpus are used for validation and testing. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models or memory) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers were mentioned. For example, the Adam optimizer is mentioned, but the underlying framework and its version are not specified. |
| Experiment Setup | Yes | The optimal model uses T = 4, s = 0, α = 0, γ = 0.1, and β = 0.1, where β balances the importance of PI and the reconstruction error. More specifically, CPC, MR, and DAPC all use a bidirectional GRU with learning rate 0.001 and dropout rate 0.7. Our GRU has 4 encoding layers with hidden size 256. The batch sizes are (20, 500, 30). We set the number of time masks to 4 for WSJ, and 8 for LibriSpeech... We select hyperparameters which give the best dev set WER, and report the corresponding test set WER. In the end, we use T = 4 for estimating the PI term, γ = 0.05, β = 0.005, and set s = 2 for WSJ and s = 1 for LibriSpeech if we use shifted reconstruction. (These settings are collected in the config sketch after the table.) |
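
For orientation, the sketch below shows one way the DAPC-style objective quoted above could be assembled: a Gaussian estimate of the predictive information (PI) over T-step windows of the latent sequence, regularized by a masked-reconstruction error weighted by β. This is a minimal sketch under stated assumptions, not the authors' released implementation: the Gaussian PI estimator (DCA-style, assuming a stationary latent process), the `gaussian_pi`/`dapc_loss` names, and the `encoder`/`decoder` interfaces are all illustrative, and the shifted-reconstruction offset s is omitted for brevity.

```python
import torch


def gaussian_pi(z, T=4, eps=1e-4):
    """Gaussian estimate of predictive information I(past T steps; future T steps),
    assuming a stationary latent process (a DCA-style estimator; an assumption here):
        PI = log det(Sigma_T) - 0.5 * log det(Sigma_2T)
    z: (batch, length, dim) latent sequence produced by the encoder."""
    B, L, d = z.shape
    # Collect all overlapping 2T-step windows and flatten each into one vector.
    windows = torch.stack(
        [z[:, t:t + 2 * T].reshape(B, -1) for t in range(L - 2 * T + 1)], dim=1
    ).reshape(-1, 2 * T * d)
    windows = windows - windows.mean(dim=0, keepdim=True)
    cov_2T = windows.T @ windows / (windows.shape[0] - 1)
    cov_2T = cov_2T + eps * torch.eye(2 * T * d, dtype=z.dtype, device=z.device)
    cov_T = cov_2T[: T * d, : T * d]  # marginal covariance of a single T-step window
    return torch.logdet(cov_T) - 0.5 * torch.logdet(cov_2T)


def dapc_loss(encoder, decoder, x, x_masked, T=4, beta=0.1):
    """Hypothetical DAPC-style training loss: maximize PI of the latent sequence
    while reconstructing the clean input from its masked version, with beta
    balancing the two terms (the excerpt reports beta = 0.1 for forecasting
    and beta = 0.005 for ASR pretraining)."""
    z = encoder(x_masked)                      # (B, L, d) latent sequence
    recon = decoder(z)                         # reconstruction of the clean input
    recon_err = torch.mean((recon - x) ** 2)   # masked-reconstruction regularizer
    return -gaussian_pi(z, T=T) + beta * recon_err
```

Minimizing this loss maximizes the PI term while keeping the latents informative enough to reconstruct the input, which matches the role the excerpt assigns to β.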
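
The hyperparameters quoted in the Experiment Setup row can also be collected into a single reference configuration. The dictionary below is purely organizational; the key names and the assumed mapping of the batch sizes (20, 500, 30) to the Temp/HC/M1 datasets are assumptions, not taken from the released code.

```python
# Hypothetical summary of the hyperparameters reported in the excerpt above;
# structure and key names are illustrative, not from the official DAPC repository.
DAPC_REPORTED_SETTINGS = {
    "forecasting": {                      # Temp / HC / M1 experiments
        "T": 4, "s": 0, "alpha": 0.0, "gamma": 0.1, "beta": 0.1,
        "encoder": "bidirectional GRU", "encoder_layers": 4, "hidden_size": 256,
        "learning_rate": 1e-3, "dropout": 0.7,
        # Assumed to correspond to (Temp, HC, M1), in the order the paper lists them.
        "batch_size": (20, 500, 30),
    },
    "asr_pretraining": {                  # WSJ / LibriSpeech encoder pretraining
        "T": 4, "gamma": 0.05, "beta": 0.005,
        "s": {"WSJ": 2, "LibriSpeech": 1},             # shifted reconstruction
        "num_time_masks": {"WSJ": 4, "LibriSpeech": 8},
    },
}
```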