Unsupervised Transfer Learning for Spatiotemporal Predictive Networks
Authors: Zhiyu Yao, Yunbo Wang, Mingsheng Long, Jianmin Wang
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study unsupervised transfer learning performed between different spatiotemporal prediction tasks, within or across the following three benchmarks: Flying digits, Human motion, Precipitation nowcasting. Compared with finetuning, our approach yields significant improvements on three benchmarks for spatiotemporal prediction, and benefits the target task even from less relevant pretext ones. |
| Researcher Affiliation | Academia | School of Software, BNRist, Research Center for Big Data, Tsinghua University. Correspondence to: Mingsheng Long <mingsheng@tsinghua.edu.cn>. |
| Pseudocode | No | The paper describes the methodology using equations and text but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and datasets are made available at https://github.com/thuml/transferable-memory. |
| Open Datasets | Yes | Code and datasets are made available at https://github.com/thuml/transferable-memory. For example, the paper uses the Human3.6M (Ionescu et al., 2013), KTH (Schuldt et al., 2004), and Weizmann (Blank et al., 2005) datasets, which are standard benchmarks in the field, as well as Moving MNIST. |
| Dataset Splits | Yes | Each dataset contains 10,000 training sequences, 2,000 validation sequences, and 3,000 testing sequences (Flying digits). The Human3.6M dataset, used as the target domain, has 2,220 sequences for training, 300 for validation, and 1,056 for testing. (A split sketch follows the table.) |
| Hardware Specification | Yes | All experiments are implemented in PyTorch (Paszke et al., 2019) and conducted on NVIDIA TITAN-RTX GPUs. |
| Software Dependencies | Yes | All experiments are implemented in PyTorch (Paszke et al., 2019). |
| Experiment Setup | Yes | We use the Adam optimizer (Kingma & Ba, 2015) with a starting learning rate of 0.001 for training the TMU network. Unless otherwise mentioned, the batch size is set to 8, and the training process is stopped after 80,000 iterations. [...] We show the sensitivity analysis of the training hyper-parameter β in Figure 4. [...] we set β to 0.1 throughout this paper. (A hedged training-loop sketch based on these values follows the table.) |
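The split sizes quoted in the Dataset Splits row can be reproduced with a short data-partitioning sketch. This is a minimal illustration under assumed names: the index-only `TensorDataset` stands in for the actual sequence tensors, and the fixed seed is our addition, not something specified by the paper.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in dataset: one index per sequence (15,000 = 10,000 + 2,000 + 3,000).
# In practice each item would be a full spatiotemporal sequence tensor.
dataset = TensorDataset(torch.arange(15_000))

train_set, val_set, test_set = random_split(
    dataset,
    lengths=[10_000, 2_000, 3_000],
    generator=torch.Generator().manual_seed(0),  # fixed seed so the split is reproducible
)
print(len(train_set), len(val_set), len(test_set))  # -> 10000 2000 3000
```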
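The hyperparameters quoted in the Experiment Setup row (Adam, initial learning rate 0.001, batch size 8, 80,000 iterations, β = 0.1) map onto a training-loop skeleton roughly like the one below. This is a hedged sketch, not the authors' implementation: the dummy convolutional model, the MSE term, and the zero-valued auxiliary loss are placeholders for the TMU network and its transfer objective, which live in the linked repository.

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the TMU predictive network; shapes are illustrative only.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=3, padding=1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # starting learning rate 0.001

beta = 0.1             # weight of the auxiliary (transfer) loss term, as quoted above
batch_size = 8
num_iterations = 80_000  # training is stopped after 80,000 iterations

mse = nn.MSELoss()

for step in range(num_iterations):
    # Dummy batch: in practice a DataLoader over the spatiotemporal sequences goes here.
    frames = torch.randn(batch_size, 1, 64, 64)
    targets = torch.randn(batch_size, 1, 64, 64)

    preds = model(frames)
    # The second term is a placeholder for the paper's transferred-memory objective.
    aux_loss = torch.tensor(0.0)
    loss = mse(preds, targets) + beta * aux_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```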