Unsupervised Transfer Learning for Spatiotemporal Predictive Networks

Authors: Zhiyu Yao, Yunbo Wang, Mingsheng Long, Jianmin Wang

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Compared with finetuning, our approach yields significant improvements on three benchmarks for spatiotemporal prediction, and benefits the target task even from less relevant pretext ones. We study unsupervised transfer learning performed between different spatiotemporal prediction tasks, within or across the following three benchmarks: Flying digits, Human motion, Precipitation nowcasting.
Researcher Affiliation | Academia | School of Software, BNRist, Research Center for Big Data, Tsinghua University. Correspondence to: Mingsheng Long <mingsheng@tsinghua.edu.cn>.
Pseudocode | No | The paper describes the methodology using equations and text but does not provide structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code and datasets are made available at https://github.com/thuml/transferable-memory.
Open Datasets | Yes | Code and datasets are made available at https://github.com/thuml/transferable-memory. The paper uses the Human3.6M (Ionescu et al., 2013), KTH (Schuldt et al., 2004), and Weizmann (Blank et al., 2005) datasets, which are standard benchmarks in the field, and also uses Moving MNIST.
Dataset Splits | Yes | Each flying-digits dataset contains 10,000 training sequences, 2,000 validation sequences, and 3,000 testing sequences; the Human3.6M dataset, used as the target domain, has 2,220 sequences for training, 300 for validation, and 1,056 for testing. A split sketch based on these counts is given after the table.
Hardware Specification | Yes | All experiments are implemented in PyTorch (Paszke et al., 2019) and conducted on NVIDIA TITAN-RTX GPUs.
Software Dependencies | Yes | All experiments are implemented in PyTorch (Paszke et al., 2019) and conducted on NVIDIA TITAN-RTX GPUs.
Experiment Setup | Yes | We use the ADAM optimizer (Kingma & Ba, 2015) with a starting learning rate of 0.001 for training the TMU network. Unless otherwise mentioned, the batch size is set to 8, and the training process is stopped after 80,000 iterations. [...] We show the sensitivity analysis of the training hyper-parameter β in Figure 4. [...] we set β to 0.1 throughout this paper. A training-configuration sketch based on these settings follows the table.
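
To make the quoted split sizes concrete, here is a minimal PyTorch sketch of a 10,000/2,000/3,000 split for the flying-digits benchmark. The placeholder dataset and the fixed seed are assumptions for illustration; only the counts come from the paper.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Placeholder: one integer index per sequence stands in for a real video clip.
all_sequences = TensorDataset(torch.arange(15_000))

# Split sizes reported for the flying-digits benchmark:
# 10,000 training / 2,000 validation / 3,000 testing sequences.
train_set, val_set, test_set = random_split(
    all_sequences,
    lengths=[10_000, 2_000, 3_000],
    generator=torch.Generator().manual_seed(0),  # fixed seed so the split is reproducible
)

print(len(train_set), len(val_set), len(test_set))  # -> 10000 2000 3000
```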
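
The reported hyper-parameters (Adam with a starting learning rate of 0.001, batch size 8, 80,000 iterations, β = 0.1) translate roughly into the PyTorch skeleton below. The model, the batch shapes, and both loss terms are stand-ins rather than the authors' released TMU implementation, and treating β as the weight of an auxiliary loss term is an assumption; the quoted text calls it only a training hyper-parameter.

```python
import torch
from torch import nn, optim

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # paper used NVIDIA TITAN-RTX GPUs

# Stand-in for the TMU predictive network (the real architecture is in the released code).
model = nn.Conv3d(in_channels=1, out_channels=1, kernel_size=3, padding=1).to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-3)  # Adam, starting learning rate 0.001

BATCH_SIZE = 8       # reported batch size
MAX_ITERS = 80_000   # training stopped after 80,000 iterations
BETA = 0.1           # the hyper-parameter β, set to 0.1 throughout the paper

for it in range(MAX_ITERS):
    # Dummy batch of video volumes: (batch, channels, frames, height, width); shape is assumed.
    frames = torch.randn(BATCH_SIZE, 1, 20, 64, 64, device=device)
    pred = model(frames)
    prediction_loss = nn.functional.mse_loss(pred, frames)  # placeholder prediction objective
    transfer_loss = torch.zeros((), device=device)          # placeholder for the transfer/distillation term
    loss = prediction_loss + BETA * transfer_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```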