Unsupervised Transfer Learning for Spatiotemporal Predictive Networks

Authors: Zhiyu Yao, Yunbo Wang, Mingsheng Long, Jianmin Wang

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Compared with finetuning, our approach yields significant improvements on three benchmarks for spatiotemporal prediction, and benefits the target task even from less relevant pretext ones. We study unsupervised transfer learning performed between different spatiotemporal prediction tasks, within or across the following three benchmarks: Flying digits, Human motion, Precipitation nowcasting.
Researcher Affiliation | Academia | School of Software, BNRist, Research Center for Big Data, Tsinghua University. Correspondence to: Mingsheng Long <mingsheng@tsinghua.edu.cn>.
Pseudocode | No | The paper describes the methodology using equations and text but does not provide structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code and datasets are made available at https://github.com/thuml/transferable-memory.
Open Datasets | Yes | Code and datasets are made available at https://github.com/thuml/transferable-memory. The paper uses the Human3.6M (Ionescu et al., 2013), KTH (Schuldt et al., 2004), and Weizmann (Blank et al., 2005) datasets, which are standard benchmarks in the field, and also uses Moving MNIST.
Dataset Splits | Yes | Each flying-digits dataset contains 10,000 training sequences, 2,000 validation sequences, and 3,000 testing sequences; the Human3.6M dataset, used as the target domain, has 2,220 sequences for training, 300 for validation, and 1,056 for testing. A split sketch based on these counts is given after the table.
Hardware Specification | Yes | All experiments are implemented in PyTorch (Paszke et al., 2019) and conducted on NVIDIA TITAN-RTX GPUs.
Software Dependencies | Yes | All experiments are implemented in PyTorch (Paszke et al., 2019) and conducted on NVIDIA TITAN-RTX GPUs.
Experiment Setup | Yes | We use the ADAM optimizer (Kingma & Ba, 2015) with a starting learning rate of 0.001 for training the TMU network. Unless otherwise mentioned, the batch size is set to 8, and the training process is stopped after 80,000 iterations. [...] We show the sensitivity analysis of the training hyper-parameter β in Figure 4. [...] we set β to 0.1 throughout this paper. A training-configuration sketch based on these settings follows the table.
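
To make the quoted split sizes concrete, here is a minimal PyTorch sketch of a 10,000/2,000/3,000 split for the flying-digits benchmark. The placeholder dataset and the fixed seed are assumptions for illustration; only the counts come from the paper.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Placeholder: one integer index per sequence stands in for a real video clip.
all_sequences = TensorDataset(torch.arange(15_000))

# Split sizes reported for the flying-digits benchmark:
# 10,000 training / 2,000 validation / 3,000 testing sequences.
train_set, val_set, test_set = random_split(
    all_sequences,
    lengths=[10_000, 2_000, 3_000],
    generator=torch.Generator().manual_seed(0),  # fixed seed so the split is reproducible
)

print(len(train_set), len(val_set), len(test_set))  # -> 10000 2000 3000
```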
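
The reported hyper-parameters (Adam with a starting learning rate of 0.001, batch size 8, 80,000 iterations, β = 0.1) translate roughly into the PyTorch skeleton below. The model, the batch shapes, and both loss terms are stand-ins rather than the authors' released TMU implementation, and treating β as the weight of an auxiliary loss term is an assumption; the quoted text calls it only a training hyper-parameter.

```python
import torch
from torch import nn, optim

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # paper used NVIDIA TITAN-RTX GPUs

# Stand-in for the TMU predictive network (the real architecture is in the released code).
model = nn.Conv3d(in_channels=1, out_channels=1, kernel_size=3, padding=1).to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-3)  # Adam, starting learning rate 0.001

BATCH_SIZE = 8       # reported batch size
MAX_ITERS = 80_000   # training stopped after 80,000 iterations
BETA = 0.1           # the hyper-parameter β, set to 0.1 throughout the paper

for it in range(MAX_ITERS):
    # Dummy batch of video volumes: (batch, channels, frames, height, width); shape is assumed.
    frames = torch.randn(BATCH_SIZE, 1, 20, 64, 64, device=device)
    pred = model(frames)
    prediction_loss = nn.functional.mse_loss(pred, frames)  # placeholder prediction objective
    transfer_loss = torch.zeros((), device=device)          # placeholder for the transfer/distillation term
    loss = prediction_loss + BETA * transfer_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```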