Generative Semi-supervised Learning for Multivariate Time Series Imputation

Authors: Xiaoye Miao, Yangyang Wu, Jun Wang, Yunjun Gao, Xudong Mao, Jianwei Yin

AAAI 2021

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on three public real-world datasets demonstrate that SSGAN yields a more than 15% gain in performance compared with the state-of-the-art methods.
Researcher Affiliation | Academia | 1 Center for Data Science, Zhejiang University, Hangzhou, China; 2 College of Computer Science, Zhejiang University, Hangzhou, China; 3 Information Hub, The Hong Kong University of Science and Technology, Hong Kong, China; 4 Department of Artificial Intelligence, Xiamen University, Xiamen, China
Pseudocode | No | The paper describes the architecture and steps of SSGAN but does not present them in a structured pseudocode or algorithm block format.
Open Source Code | No | The paper provides no explicit statement about, or link to, open-source code for the SSGAN methodology.
Open Datasets | Yes | The localization for human activity dataset (Activity) (Cao et al. 2018; Kaluža et al. 2010; Silva et al. 2012) consists of multivariate kinematic time series recording the motion state of 5 people performing 11 kinds of activities (https://archive.ics.uci.edu/ml/datasets/Localization+Data+for+Person+Activity). The PhysioNet Challenge 2012 dataset (PhysioNet) (Cao et al. 2018; Che et al. 2018; Luo et al. 2018, 2019; Silva et al. 2012) contains 4,000 multivariate clinical time series with 41 measurements from intensive care unit (ICU) stays, among which 554 patients died in hospital (https://physionet.org/content/challenge-2012/1.0.0/). The KDD CUP 2018 dataset (KDD) (Cao et al. 2018; Luo et al. 2018, 2019; Silva et al. 2012), a public meteorological dataset with a 13.30% missing rate, includes PM2.5 measurements from 36 monitoring stations in Beijing, collected hourly from 2014/05/01 to 2015/04/30 (https://github.com/NIPS-BRITS/BRITS).
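The 13.30% missing rate quoted for the KDD dataset is the fraction of unobserved entries in the data tensor. A minimal sketch of how such a rate can be computed from NaN-encoded observations (the toy array below is illustrative, not from the datasets):

```python
import numpy as np

def missing_rate(x):
    """Fraction of missing (NaN) entries in a multivariate time-series array."""
    return float(np.isnan(x).mean())

# Toy example: 3 of 12 entries are missing -> rate 0.25
x = np.array([[1.0, np.nan, 3.0, 4.0],
              [np.nan, 2.0, 3.0, np.nan],
              [1.0, 2.0, 3.0, 4.0]])
print(missing_rate(x))  # -> 0.25
```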
Dataset Splits | No | The paper states 'We randomly choose 80% data as the training data, and the rest as the test data.' but does not specify a separate validation split or how it was handled.
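The stated 80/20 protocol can be sketched as a simple random split with no validation set, mirroring the quote above (the array shape is a placeholder for, e.g., the 4,000 PhysioNet series):

```python
import numpy as np

def split_train_test(data, train_frac=0.8, seed=0):
    """Randomly split samples into train/test sets; no validation split,
    matching the protocol described in the paper."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    n_train = int(train_frac * len(data))
    return data[idx[:n_train]], data[idx[n_train:]]

# Placeholder tensor: (samples, time steps, measurements)
series = np.zeros((4000, 48, 41))
train, test = split_train_test(series)
print(train.shape[0], test.shape[0])  # -> 3200 800
```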
Hardware Specification | Yes | The experiments were conducted on an Intel Core 2.80GHz server with a TITAN Xp 12GiB GPU and 192GB RAM.
Software Dependencies | No | The paper states 'All methods were implemented in Python' but gives no version numbers for Python or for any libraries/frameworks used (e.g., TensorFlow, PyTorch).
Experiment Setup | Yes | For all deep learning baselines, the learning rate is 0.001, the dropout rate is 0.5, the number of hidden units in the recurrent network is 64, and the number of training epochs is 30. The batch size is 128 for PhysioNet and 64 for KDD and Activity. The dimensionality of the latent space in GP-VAE is 35. The dimensionality of the random noise in TSGAN and of the latent vector in E2GAN is 64. In SSGAN, the Adam algorithm is used to train the networks; the classifier is pre-trained for five epochs before the GAN model is trained; the number of training epochs is 30; and the generator is trained once for every five optimization steps of the discriminator. The hyper-parameters α and β of the generator are both 5. The reminder rate (i.e., how many missing states in M are encoded in the temporal reminder matrix for the discriminator) is 0.8. The label rate is 100%.
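The reported settings can be collected into a hedged configuration sketch. The `CONFIG` dictionary only restates values quoted above; the `reminder_matrix` helper is an assumption, following the GAIN-style hint construction that the "temporal reminder matrix" description suggests (each entry of the mask M is revealed to the discriminator with probability equal to the reminder rate, with 0.5 marking unrevealed entries):

```python
import numpy as np

# Values restated from the paper's reported setup
CONFIG = {
    "learning_rate": 1e-3,
    "dropout": 0.5,
    "hidden_units": 64,
    "epochs": 30,
    "batch_size": {"PhysioNet": 128, "KDD": 64, "Activity": 64},
    "alpha": 5, "beta": 5,            # generator loss weights
    "reminder_rate": 0.8,
    "d_steps_per_g_step": 5,          # G trained once per 5 D steps
    "classifier_pretrain_epochs": 5,
}

def reminder_matrix(m, rate=0.8, rng=None):
    """Hypothetical GAIN-style construction: reveal a fraction `rate` of the
    observation mask M to the discriminator; unrevealed entries become 0.5."""
    rng = rng or np.random.default_rng()
    b = (rng.random(m.shape) < rate).astype(float)  # 1 = revealed entry
    return b * m + (1.0 - b) * 0.5
```

With `rate=1.0` the discriminator sees M exactly; with `rate=0.0` it receives a uniform 0.5 matrix carrying no mask information.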