Generative Semi-supervised Learning for Multivariate Time Series Imputation

Authors: Xiaoye Miao, Yangyang Wu, Jun Wang, Yunjun Gao, Xudong Mao, Jianwei Yin

AAAI 2021

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on three public real-world datasets demonstrate that SSGAN yields a more than 15% gain in performance compared with the state-of-the-art methods.
Researcher Affiliation | Academia | 1 Center for Data Science, Zhejiang University, Hangzhou, China; 2 College of Computer Science, Zhejiang University, Hangzhou, China; 3 Information Hub, The Hong Kong University of Science and Technology, Hong Kong, China; 4 Department of Artificial Intelligence, Xiamen University, Xiamen, China
Pseudocode | No | The paper describes the architecture and steps of SSGAN but does not present them in a structured pseudocode or algorithm block format.
Open Source Code | No | The paper provides no explicit statement about, or link to, open-source code for the SSGAN methodology.
Open Datasets | Yes | The localization for human activity dataset (Activity) (Cao et al. 2018; Kaluža et al. 2010; Silva et al. 2012) consists of multivariate kinematic time series recording the motion state of 5 people performing 11 kinds of activities (https://archive.ics.uci.edu/ml/datasets/Localization+Data+for+Person+Activity). The PhysioNet Challenge 2012 dataset (PhysioNet) (Cao et al. 2018; Che et al. 2018; Luo et al. 2018, 2019; Silva et al. 2012) contains 4,000 multivariate clinical time series with 41 measurements from intensive care unit (ICU) stays, among which 554 patients died in hospital (https://physionet.org/content/challenge-2012/1.0.0/). The KDD CUP 2018 dataset (KDD) (Cao et al. 2018; Luo et al. 2018, 2019; Silva et al. 2012), a public meteorological dataset with a 13.30% missing rate, includes PM2.5 measurements from 36 monitoring stations in Beijing, collected hourly from 2014/05/01 to 2015/04/30 (https://github.com/NIPS-BRITS/BRITS).
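The 13.30% missing rate quoted for the KDD dataset is the fraction of unobserved entries in the data tensor. A minimal sketch of how such a rate can be computed from NaN-encoded observations (the toy array below is illustrative, not from the datasets):

```python
import numpy as np

def missing_rate(x):
    """Fraction of missing (NaN) entries in a multivariate time-series array."""
    return float(np.isnan(x).mean())

# Toy example: 3 of 12 entries are missing -> rate 0.25
x = np.array([[1.0, np.nan, 3.0, 4.0],
              [np.nan, 2.0, 3.0, np.nan],
              [1.0, 2.0, 3.0, 4.0]])
print(missing_rate(x))  # -> 0.25
```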
Dataset Splits | No | The paper states 'We randomly choose 80% data as the training data, and the rest as the test data.' but does not specify a separate validation split or how it was handled.
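The stated 80/20 protocol can be sketched as a simple random split with no validation set, mirroring the quote above (the array shape is a placeholder for, e.g., the 4,000 PhysioNet series):

```python
import numpy as np

def split_train_test(data, train_frac=0.8, seed=0):
    """Randomly split samples into train/test sets; no validation split,
    matching the protocol described in the paper."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    n_train = int(train_frac * len(data))
    return data[idx[:n_train]], data[idx[n_train:]]

# Placeholder tensor: (samples, time steps, measurements)
series = np.zeros((4000, 48, 41))
train, test = split_train_test(series)
print(train.shape[0], test.shape[0])  # -> 3200 800
```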
Hardware Specification | Yes | The experiments were conducted on an Intel Core 2.80GHz server with a TITAN Xp 12GiB GPU and 192GB RAM.
Software Dependencies | No | The paper states 'All methods were implemented in Python' but gives no version numbers for Python or for any libraries/frameworks used (e.g., TensorFlow, PyTorch).
Experiment Setup | Yes | For all deep learning baselines, the learning rate is 0.001, the dropout rate is 0.5, the number of hidden units in the recurrent network is 64, and the number of training epochs is 30. The batch size is 128 for PhysioNet and 64 for KDD and Activity. The dimensionality of the latent space in GP-VAE is 35. The dimensionality of the random noise in TSGAN and of the latent vector in E2GAN is 64. In SSGAN, the Adam algorithm is used to train the networks; the classifier is pre-trained for five epochs before the GAN model is trained; the number of training epochs is 30; and the generator is trained once for every five optimization steps of the discriminator. The hyper-parameters α and β of the generator are both 5. The reminder rate (i.e., how many missing states in M are encoded in the temporal reminder matrix for the discriminator) is 0.8. The label rate is 100%.
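The reported settings can be collected into a hedged configuration sketch. The `CONFIG` dictionary only restates values quoted above; the `reminder_matrix` helper is an assumption, following the GAIN-style hint construction that the "temporal reminder matrix" description suggests (each entry of the mask M is revealed to the discriminator with probability equal to the reminder rate, with 0.5 marking unrevealed entries):

```python
import numpy as np

# Values restated from the paper's reported setup
CONFIG = {
    "learning_rate": 1e-3,
    "dropout": 0.5,
    "hidden_units": 64,
    "epochs": 30,
    "batch_size": {"PhysioNet": 128, "KDD": 64, "Activity": 64},
    "alpha": 5, "beta": 5,            # generator loss weights
    "reminder_rate": 0.8,
    "d_steps_per_g_step": 5,          # G trained once per 5 D steps
    "classifier_pretrain_epochs": 5,
}

def reminder_matrix(m, rate=0.8, rng=None):
    """Hypothetical GAIN-style construction: reveal a fraction `rate` of the
    observation mask M to the discriminator; unrevealed entries become 0.5."""
    rng = rng or np.random.default_rng()
    b = (rng.random(m.shape) < rate).astype(float)  # 1 = revealed entry
    return b * m + (1.0 - b) * 0.5
```

With `rate=1.0` the discriminator sees M exactly; with `rate=0.0` it receives a uniform 0.5 matrix carrying no mask information.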