Variational Deep Embedding: An Unsupervised and Generative Approach to Clustering

Authors: Zhuxi Jiang, Yin Zheng, Huachun Tan, Bangsheng Tang, Hanning Zhou

IJCAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Quantitative comparisons with strong baselines are included in this paper, and experimental results show that VaDE significantly outperforms the state-of-the-art clustering methods on 5 benchmarks from various modalities.
Researcher Affiliation | Collaboration | (1) Beijing Institute of Technology, Beijing, China; (2) Tencent AI Lab, Shenzhen, China; (3) Hulu LLC., Beijing, China
Pseudocode | No | The paper describes the generative process and optimization steps in narrative text and mathematical equations, but it does not include formal pseudocode blocks or algorithms. (A minimal sketch of the generative process is given after the table.)
Open Source Code | Yes | The code of VaDE is available at https://github.com/slim1017/VaDE.
Open Datasets | Yes | MNIST [LeCun et al., 1998], HHAR [Stisen et al., 2015], REUTERS-10K [Lewis et al., 2004], REUTERS [Lewis et al., 2004] and STL-10 [Coates et al., 2011].
Dataset Splits | No | The paper discusses training parameters such as learning rates and pretraining but does not explicitly specify dataset splits for training, validation, and testing. A test set is mentioned later, but no clear validation split details are given.
Hardware Specification | No | The paper describes the network architectures and training parameters but does not provide any details about the hardware (e.g., GPU models, CPU types, memory) used for the experiments.
Software Dependencies | No | The paper mentions the use of the Adam optimizer and techniques such as t-SNE and ResNet-50, but it does not specify software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or library versions).
Experiment Setup | Yes | Specifically, the architectures of f and g in Equation 1 and Equation 10 are 10-2000-500-500-D and D-500-500-2000-10, respectively, where D is the input dimensionality. All layers are fully connected. The Adam optimizer [Kingma and Ba, 2015] is used to maximize the ELBO of Equation 9, and the mini-batch size is 100. The learning rate for MNIST, HHAR, Reuters-10K and STL-10 is 0.002 and decreases every 10 epochs with a decay rate of 0.9; the learning rate for Reuters is 0.0005 with a decay rate of 0.5 every epoch. (A sketch of this configuration is given after the table.)
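Although the paper gives no formal pseudocode, its generative story (pick a cluster, sample a latent code from that cluster's Gaussian, decode it into an observation) can be illustrated directly. The sketch below is a minimal, hedged illustration only: the GMM prior parameters and the stand-in decoder weights are random placeholders, whereas in VaDE they are learned jointly with the inference network by maximizing the ELBO.

```python
# Minimal sketch of the VaDE generative process: cluster -> latent code -> observation.
# The prior parameters and decoder weights below are random placeholders for illustration;
# in the paper they are learned jointly via the ELBO.
import numpy as np

rng = np.random.default_rng(0)

K, latent_dim, input_dim = 10, 10, 784   # K clusters, z in R^10, MNIST-sized x

# GMM prior parameters (pi, mu_c, sigma_c^2) -- placeholders, not learned values.
pi = np.full(K, 1.0 / K)
mu_c = rng.normal(size=(K, latent_dim))
sigma2_c = np.ones((K, latent_dim))

def decoder(z):
    """Stand-in for the decoder network g(z): returns Bernoulli means for binary data."""
    W = rng.normal(scale=0.01, size=(latent_dim, input_dim))
    return 1.0 / (1.0 + np.exp(-(z @ W)))        # sigmoid -> pixel means mu_x

# 1. pick a cluster c ~ Cat(pi)
c = rng.choice(K, p=pi)
# 2. sample a latent code z ~ N(mu_c, diag(sigma_c^2))
z = rng.normal(mu_c[c], np.sqrt(sigma2_c[c]))
# 3. sample the observation x ~ Bernoulli(mu_x), with mu_x produced by the decoder
x = rng.binomial(1, decoder(z))
```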
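The experiment-setup details quoted above (layer sizes, optimizer, batch size, learning-rate decay) translate almost directly into code. The sketch below assumes PyTorch; the ReLU activations, the sigmoid output layer, and the two-headed encoder output (mean and log-variance) are assumptions not stated in the quoted text, and the ELBO training loop itself is omitted.

```python
# Sketch of the quoted setup: fully connected f (decoder, 10-2000-500-500-D) and
# g (encoder, D-500-500-2000-10), Adam, mini-batch size 100, step learning-rate decay.
# ReLU/sigmoid activations and the 2*latent_dim encoder head are assumptions.
import torch
import torch.nn as nn

D = 784           # input dimensionality (e.g., flattened MNIST)
latent_dim = 10

# g in Equation 10: D-500-500-2000-10 (head outputs mean and log-variance, assumed)
encoder = nn.Sequential(
    nn.Linear(D, 500), nn.ReLU(),
    nn.Linear(500, 500), nn.ReLU(),
    nn.Linear(500, 2000), nn.ReLU(),
    nn.Linear(2000, 2 * latent_dim),
)

# f in Equation 1: 10-2000-500-500-D (sigmoid output assumed for binary data)
decoder = nn.Sequential(
    nn.Linear(latent_dim, 2000), nn.ReLU(),
    nn.Linear(2000, 500), nn.ReLU(),
    nn.Linear(500, 500), nn.ReLU(),
    nn.Linear(500, D), nn.Sigmoid(),
)

params = list(encoder.parameters()) + list(decoder.parameters())

# MNIST / HHAR / Reuters-10K / STL-10: lr 0.002, decayed by 0.9 every 10 epochs.
optimizer = torch.optim.Adam(params, lr=0.002)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.9)
# For Reuters the quoted setup is lr=0.0005 with a decay rate of 0.5 every epoch,
# i.e. StepLR(optimizer, step_size=1, gamma=0.5).

batch_size = 100  # mini-batch size from the quoted setup
```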