Improved Deep Embedded Clustering with Local Structure Preservation
Authors: Xifeng Guo, Long Gao, Xinwang Liu, Jianping Yin
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on image and text datasets empirically validate the importance of local structure preservation and the effectiveness of our algorithm. |
| Researcher Affiliation | Academia | College of Computer, National University of Defense Technology, Changsha, China |
| Pseudocode | Yes | Algorithm 1: Improved Deep Embedded Clustering |
| Open Source Code | Yes | Our implementation is based on Python and Keras [Chollet, 2015] and is available at https://github.com/XifengGuo/IDEC. |
| Open Datasets | Yes | MNIST: The MNIST dataset [LeCun et al., 1998] consists of a total of 70000 handwritten digits of 28x28 pixels; we reshaped each gray image to a 784-dimensional vector. USPS: The USPS dataset contains 9298 gray-scale handwritten digit images of 16x16 pixels. REUTERS-10K: Reuters contains around 810000 English news stories labeled with a category tree [Lewis et al., 2004]. |
| Dataset Splits | No | The paper uses well-known datasets but does not explicitly state training, validation, and test splits (percentages or counts) for its own experiments. It refers to prior work [Xie et al., 2016] for pretraining details, which may contain such information. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions that 'Our implementation is based on Python and Keras [Chollet, 2015]' but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | Following the settings in DEC [Xie et al., 2016], the encoder network is set as a fully connected multilayer perceptron (MLP) with dimensions d-500-500-2000-10 for all datasets... After pretraining, the coefficient γ of the clustering loss is set to 0.1... and the batch size to 256 for all datasets. The optimizer Adam [Kingma and Ba, 2014] with initial learning rate λ = 0.001, β1 = 0.9, β2 = 0.999 is applied for the MNIST dataset, and SGD with learning rate λ = 0.1 and momentum β = 0.99 is used for the USPS and REUTERS-10K datasets. The convergence threshold is set to δ = 0.1%, and the update intervals T are 140, 30, and 3 iterations for MNIST, USPS, and REUTERS-10K, respectively. |
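
For reference, below is a minimal sketch, not the authors' released code, of the fully connected encoder/decoder and optimizer settings quoted in the Experiment Setup row. It uses tf.keras rather than the original Keras release; the function name `build_autoencoder` and the ReLU activations are assumptions made for illustration.

```python
# A minimal sketch (not the authors' released code) of the encoder/decoder MLP
# and optimizer settings reported above, using tf.keras. The input dimension d
# is 784 for MNIST after reshaping each 28x28 image to a vector.
from tensorflow.keras import layers, Model, optimizers

def build_autoencoder(input_dim, embedding_dim=10):
    """Fully connected autoencoder with encoder dimensions d-500-500-2000-10."""
    x = layers.Input(shape=(input_dim,), name="input")
    h = layers.Dense(500, activation="relu")(x)
    h = layers.Dense(500, activation="relu")(h)
    h = layers.Dense(2000, activation="relu")(h)
    z = layers.Dense(embedding_dim, name="embedding")(h)  # latent features used for clustering
    h = layers.Dense(2000, activation="relu")(z)
    h = layers.Dense(500, activation="relu")(h)
    h = layers.Dense(500, activation="relu")(h)
    x_rec = layers.Dense(input_dim, name="reconstruction")(h)
    return Model(x, x_rec, name="autoencoder"), Model(x, z, name="encoder")

autoencoder, encoder = build_autoencoder(input_dim=784)  # d = 784 for MNIST

# Optimizers as reported: Adam for MNIST, SGD with momentum for USPS and REUTERS-10K.
adam = optimizers.Adam(learning_rate=1e-3, beta_1=0.9, beta_2=0.999)
sgd = optimizers.SGD(learning_rate=0.1, momentum=0.99)
autoencoder.compile(optimizer=adam, loss="mse")  # reconstruction (pretraining) loss
```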
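Continuing from the sketch above, the following is a hedged sketch of the joint training stage referred to as Algorithm 1, using the reported clustering-loss coefficient γ = 0.1, batch size 256, update interval T, and convergence threshold δ = 0.1%. The `ClusteringLayer` and `target_distribution` below are common re-implementations of the DEC-style soft assignment and sharpened target distribution, not the paper's exact code, and the random placeholder data and scikit-learn k-means initialisation are assumptions for illustration.

```python
# A hedged sketch of the joint training stage (Algorithm 1), continuing from the
# autoencoder/encoder defined in the previous snippet. ClusteringLayer and
# target_distribution are common re-implementations of the DEC-style soft
# assignment and sharpened target distribution, not the paper's exact code.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model, optimizers
from sklearn.cluster import KMeans

class ClusteringLayer(layers.Layer):
    """Soft assignment q_ij of embedded point i to cluster j (Student's t kernel)."""
    def __init__(self, n_clusters, **kwargs):
        super().__init__(**kwargs)
        self.n_clusters = n_clusters

    def build(self, input_shape):
        self.clusters = self.add_weight(
            shape=(self.n_clusters, int(input_shape[-1])),
            initializer="glorot_uniform", name="clusters")

    def call(self, z):
        d2 = tf.reduce_sum(tf.square(tf.expand_dims(z, 1) - self.clusters), axis=2)
        q = 1.0 / (1.0 + d2)
        return q / tf.reduce_sum(q, axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened target distribution P derived from the soft assignments Q."""
    w = q ** 2 / q.sum(axis=0)
    return (w.T / w.sum(axis=1)).T

# Reported settings for MNIST: gamma = 0.1, batch size 256, T = 140, delta = 0.1%.
n_clusters, gamma, batch_size, T, delta = 10, 0.1, 256, 140, 0.001

q_out = ClusteringLayer(n_clusters, name="clustering")(encoder.output)
idec = Model(encoder.input, [q_out, autoencoder.output])
idec.compile(optimizer=optimizers.Adam(learning_rate=1e-3),
             loss=[tf.keras.losses.KLDivergence(), "mse"],
             loss_weights=[gamma, 1.0])  # total loss = Lr + gamma * Lc

x = np.random.rand(1024, 784).astype("float32")  # placeholder data, for illustration only

# Initialise cluster centres with k-means on the pretrained embedding.
kmeans = KMeans(n_clusters=n_clusters, n_init=20)
y_pred = kmeans.fit_predict(encoder.predict(x, verbose=0))
idec.get_layer("clustering").set_weights([kmeans.cluster_centers_])

for it in range(20000):
    if it % T == 0:  # refresh target distribution P every T iterations
        q, _ = idec.predict(x, verbose=0)
        p = target_distribution(q)
        y_new = q.argmax(1)
        if it > 0 and np.mean(y_new != y_pred) < delta:  # < 0.1% of labels changed
            break
        y_pred = y_new
    idx = np.random.choice(len(x), batch_size)
    idec.train_on_batch(x[idx], [p[idx], x[idx]])
```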