Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Improved Deep Embedded Clustering with Local Structure Preservation
Authors: Xifeng Guo, Long Gao, Xinwang Liu, Jianping Yin
IJCAI 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on image and text datasets empirically validate the importance of local structure preservation and the effectiveness of our algorithm. |
| Researcher Affiliation | Academia | College of Computer, National University of Defense Technology, Changsha, China |
| Pseudocode | Yes | Algorithm 1: Improved Deep Embedded Clustering |
| Open Source Code | Yes | Our implementation is based on Python and Keras [Chollet, 2015] and is available at https://github.com/XifengGuo/IDEC. |
| Open Datasets | Yes | MNIST: The MNIST dataset [LeCun et al., 1998] consists of total 70000 handwritten digits of 28x28 pixel size. We reshaped each gray image to a 784 dimensional vector. USPS: The USPS dataset contains 9298 gray-scale handwritten digit images with size of 16x16 pixels. REUTERS-10K: Reuters contains around 810000 English news stories labeled with a category tree [Lewis et al., 2004]. |
| Dataset Splits | No | The paper uses well-known datasets but does not explicitly state the training, validation, and test dataset splits with percentages or counts for its own experiments. It refers to a prior work ([Xie et al., 2016]) for pretraining details, which might contain such information. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions that 'Our implementation is based on Python and Keras [Chollet, 2015]' but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | Following the settings in DEC [Xie et al., 2016], the encoder network is set as a fully connected multilayer perceptron (MLP) with dimensions d-500-500-2000-10 for all datasets... After pretraining, the coefficient γ of clustering loss is set to 0.1... and batch size to 256 for all datasets. The optimizer Adam [Kingma and Ba, 2014] with init learning rate λ = 0.001, β1 = 0.9, β2 = 0.999 is applied for MNIST dataset and SGD with learning rate λ = 0.1 and momentum β = 0.99 is used for USPS and REUTERS-10K datasets. The convergence threshold is set to δ = 0.1%. And the update intervals T are 140, 30, 3 iterations for MNIST, USPS and REUTERS-10K respectively. |
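
For readers who want to reuse the quoted settings, the Experiment Setup row can be collected into a single configuration, roughly as in the sketch below. This is an illustration only, not the authors' code: the names `IDEC_SETUP`, `encoder_dims`, and `dense_param_count` are hypothetical, the dictionary keys are arbitrary labels, and the layout assumes that `d` stands for the input dimension of each dataset (784 for the reshaped MNIST vectors, per the Open Datasets row).

```python
# Hyperparameters as quoted in the "Experiment Setup" row above.
# All identifiers and key names are hypothetical; they are not taken
# from the paper or from its repository.
IDEC_SETUP = {
    "hidden_dims": [500, 500, 2000, 10],  # encoder MLP: d-500-500-2000-10
    "gamma": 0.1,                         # coefficient of the clustering loss
    "batch_size": 256,
    "delta": 0.001,                       # convergence threshold, 0.1%
    "optimizer": {
        "MNIST": {"name": "adam", "lr": 0.001, "beta_1": 0.9, "beta_2": 0.999},
        "USPS": {"name": "sgd", "lr": 0.1, "momentum": 0.99},
        "REUTERS-10K": {"name": "sgd", "lr": 0.1, "momentum": 0.99},
    },
    # Update interval T, in iterations, per dataset.
    "update_interval": {"MNIST": 140, "USPS": 30, "REUTERS-10K": 3},
}


def encoder_dims(input_dim):
    """Return the full list of layer widths, starting with the input size d."""
    return [input_dim] + IDEC_SETUP["hidden_dims"]


def dense_param_count(dims):
    """Count weights and biases of a fully connected stack with these widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(dims, dims[1:]))


if __name__ == "__main__":
    mnist_dims = encoder_dims(784)  # 28x28 images reshaped to 784-d vectors
    print(mnist_dims)
    print(dense_param_count(mnist_dims))
```

Keeping the per-dataset optimizer and update interval in one place makes it easier to see where a reimplementation departs from the reported setup. Software versions and hardware remain unspecified, as the rows above note.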