Deep Descriptive Clustering

Authors: Hongjing Zhang, Ian Davidson

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct experiments to evaluate our approach empirically. Based on our experiments, we aim to answer the following questions: Can our proposed approach generate better explanations compared to existing methods? (see Sec 4.2) Can it generate more complex explanations such as ontologies? (see Sec 4.3) How does our proposed approach perform in terms of clustering quality? (see Sec 4.4) How does simultaneously clustering and explaining improve our model's performance? (see Sec 4.5)
Researcher Affiliation | Academia | Hongjing Zhang, Ian Davidson, University of California, Davis. hjzzhang@ucdavis.edu, davidson@cs.ucdavis.edu
Pseudocode | Yes | Algorithm 1 presents the training algorithm for deep descriptive clustering.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | We evaluate the performance of our proposed model on two visual datasets with annotated semantic attributes. We first use Attribute Pascal and Yahoo (aPY) [Farhadi et al., 2009], a small-scale coarse-grained dataset with 64 semantic attributes and 5274 instances. Further, we have studied Animals with Attributes (AwA) [Lampert et al., 2013], which is a medium-scale dataset in terms of the number of images.
Dataset Splits | No | The paper mentions evaluating performance under different tag annotated ratios (r% ∈ {10, 30, 50}), averaged over 10 trials, but does not specify exact train/validation/test splits, percentages, or absolute sample counts for each split. The term "validation" is not used in the context of data splits.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud computing instance types used for running the experiments. It only mentions using "pre-trained ResNet-101 features", which implies a certain computing capability but does not specify the hardware used for their own training.
Software Dependencies | No | The paper mentions the use of "ReLU" as the activation function and "Adam [Kingma and Ba, 2015]" as the optimizer with default parameters. However, it does not provide specific version numbers for these, or for any underlying machine learning frameworks (e.g., PyTorch, TensorFlow) or programming languages (e.g., Python).
Experiment Setup | Yes | For a fair comparison with all the baseline approaches, we use pre-trained ResNet-101 [He et al., 2016] features for all the clustering tasks, and the encoder networks of the deep descriptive clustering model are stacked from three fully connected layers with sizes [1200, 1200, K], where K is the desired number of clusters. We set the expected number of tags for each cluster as 8 and hyper-parameters l, λ, γ as 1, 1, 100 respectively. The tag annotated ratio r is set as 0.5 by default to simulate a challenging setting. The activation function is ReLU, and the optimizer is Adam [Kingma and Ba, 2015] with default parameters.
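To make the reported architecture concrete, the encoder described above can be sketched as a three-layer MLP of sizes [1200, 1200, K] over pre-extracted features, with a softmax head giving soft cluster assignments. This is a minimal NumPy sketch, not the authors' implementation: the 2048-dimensional input (the usual ResNet-101 feature size), K = 10, the weight initialization, and the forward-only (untrained) setup are all assumptions for illustration; the paper's loss terms and training loop are omitted.

```python
import numpy as np

def relu(x):
    # ReLU activation, as stated in the paper's setup
    return np.maximum(x, 0.0)

def softmax(x):
    # Numerically stable row-wise softmax over the K cluster logits
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def make_encoder(in_dim=2048, hidden=1200, k=10, seed=0):
    """Build weights for three fully connected layers [1200, 1200, K].

    in_dim=2048 assumes ResNet-101 pooled features; k=10 is a placeholder
    for the desired number of clusters. He-style init is an assumption.
    """
    rng = np.random.default_rng(seed)
    dims = [in_dim, hidden, hidden, k]
    return [(rng.standard_normal((a, b)) * np.sqrt(2.0 / a), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def encode(params, x):
    # Hidden layers use ReLU; the final layer yields soft cluster assignments
    for w, b in params[:-1]:
        x = relu(x @ w + b)
    w, b = params[-1]
    return softmax(x @ w + b)

params = make_encoder(k=10)
feats = np.random.default_rng(1).standard_normal((5, 2048))  # 5 feature vectors
q = encode(params, feats)
print(q.shape)  # one K-dim assignment distribution per instance
```

With random weights this only demonstrates shapes and the assignment head; in the paper these layers would be trained jointly with the explanation objectives under the hyper-parameters quoted above.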