Integrating Image Clustering and Codebook Learning

Authors: Pengtao Xie, Eric P. Xing

AAAI 2015

Research Type: Experimental
LLM Response: Experiments on two datasets demonstrate the effectiveness of the two models. From the paper: "In this section, we evaluate the effectiveness of DLGMM and SC-DLGMM models by comparing them with four baseline methods on image clustering task." From the Experimental Settings: "The experiments are conducted on 15-Scenes (Lazebnik, Schmid, and Ponce 2007) dataset and Caltech-101 (Fei-Fei, Fergus, and Perona 2004) dataset."

Researcher Affiliation: Academia
LLM Response: Pengtao Xie and Eric Xing, {pengtaox,epxing}@cs.cmu.edu, School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213.

Pseudocode: No
LLM Response: The paper describes its procedures through textual steps and mathematical equations, but it provides no clearly labeled pseudocode or algorithm block.

Open Source Code: No
LLM Response: The paper makes no explicit statement about code availability and provides no link to a code repository.

Open Datasets: Yes
LLM Response: "The experiments are conducted on 15-Scenes (Lazebnik, Schmid, and Ponce 2007) dataset and Caltech-101 (Fei-Fei, Fergus, and Perona 2004) dataset."

Dataset Splits: No
LLM Response: The paper mentions using subsets of the datasets (e.g., "randomly choose half images" for Caltech-101) and varies the codebook size, but it gives no train/validation/test splits with percentages or counts, nor does it cite predefined splits.

Hardware Specification: No
LLM Response: The paper gives no details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.

Software Dependencies: No
LLM Response: The paper mentions methods such as SIFT, K-means, Normalized Cut, JSOM, and LDA, but it names no software packages with version numbers or programming-language versions needed to replicate the experiments.

Experiment Setup: Yes
LLM Response: "Our models are initialized with the clustering results obtained from LDA. We compare these methods under varying codebook size ranging from 100 to 1000 with an increment of 100. The required input cluster number in KM, NC and our models is set to the ground truth number of categories in datasets. In NC, we use Gaussian kernel as similarity measure between images. The bandwidth parameter is set to 1. In JSOM, topic number is set to 100. In LDA, symmetric Dirichlet priors are used and are set to 0.05. In SC-DLGMM, parameter γ on the MRF is tuned to produce the best possible clustering performance."
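Since the authors released no code, the reported settings can only be summarized as a configuration sketch. All key names below are hypothetical; only the values are taken from the quoted Experiment Setup passage.

```python
# Hypothetical configuration summarizing the reported experimental settings.
# Key names are illustrative (no official code exists); values come from the paper.
experiment_config = {
    "datasets": ["15-Scenes", "Caltech-101"],
    "init": "LDA clustering result",                  # models initialized from LDA
    "codebook_sizes": list(range(100, 1001, 100)),    # 100, 200, ..., 1000
    "cluster_number": "ground-truth category count",  # input to KM, NC, and the models
    "nc_similarity": {"kernel": "gaussian", "bandwidth": 1.0},
    "jsom_topics": 100,
    "lda_dirichlet_prior": 0.05,                      # symmetric Dirichlet prior
    "sc_dlgmm_gamma": "tuned for best clustering performance",  # MRF parameter
}

print(len(experiment_config["codebook_sizes"]))  # 10 codebook sizes are compared
```

Such a summary makes the reproducibility gap concrete: every quantitative choice above is stated in the paper, but the tuning procedure for γ and the feature-extraction pipeline are not specified.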