Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Image Clustering Conditioned on Text Criteria

Authors: Sehyun Kwon, Jaeseung Park, Minkyu Kim, Jaewoong Cho, Ernest K. Ryu, Kangwook Lee

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	4 EXPERIMENTS We now present experimental results demonstrating the effectiveness of IC|TC.
Researcher Affiliation	Collaboration	Sehyun Kwon¹, Jaeseung Park¹, Minkyu Kim, Jaewoong Cho, Ernest K. Ryu, Kangwook Lee. Seoul National University, KRAFTON, University of Wisconsin-Madison. Co-senior authors.
Pseudocode	Yes	IC|TC: IMAGE CLUSTERING CONDITIONED ON TEXT CRITERIA Our main method consists of 3 stages with an optional iterative outer loop. ... Step 1: Vision-language model (VLM) extracts salient features ... Step 2: Large language model (LLM) obtains K cluster names ... Step 3: Large language model (LLM) assigns clusters to images ... Main method IC|TC
Open Source Code	Yes	Our code is available at https://github.com/sehyunkwon/ICTC.
Open Datasets	Yes	We use the Stanford 40 Action Dataset (Yao et al., 2011)... We use the People Playing Musical Instrument (PPMI) dataset (Wang et al., 2010; Yao and Fei-Fei, 2010)... We compare IC|TC against several classical clustering algorithms on CIFAR-10, STL-10, and CIFAR-100.
Dataset Splits	No	The paper uses standard datasets (e.g., CIFAR-10, STL-10, CIFAR-100) but does not explicitly state the train/validation/test splits used, whether as percentages, sample counts, or references to standard splits.
Hardware Specification	No	The paper mentions using LLaVA and GPT-4 (the latter accessed through its API) but does not specify the hardware (e.g., GPU/CPU models, memory) used to run its experiments or train its models.
Software Dependencies	Yes	In our experiments, we mainly use LLaVA (Liu et al., 2023) for the VLM and GPT-4 (OpenAI, 2023) for the LLM... Table 11: Model versions for the VLMs and LLMs (e.g., blip2-flan-t5-xxl, llava-v1-0719-336px-lora-merge-vicuna-13b-v1.3, api-version=2023-03-15-preview)
Experiment Setup	Yes	In particular, the precise text prompts used can be found in Appendix B.3.1. ... Careful prompt engineering of P_step2b(TC, N, K) allows the user to refine the clusters to be consistent with the user's criteria. ... we concluded that using threshold values such as 5 or 10 was helpful in getting a better set of clustered classes.
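The three stages summarized in the Pseudocode row can be illustrated with a minimal Python sketch. It is not the authors' implementation (see the linked repository for that). The `vlm` and `llm` callables, the function name `ictc_sketch`, and every prompt string are hypothetical stand-ins; the real prompts are in the paper's Appendix B.3.1. Step 2 is split into a raw-labeling pass and a merge pass (2a/2b), loosely following the P_step2b(TC, N, K) prompt mentioned above. The optional iterative outer loop and the threshold values are omitted.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical interfaces (assumptions, not the authors' API):
#   vlm(image, prompt) -> str   e.g. a wrapper around a VLM such as LLaVA
#   llm(prompt) -> str          e.g. a wrapper around an LLM such as GPT-4
VLM = Callable[[str, str], str]
LLM = Callable[[str], str]


def ictc_sketch(
    images: List[str],
    criterion: str,
    k: int,
    vlm: VLM,
    llm: LLM,
) -> Tuple[List[str], Dict[str, Optional[str]]]:
    """Three-stage clustering conditioned on a text criterion (illustrative prompts only)."""
    # Step 1: the VLM describes each image with respect to the text criterion.
    descriptions = {
        img: vlm(img, f"Describe the {criterion} shown in this image.") for img in images
    }

    # Step 2a: the LLM turns each description into a short raw label.
    raw_labels = [
        llm(f"Description: {desc}\nGive a short {criterion} label.").strip()
        for desc in descriptions.values()
    ]

    # Step 2b: the LLM merges the raw labels (with their counts) into K cluster names.
    counts = Counter(raw_labels)
    summary = ", ".join(f"{label} ({n})" for label, n in counts.most_common())
    reply = llm(f"Merge these {criterion} labels into {k} categories, comma-separated: {summary}")
    cluster_names = [name.strip() for name in reply.split(",") if name.strip()][:k]

    # Step 3: the LLM assigns each image to one of the K cluster names.
    assignments: Dict[str, Optional[str]] = {}
    for img, desc in descriptions.items():
        answer = llm(f"Description: {desc}\nChoose one of: {', '.join(cluster_names)}").strip()
        # Replies that do not match any cluster name are left unassigned (None).
        assignments[img] = answer if answer in cluster_names else None
    return cluster_names, assignments
```

Because the model calls are injected as plain callables, the control flow can be exercised with deterministic stubs before any real VLM or LLM is connected.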