Objective-Based Hierarchical Clustering of Deep Embedding Vectors

Authors: Stanislav Naumov, Grigory Yaroslavtsev, Dmitrii Avdiukhin (pp. 9055–9063)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We initiate a comprehensive experimental study of objective-based hierarchical clustering methods on massive datasets consisting of deep embedding vectors from computer vision and NLP applications. ... We report our key experimental results in Table 1 and Table 2.
Researcher Affiliation | Academia | Stanislav Naumov (1), Grigory Yaroslavtsev (2), Dmitrii Avdiukhin (2); (1) ITMO University, (2) Indiana University
Pseudocode | Yes | Algorithm 1: GRADIENTDESCENTPARTITIONING ... Algorithm 2: BISECT++ AND CONQUER ... Algorithm 3: BALANCED MAX-2-SAT PARTITIONING ... Algorithm 4: Hierarchical Clustering via MAX-2-SAT (B2SAT&C)
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We use image embeddings of ImageNet ILSVRC 2012 (Deng et al. 2009) via ResNet34 (He et al. 2015). ... We use ResNet34 pre-trained on ImageNet ILSVRC 2012 to compute embedding vectors of images from ImageNetV2 (Recht et al. 2019). ... We use ResNet34 pre-trained on ImageNet ILSVRC 2012 to compute embedding vectors of NaBirds (Van Horn et al. 2015). ... we use unsupervised word embedding vectors trained on Twitter and Wikipedia (Yamada et al. 2020) using two classic methods GloVe (Pennington, Socher, and Manning 2014) and Word2vec (Yamada et al. 2016; Mikolov et al. 2013). ... we use a pre-trained Sentence-BERT (Reimers and Gurevych 2019) to construct embeddings from the sentiment analysis dataset of movie reviews SST-2 (Socher et al. 2013).
Dataset Splits | No | The paper refers to 'train', 'validation', and 'test' in the context of deep learning (e.g., 'pre-trained neural net'), but it does not specify the train/validation/test dataset splits used for the hierarchical clustering experiments conducted in this paper.
Hardware Specification | Yes | Experiments were performed on 8 CPUs, 2.0GHz Intel Xeon Scalable Processor (Skylake), 90 GB RAM.
Software Dependencies | No | The paper mentions software libraries like 'HDBSCAN library' but does not provide specific version numbers for any software components or dependencies required for reproduction.
Experiment Setup | No | The paper lists parameters for its algorithms (e.g., noise variance r, learning rates {ηt}, the number of iterations I, imbalance δ, and the maximum number of elements to run average linkage θ) but does not provide the specific numerical values for these hyperparameters or other training settings used in the experiments.
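The pseudocode row above names recursive partitioning routines such as BISECT++ AND CONQUER, which build a hierarchy over embedding vectors by repeatedly splitting a point set in two. As a rough illustration of that general bisecting scheme — not the authors' B2SAT&C method, whose splits are driven by a balanced Max-2-SAT objective — here is a minimal sketch using plain 2-means splits; all function names, parameters, and the `min_size` stopping rule are illustrative assumptions:

```python
import numpy as np


def two_means(X, iters=20, seed=0):
    """Split points into two groups with a simple Lloyd's 2-means."""
    rng = np.random.default_rng(seed)
    # Initialize centers with two distinct data points.
    centers = X[rng.choice(len(X), size=2, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each point to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old center if a side is empty.
        for k in range(2):
            if (labels == k).any():
                centers[k] = X[labels == k].mean(axis=0)
    return labels


def bisecting_tree(X, indices=None, min_size=2):
    """Recursively bisect the point set; returns nested (left, right)
    tuples with leaves given as lists of row indices into X."""
    if indices is None:
        indices = np.arange(len(X))
    # Stop on small or degenerate (all-identical) point sets.
    if len(indices) < min_size or len(np.unique(X[indices], axis=0)) < 2:
        return list(indices)
    labels = two_means(X[indices])
    left, right = indices[labels == 0], indices[labels == 1]
    if len(left) == 0 or len(right) == 0:  # degenerate split
        return list(indices)
    return (bisecting_tree(X, left, min_size),
            bisecting_tree(X, right, min_size))
```

On two well-separated groups of embeddings, the root split of `bisecting_tree` separates the groups, and recursion then refines each side into a binary hierarchy; an objective-based method differs in that each split would instead optimize the clustering objective directly.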