Objective-Based Hierarchical Clustering of Deep Embedding Vectors
Authors: Stanislav Naumov, Grigory Yaroslavtsev, Dmitrii Avdiukhin (pp. 9055-9063)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We initiate a comprehensive experimental study of objective-based hierarchical clustering methods on massive datasets consisting of deep embedding vectors from computer vision and NLP applications. ... We report our key experimental results in Table 1 and Table 2. |
| Researcher Affiliation | Academia | Stanislav Naumov (1), Grigory Yaroslavtsev (2), Dmitrii Avdiukhin (2); (1) ITMO University, (2) Indiana University |
| Pseudocode | Yes | Algorithm 1: GRADIENTDESCENTPARTITIONING ... Algorithm 2: BISECT++ AND CONQUER ... Algorithm 3: BALANCED MAX-2-SAT PARTITIONING ... Algorithm 4: Hierarchical Clustering via MAX-2SAT (B2SAT&C) |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We use image embeddings of ImageNet ILSVRC 2012 (Deng et al. 2009) via ResNet34 (He et al. 2015). ... We use ResNet34 pre-trained on ImageNet ILSVRC 2012 to compute embedding vectors of images from ImageNetV2 (Recht et al. 2019). ... We use ResNet34 pre-trained on ImageNet ILSVRC 2012 to compute embedding vectors of NABirds (Van Horn et al. 2015). ... we use unsupervised word embedding vectors trained on Twitter and Wikipedia (Yamada et al. 2020) using two classic methods GloVe (Pennington, Socher, and Manning 2014) and Word2vec (Yamada et al. 2016; Mikolov et al. 2013). ... we use a pre-trained Sentence-BERT (Reimers and Gurevych 2019) to construct embeddings from the sentiment analysis dataset of movie reviews SST-2 (Socher et al. 2013). |
| Dataset Splits | No | The paper refers to 'train', 'validation', and 'test' in the context of deep learning (e.g., 'pre-trained neural net'), but it does not specify the train/validation/test dataset splits used for the hierarchical clustering experiments conducted in this paper. |
| Hardware Specification | Yes | Experiments were performed on 8 CPUs 2.0GHz Intel Xeon Scalable Processor (Skylake), 90Gb RAM. |
| Software Dependencies | No | The paper mentions software libraries like 'HDBSCAN library' but does not provide specific version numbers for any software components or dependencies required for reproduction. |
| Experiment Setup | No | The paper lists parameters for its algorithms (e.g., 'noise variance r, learning rates {η_t}, the number of iterations I', 'imbalance δ', 'maximum number of elements to run average linkage θ') but does not provide the specific numerical values for these hyperparameters or other training settings used in the experiments. |
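
The Experiment Setup row mentions a parameter θ, the maximum number of elements on which average linkage is run. As an illustration only, the following is a minimal pure-Python sketch of naive average-linkage clustering with such a size cap. It is not the authors' code: the function names, the Euclidean distance, and the default `theta` value are assumptions made for this example, and the paper's own algorithms (e.g. B++&C, B2SAT&C) are not reproduced here.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points, theta=1000):
    """Naive average-linkage agglomerative clustering (illustrative only).

    Returns a binary tree over point indices as nested tuples.
    `theta` is a hypothetical cap on the input size, loosely mirroring the
    paper's 'maximum number of elements to run average linkage' parameter;
    the value used in the paper is not reported.
    """
    if not points:
        raise ValueError("need at least one point")
    if len(points) > theta:
        raise ValueError("input larger than theta; partition it first")

    # Each cluster is a pair: (subtree, list of member indices).
    clusters = [(i, [i]) for i in range(len(points))]
    while len(clusters) > 1:
        best = None
        for (i, (_, a)), (j, (_, b)) in combinations(enumerate(clusters), 2):
            # Average pairwise distance between the two clusters.
            d = sum(euclidean(points[p], points[q]) for p in a for q in b)
            d /= len(a) * len(b)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        # j > i always holds here, so pop j first to keep index i valid.
        tree_j, members_j = clusters.pop(j)
        tree_i, members_i = clusters.pop(i)
        clusters.append(((tree_i, tree_j), members_i + members_j))
    return clusters[0][0]


if __name__ == "__main__":
    embeddings = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(average_linkage(embeddings))  # prints ((0, 1), (2, 3))
```

This naive version recomputes all pairwise averages on every merge, so it is practical only for small inputs, which is consistent with capping the input size by a threshold such as θ.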