Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Discovering New Intents via Constrained Deep Adaptive Clustering with Cluster Refinement
Authors: Ting-En Lin, Hua Xu, Hanlei Zhang
Pages: 8360-8367
AAAI 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the three benchmark datasets show that our method can yield significant improvements over strong baselines. |
| Researcher Affiliation | Academia | (1) State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; (2) Beijing National Research Center for Information Science and Technology (BNRist), Beijing 100084, China; (3) School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/thuiar/CDAC-plus |
| Open Datasets | Yes | We conduct experiments on three publicly available short text datasets. The detailed statistics are shown in Table 1. |
| Dataset Splits | Yes | Besides, we divide all dataset into training, validation, and test sets. First, we train the model by limited labeled data (containing known intents) and unlabeled data (containing all intents) in the training set. Second, we tune the model on the validation set, which only contains known intents. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'implemented in PyTorch' and 'pre-trained BERT model', but does not specify version numbers for these software components. |
| Experiment Setup | Yes | The training batch size is 256, and the learning rate is 5e-5. We use the same dynamic thresholds as DAC (Chang et al. 2017) and set u(λ) = 0.95 - λ, l(λ) = 0.455 + 0.1 · λ, and η = 0.009. During the refinement stage, we perform K-means on intent representation I to obtain the initial cluster centroids U and set the stop criteria δlabel as 0.1%. |
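
The dynamic thresholds quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It rests on two assumptions that the row itself does not confirm: the upper threshold is read as u(λ) = 0.95 - λ (the form used in DAC), and λ starts at 0 and grows by η after each training round until the upper threshold no longer exceeds the lower one. The function names are hypothetical and are not taken from the CDAC-plus repository.

```python
# Sketch of a DAC-style dynamic threshold schedule.
# Assumptions (not confirmed by the table above):
#   - upper threshold u(lam) = 0.95 - lam
#   - lam starts at 0 and increases by eta each round
ETA = 0.009  # value of eta quoted in the Experiment Setup row


def upper_threshold(lam: float) -> float:
    """Similarity above which a pair is treated as 'same cluster'."""
    return 0.95 - lam


def lower_threshold(lam: float) -> float:
    """Similarity below which a pair is treated as 'different cluster'."""
    return 0.455 + 0.1 * lam


def threshold_schedule(eta: float = ETA) -> list[tuple[float, float, float]]:
    """Return (lam, u, l) for every round in which u(lam) > l(lam)."""
    schedule = []
    step = 0
    while True:
        lam = step * eta  # multiply rather than accumulate to limit float drift
        u, l = upper_threshold(lam), lower_threshold(lam)
        if u <= l:
            break  # thresholds have met: no ambiguous pairs remain
        schedule.append((lam, u, l))
        step += 1
    return schedule


if __name__ == "__main__":
    rounds = threshold_schedule()
    print(f"rounds before thresholds meet: {len(rounds)}")
    print("first round (lam, u, l):", rounds[0])
    print("last round  (lam, u, l):", rounds[-1])
```

Under these assumptions the two thresholds meet near λ = 0.45, so the gap between them closes after roughly 50 rounds; pairs whose similarity falls between l(λ) and u(λ) are the ones left unlabeled in a given round.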