TC-DWA: Text Clustering with Dual Word-Level Augmentation
Authors: Bo Cheng, Ximing Li, Yi Chang
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the effectiveness of TC-DWA, we conduct extensive experiments on several benchmark text datasets. The results demonstrate that TC-DWA consistently outperforms the state-of-the-art baseline methods. Code available: https://github.com/BoCheng-96/TC-DWA. |
| Researcher Affiliation | Academia | (1) School of Artificial Intelligence, Jilin University, China; (2) College of Computer Science and Technology, Jilin University, China; (3) Key Laboratory of Symbolic Computation and Knowledge Engineering of MOE, Jilin University, China; (4) International Center of Future Science, Jilin University, China; (5) Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, Ministry of Education, China |
| Pseudocode | Yes | For clarity, we summarize the training process of TC-DWA in Algorithm 1. |
| Open Source Code | Yes | Code available: https://github.com/BoCheng-96/TC-DWA. |
| Open Datasets | Yes | In the experiments, we select three commonly used text datasets, i.e., AG News, DBPedia, and Newsgroup (http://qwone.com/~jason/20Newsgroups/). ... For efficient evaluations, in terms of AG News and DBPedia, we randomly draw 10,000 texts from the full datasets containing massive samples; and in terms of Newsgroup, we employ its standard split of training set. (See the data-sampling sketch below the table.) |
| Dataset Splits | No | The paper mentions using 'standard split of training set' for Newsgroup but does not provide explicit train/validation/test dataset splits (percentages or counts) or cross-validation details for all datasets needed to reproduce data partitioning. |
| Hardware Specification | Yes | All experiments are run on a Linux server with 2 NVIDIA TITAN GTX GPUs and 512G memory. |
| Software Dependencies | No | The paper mentions using BERT and Adam optimizer, but it does not specify version numbers for any programming languages, libraries, or specific software dependencies needed to replicate the experiment. |
| Experiment Setup | Yes | For all three datasets, limited by the storage of the experimental environment, we set the max sequence length to 128 tokens and the training batch size to 16. ... We use the Adam optimizer, and the initial learning rates are 5e-6 and 1e-3 for training the parameters of BERT and predictive parameters, respectively. The anchor word number k is set to 1 or 2. For the layer index h of computing NA weights, we fix it to 12 for all the three datasets. Besides, we set the scaling parameter γ defined in Eq. 7 as γ = (t / MaxIter) · γ₀, where γ₀ = 0.1 and t is the iteration number. Finally, the parameter β defined in Eq. 10 is set to 0.9 and the number of epochs is set to 5 for all datasets. (See the configuration sketch below the table.) |
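
The paper only states that 10,000 texts are randomly drawn from AG News and DBPedia and that the standard Newsgroup training split is used. The following is a minimal data-preparation sketch under that description; the Hugging Face `datasets` and scikit-learn loaders, the field names, and the random seed are assumptions, not details reported by the authors.

```python
# Hypothetical data-preparation sketch; loaders, field names, and seed are assumptions.
import random

from datasets import load_dataset
from sklearn.datasets import fetch_20newsgroups

random.seed(0)  # the paper does not report a seed

def sample_texts(dataset_name, text_field, n=10_000):
    """Randomly draw n texts from a dataset's training split, as described in the paper."""
    full = load_dataset(dataset_name, split="train")
    idx = random.sample(range(len(full)), n)
    return [full[i][text_field] for i in idx]

ag_news_texts = sample_texts("ag_news", "text")        # 10,000 texts drawn from AG News
dbpedia_texts = sample_texts("dbpedia_14", "content")  # 10,000 texts drawn from DBPedia

# Newsgroup: the standard training split is used in full.
newsgroup_texts = fetch_20newsgroups(subset="train").data
```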
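The reported hyperparameters can be collected into a single configuration sketch. Only the numeric values (sequence length, batch size, learning rates, k, h, γ₀, β, epochs) come from the paper; the module names (`encoder`, `head`), the `bert-base-uncased` checkpoint, and the PyTorch/transformers usage are assumptions for illustration.

```python
# Hypothetical configuration sketch of the reported hyperparameters.
import torch
from transformers import AutoModel, AutoTokenizer

MAX_SEQ_LEN  = 128   # max sequence length (tokens)
BATCH_SIZE   = 16    # training batch size
EPOCHS       = 5     # epochs, all datasets
K_ANCHOR     = 2     # anchor word number k (the paper sets it to 1 or 2)
H_LAYER      = 12    # BERT layer index h used for NA weights
BETA         = 0.9   # parameter beta defined in Eq. 10
GAMMA_0      = 0.1   # base value of the scaling parameter gamma in Eq. 7
NUM_CLUSTERS = 4     # e.g. AG News has 4 classes (14 for DBPedia, 20 for Newsgroup)

encoder   = AutoModel.from_pretrained("bert-base-uncased")   # backbone; exact checkpoint is an assumption
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
head      = torch.nn.Linear(encoder.config.hidden_size, NUM_CLUSTERS)  # "predictive parameters" (assumed form)

# Adam with separate learning rates for the BERT parameters and the predictive parameters.
optimizer = torch.optim.Adam([
    {"params": encoder.parameters(), "lr": 5e-6},
    {"params": head.parameters(),    "lr": 1e-3},
])

def gamma(t, max_iter, gamma_0=GAMMA_0):
    """Scaling parameter of Eq. 7: gamma = (t / MaxIter) * gamma_0."""
    return (t / max_iter) * gamma_0
```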