Weakly-supervised Text Classification with Wasserstein Barycenters Regularization
Authors: Jihong Ouyang, Yiming Wang, Ximing Li, Changchun Li
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that WTC-WBR outperforms the existing weakly-supervised baselines, and achieves comparable performance to semi-supervised and supervised baselines. |
| Researcher Affiliation | Academia | ¹ College of Computer Science and Technology, Jilin University, China; ² Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, China |
| Pseudocode | Yes | Algorithm 1: Model fitting for WTC-WBR |
| Open Source Code | No | The paper provides URLs for baseline methods but does not provide a link to, or an explicit statement about releasing, the source code for the proposed WTC-WBR method. |
| Open Datasets | Yes | We explore the proposed WTC-WBR method on 3 prevalent datasets from various domains: IMDB from movie review sentiment, AG News from news topic, and DBPedia from Wikipedia topic [Meng et al., 2020]. |
| Dataset Splits | Yes | Table 1 (summary of dataset statistics): IMDB — 25,000 train / 25,000 test / 2 classes / avg Lc 1; AG News — 120,000 train / 7,600 test / 4 classes / avg Lc 1; DBPedia — 560,000 train / 70,000 test / 14 classes / avg Lc 1.4. |
| Hardware Specification | Yes | We implemented our method in PyTorch and ran it on one RTX A6000 GPU on an Ubuntu platform with 32 GB of memory. |
| Software Dependencies | No | The paper mentions PyTorch as the implementation framework but does not provide a version number. Other software components, such as the BERT-base-uncased encoder and the Adam optimizer, are mentioned without version details. |
| Experiment Setup | Yes | The maximum sequence lengths are set to 512, 200, and 200 for IMDB, AG News, and DBPedia, respectively. We apply the Adam optimizer, and the learning rates are tuned over 1e-7 to 7e-4. We pre-train 2 epochs with the weakly self-training loss, and further train the full objective for 10 epochs. For the static WTC-WBR and WTC-ST, we feed the static averaged pooling of token embeddings into the feature encoder. The maximum sequence lengths are all set to 512. The batch size is 256. We pre-train 5 epochs and further train the full objective for 130, 20, and 20 epochs for IMDB, AG News, and DBPedia, respectively. The learning rates are tuned over 5e-5 to 1e-3. For both versions, we first train the model on the texts that contain category words, due to the lack of category-word-covered texts. We adopt a one-layer MLP as the feature encoder and a one-layer MLP as the classification layer; the dimensions are 768-200-K, and we apply tanh as the activation function (see the sketch after this table). ρ_init and ρ_final are set to 0.05 and 0.99. We varied the regularization parameter η over {10^-3, 10^-2, ..., 10^2, 10^3} and empirically set η to 100, 1, and 1 for IMDB, AG News, and DBPedia, respectively. |
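
A minimal PyTorch sketch of the encoder and classification head described in the Experiment Setup row, assuming the HuggingFace `transformers` BERT-base-uncased encoder. Identifiers such as `WTCWBRHead`, `NUM_CLASSES`, and the placeholder learning rate are illustrative, not taken from the paper, and the Wasserstein barycenters regularization and self-training losses are omitted here.

```python
# Sketch of the 768-200-K encoder/classifier stack stated in the paper
# (one-layer MLP feature encoder with tanh, one-layer MLP classifier).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

NUM_CLASSES = 4   # K, e.g. 4 for AG News (2 for IMDB, 14 for DBPedia)
MAX_SEQ_LEN = 200 # 512 for IMDB, 200 for AG News and DBPedia (BERT version)

class WTCWBRHead(nn.Module):
    """Illustrative head: 768 -> 200 (tanh) feature encoder, 200 -> K classifier."""
    def __init__(self, hidden_dim=768, feat_dim=200, num_classes=NUM_CLASSES):
        super().__init__()
        self.feature_encoder = nn.Sequential(nn.Linear(hidden_dim, feat_dim), nn.Tanh())
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, token_embeddings, attention_mask):
        # Average-pool token embeddings; the static variant feeds averaged
        # (frozen) token embeddings into the feature encoder.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        return self.classifier(self.feature_encoder(pooled))

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
head = WTCWBRHead()

# Adam optimizer; the paper tunes learning rates over 1e-7 to 7e-4 (BERT
# version) and 5e-5 to 1e-3 (static version). 2e-5 is only a placeholder.
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(head.parameters()), lr=2e-5)

# Forward pass on a toy batch.
batch = tokenizer(["an example document"], truncation=True,
                  max_length=MAX_SEQ_LEN, padding=True, return_tensors="pt")
token_embeddings = encoder(**batch).last_hidden_state
logits = head(token_embeddings, batch["attention_mask"])
```

The training loop itself (weakly self-training pre-training epochs, the ρ schedule from 0.05 to 0.99, and the η-weighted regularization term) is not reproduced here, since the paper's released implementation is not available.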