CAWA: An Attention-Network for Credit Attribution

Authors: Saurav Manchanda, George Karypis (pp. 8472-8479)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on the credit attribution task on a variety of datasets show that the sentence class labels generated by CAWA outperform the competing approaches. Additionally, on the multilabel text classification task, CAWA performs better than the competing credit attribution approaches.
Researcher Affiliation | Academia | Saurav Manchanda, George Karypis, University of Minnesota, Twin Cities, USA, {manch043, karypis}@umn.edu
Pseudocode | No | The paper does not include a pseudocode block or algorithm.
Open Source Code | Yes | Our code and data are available at https://github.com/gurdaspuriya/cawa.
Open Datasets | Yes | We performed experiments on five multilabel text datasets from different domains: Movies (Bamman, O'Connor, and Smith 2014), Ohsumed (Hersh et al. 1994), TMC2007, Patents, Delicious (Zubiaga et al. 2009).
Dataset Splits | Yes | For both the credit attribution and multilabel classification tasks, we used the same training and test dataset split as used in (Manchanda and Karypis 2018). For the credit attribution, the test dataset is synthetic, and each test document corresponds to multiple single-label documents concatenated together (thus giving us ground-truth segment labels for a document). Additionally, we also use a validation dataset, created in a similar manner to this test dataset, for the hyperparameter selection. (A sketch of this test-set construction appears after the table.)
Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions 'scikit-multilearn' and 'ADAM' as software used, but does not provide specific version numbers for these or any other key software components.
Experiment Setup | Yes | For CAWA, DNN+A, and DNN-A, the number of nodes in each of the hidden layers, the length of all representations, as well as the batch size for training CAWA were set to 256. For regularization, we used a dropout (Srivastava et al. 2014) of 0.5 between all layers, except the output layer. For optimization, we used the ADAM (Kingma and Ba 2014) optimizer. We trained all the models for 100 epochs, with the learning-rate set to 0.001. ... For average pooling in CAWA, we fixed the kernel-size to three.
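
Below is a minimal sketch of the training configuration reported in the Experiment Setup row, assuming a PyTorch-style setup. The stand-in model and the NUM_LABELS value are hypothetical; only the numeric hyperparameters (256-dimensional hidden layers and representations, batch size 256, dropout 0.5, ADAM with learning rate 0.001, 100 epochs, average-pooling kernel size 3) come from the paper.

```python
# Sketch of the reported training configuration (values from the paper); the
# model below is a generic stand-in, not the authors' CAWA implementation.
import torch
import torch.nn as nn

HIDDEN_DIM = 256      # nodes per hidden layer and representation length
BATCH_SIZE = 256      # training batch size
DROPOUT = 0.5         # dropout between all layers except the output layer
LEARNING_RATE = 1e-3  # ADAM learning rate
NUM_EPOCHS = 100      # training epochs
POOL_KERNEL = 3       # average-pooling kernel size

NUM_LABELS = 20  # hypothetical; depends on the dataset
model = nn.Sequential(
    nn.Linear(HIDDEN_DIM, HIDDEN_DIM),
    nn.ReLU(),
    nn.Dropout(DROPOUT),
    nn.Linear(HIDDEN_DIM, NUM_LABELS),  # no dropout after the output layer
)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
```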
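
The synthetic credit-attribution test set described in the Dataset Splits row is built by concatenating single-label documents, which yields per-sentence ground-truth labels. The sketch below illustrates that construction under an assumed (sentences, label) data layout; the function name, data format, and example documents are hypothetical and not the authors' preprocessing code.

```python
import random

def make_synthetic_test_doc(single_label_docs, k=2, seed=None):
    """Concatenate k single-label documents into one multi-label test document,
    keeping each sentence's source label as ground truth for credit attribution.
    `single_label_docs` is assumed to be a list of (sentences, label) pairs."""
    rng = random.Random(seed)
    picked = rng.sample(single_label_docs, k)
    sentences, sentence_labels = [], []
    for doc_sentences, label in picked:
        sentences.extend(doc_sentences)
        sentence_labels.extend([label] * len(doc_sentences))
    doc_labels = sorted({label for _, label in picked})
    return sentences, sentence_labels, doc_labels

# Example: two single-label documents become one two-label test document
# with known per-sentence (segment) labels.
docs = [(["the hero saves the city", "a thrilling chase follows"], "action"),
        (["they meet at a cafe", "a quiet dinner scene"], "romance")]
sents, sent_labels, doc_labels = make_synthetic_test_doc(docs, k=2, seed=0)
```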