Hierarchical Attention Transfer Network for Cross-Domain Sentiment Classification

Authors: Zheng Li, Ying Wei, Yu Zhang, Qiang Yang

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on the Amazon review dataset demonstrate the effectiveness of HATN. Table 2 reports the classification accuracies of different methods on the Amazon reviews dataset.
Researcher Affiliation | Academia | Zheng Li, Ying Wei, Yu Zhang, Qiang Yang; Hong Kong University of Science and Technology, Hong Kong; zlict@cse.ust.hk, yweiad@gmail.com, yu.zhang.ust@gmail.com, qyang@cse.ust.hk
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the described method.
Open Datasets | Yes | We conduct the experiments on the Amazon reviews dataset (Blitzer, Dredze, and Pereira 2007), which has been widely used for cross-domain sentiment classification.
Dataset Splits | Yes | For each pair A→B, we randomly choose 2800 positive and 2800 negative reviews from the source domain A as the training data, the rest from the source domain A as the validation data, and all labeled reviews (6000) from the target domain B for testing. We perform early stopping on the validation set during the training process. (A split-construction sketch follows the table.)
Hardware Specification | No | The paper does not specify the hardware used to run the experiments (e.g., GPU/CPU models, memory).
Software Dependencies | No | The paper mentions 'NLTK' and 'word2vec vectors' but does not provide version numbers for these or any other software dependencies.
Experiment Setup | Yes | The memory sizes n_c and n_w are set to 20 and 25, respectively. We use the public 300-dimensional word2vec vectors with the skip-gram model (Mikolov et al. 2013) to initialize the embedding matrix L. ... The hidden dimensions of the word attention layer and the sentence attention layer are 300. ... The regularization weight ρ is set to 0.005. ... We use a batch size b_s = 50 for the sentiment classifier and a batch size b_d = 100 for the domain classifier. ... Gradients with ℓ2 norm larger than 40 are normalized to 40. ... T is set to 100. The learning rate is decayed as η = max(0.005 / (1 + 10p)^0.75, 0.002) and the adaptation rate is increased as λ = min(2 / (1 + exp(-10p)) - 1, 0.1) during training. (A schedule sketch follows the table.)
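The split procedure quoted in the Dataset Splits row is mechanical enough to write out. Below is a minimal Python sketch of that construction; the function name make_split, the seed, and the (text, label) pair representation are illustrative assumptions, not from the paper.

```python
import random

def make_split(source_reviews, target_reviews, n_per_class=2800, seed=0):
    """Build one A->B split as quoted from the paper: 2800 positive and
    2800 negative source-domain reviews for training, the remaining
    source-domain reviews for validation (used for early stopping), and
    all labeled target-domain reviews for testing.

    Reviews are (text, label) pairs with label 1 = positive, 0 = negative;
    this representation and the function name are illustrative only."""
    rng = random.Random(seed)
    pos = [r for r in source_reviews if r[1] == 1]
    neg = [r for r in source_reviews if r[1] == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    train = pos[:n_per_class] + neg[:n_per_class]
    rng.shuffle(train)
    valid = pos[n_per_class:] + neg[n_per_class:]  # held-out source reviews
    test = list(target_reviews)                    # all 6000 labeled target reviews
    return train, valid, test
```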
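The two training schedules quoted in the Experiment Setup row are fully specified by their formulas. A minimal Python sketch follows, assuming p is the training progress running from 0 to 1 (the quoted text does not define p explicitly):

```python
import math

def learning_rate(p):
    """eta = max(0.005 / (1 + 10p)^0.75, 0.002), with p in [0, 1]."""
    return max(0.005 / (1.0 + 10.0 * p) ** 0.75, 0.002)

def adaptation_rate(p):
    """lambda = min(2 / (1 + exp(-10p)) - 1, 0.1)."""
    return min(2.0 / (1.0 + math.exp(-10.0 * p)) - 1.0, 0.1)

# Spot-check the endpoints: eta decays from 0.005 down to its 0.002 floor,
# while lambda rises from 0 and reaches its 0.1 cap almost immediately.
for p in (0.0, 0.02, 0.5, 1.0):
    print(f"p={p:.2f}  eta={learning_rate(p):.4f}  lambda={adaptation_rate(p):.4f}")
```

One consequence of the cap worth noting: λ reaches 0.1 after roughly the first 2% of training and stays there, so the adaptation rate is effectively constant for almost the entire run.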