Multi-Label Patent Categorization with Non-Local Attention-Based Graph Convolutional Network

Authors: Pingjie Tang, Meng Jiang, Bryan (Ning) Xia, Jed W. Pitera, Jeffrey Welser, Nitesh V. Chawla (pp. 9024-9031)

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate the performance of our model and as many as seven competitive baselines. We find that our model outperforms all of these prior state-of-the-art methods by a large margin and achieves high performance on P@k and nDCG@k.
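For reference, the two ranking metrics named in this response (P@k and nDCG@k) are typically defined as below. The paper's own evaluation code is not provided, so this is a hedged NumPy sketch of the standard extreme multi-label formulations; the function names and signatures are illustrative assumptions.

```python
# Illustrative definitions of P@k and nDCG@k for multi-label ranking;
# these are the common formulations, not the paper's released code.
import numpy as np

def precision_at_k(scores, relevance, k):
    """Fraction of the k highest-scored labels that are relevant (relevance is a 0/1 vector)."""
    top_k = np.argsort(scores)[::-1][:k]
    return relevance[top_k].sum() / k

def ndcg_at_k(scores, relevance, k):
    """DCG over the top-k predictions, normalized by the ideal DCG."""
    top_k = np.argsort(scores)[::-1][:k]
    dcg = (relevance[top_k] / np.log2(np.arange(2, k + 2))).sum()
    n_rel = int(relevance.sum())
    idcg = (1.0 / np.log2(np.arange(2, min(k, n_rel) + 2))).sum()
    return dcg / idcg if idcg > 0 else 0.0
```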
Researcher Affiliation Collaboration 1 University of Notre Dame, Notre Dame, Indiana 46556, USA {ptang, mjiang2, nxia, nchawla}@nd.edu; 2 IBM Research Almaden, 650 Harry Road, San Jose, California 95120, USA {pitera, welser}@us.ibm.com
Pseudocode No The paper contains figures illustrating the model architecture and data flow, but no formal pseudocode blocks or algorithms.
Open Source Code No The paper does not provide a link to open-source code or explicitly state that the code for the described methodology is available.
Open Datasets No We query 45,526 patents with 555 IPC codes from an internal patent database named CIRCA. We name the dataset as Patent CIRCA 45k.
Dataset Splits Yes We split the data into training and testing in an 80/20 ratio; in actual experiments, we hold out 10% of the training data as a validation set (unlisted in Table 1) that is used to choose the optimal parameters.
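A minimal sketch of that split, assuming scikit-learn's train_test_split; the function and variable names are illustrative and not taken from the paper's code:

```python
# 80/20 train/test split, then 10% of the training portion held out for validation.
from sklearn.model_selection import train_test_split

def split_patents(patent_texts, ipc_labels, seed=0):
    # 80% train / 20% test
    train_x, test_x, train_y, test_y = train_test_split(
        patent_texts, ipc_labels, test_size=0.20, random_state=seed)
    # Hold out 10% of the training data as the validation set
    train_x, val_x, train_y, val_y = train_test_split(
        train_x, train_y, test_size=0.10, random_state=seed)
    return (train_x, train_y), (val_x, val_y), (test_x, test_y)
```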
Hardware Specification Yes We use two Nvidia Titan Xp GPUs.
Software Dependencies No The paper mentions using the 'PyTorch framework', 'Gensim Word2vec library', and 'NLTK' but does not specify version numbers for these software components.
Experiment Setup Yes We use 256 hidden units in BiLSTM for GCN aggregators. We choose maximal search depth k as 2... We use the same two FC layers (512 hidden units and 256 hidden units, respectively)... We use ReLU and Dropout (0.5 rate) between these FC layers. We train our model with a batch size of 128 and the Adam optimizer with weight decay of 1.0e-6 to accelerate the training process for fast convergence. We also apply a warm-up strategy with an initial learning rate of 2.4e-5, increased by 2.4e-5 every 2 epochs until it reaches 2.4e-4 at epoch 20. We then reduce the learning rate to 2.4e-5 for the remaining 10 epochs... The model has 20 attention heads, 100 word nodes for each patent, and 10 neighboring words for each word.
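A hedged PyTorch sketch of this configuration follows. The encoder output dimension, module names, and the exact epoch indexing of the warm-up schedule are assumptions; only the numbers quoted above (512/256 FC units, ReLU, Dropout 0.5, batch size 128, Adam with 1.0e-6 weight decay, and the 2.4e-5 to 2.4e-4 warm-up over 30 epochs) come from the paper.

```python
# Hedged sketch of the reported training configuration. ENCODER_DIM and all
# names below are assumptions for illustration; the hyperparameter values are
# the ones quoted in the setup description.
import torch
import torch.nn as nn

ENCODER_DIM = 512   # assumed dimensionality of the attention/GCN document encoding
NUM_LABELS = 555    # IPC codes reported for the Patent CIRCA 45k dataset

# Two FC layers (512 and 256 hidden units) with ReLU and Dropout(0.5) between them,
# followed by a multi-label output layer (one logit per IPC code).
classifier_head = nn.Sequential(
    nn.Linear(ENCODER_DIM, 512),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(256, NUM_LABELS),
)

# Adam with the reported weight decay; the learning rate is set per epoch below.
optimizer = torch.optim.Adam(classifier_head.parameters(),
                             lr=2.4e-5, weight_decay=1.0e-6)

def warmup_lr(epoch: int) -> float:
    """One plausible reading of the schedule: start at 2.4e-5, add 2.4e-5 every
    2 epochs up to 2.4e-4 around epoch 20, then drop back to 2.4e-5."""
    if epoch < 20:
        return min(2.4e-4, 2.4e-5 * (1 + epoch // 2))
    return 2.4e-5  # final 10 epochs

for epoch in range(30):
    for group in optimizer.param_groups:
        group["lr"] = warmup_lr(epoch)
    # ... one training epoch with batch size 128 goes here ...
```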