Multi-Label Patent Categorization with Non-Local Attention-Based Graph Convolutional Network
Authors: Pingjie Tang, Meng Jiang, Bryan (Ning) Xia, Jed W. Pitera, Jeffrey Welser, Nitesh V. Chawla
AAAI 2020, pp. 9024-9031 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of our model and as many as seven competitive baselines. We find that our model outperforms all of the prior state-of-the-art methods by a large margin and achieves high performance on P@k and nDCG@k. |
| Researcher Affiliation | Collaboration | 1University of Notre Dame, Notre Dame, Indiana 46556, USA {ptang, mjiang2, nxia, nchawla}@nd.edu, 2IBM Research Almaden, 650 Harry Road, San Jose, California 95120, USA {pitera, welser}@us.ibm.com |
| Pseudocode | No | The paper contains figures illustrating the model architecture and data flow, but no formal pseudocode blocks or algorithms. |
| Open Source Code | No | The paper does not provide a link to open-source code or explicitly state that the code for the described methodology is available. |
| Open Datasets | No | We query 45,526 patents with 555 IPC codes from an internal patent database named CIRCA. We name the dataset Patent CIRCA 45k. |
| Dataset Splits | Yes | We split the data into training and testing in an 80/20 ratio; in the actual experiments, we hold out 10% of the training data as a validation set (unlisted in Table 1), which is used to choose the optimal parameters. |
| Hardware Specification | Yes | We use two Nvidia Titan Xp GPUs |
| Software Dependencies | No | The paper mentions using the 'PyTorch framework', 'Gensim Word2vec library', and 'NLTK' but does not specify version numbers for these software components. |
| Experiment Setup | Yes | We use 256 hidden units in the BiLSTM for the GCN aggregators. We choose the maximal search depth k as 2... We use the same two FC layers (512 hidden units and 256 hidden units, respectively)... We use ReLU and Dropout (0.5 rate) between these FC layers. We train our model with a batch size of 128 and the Adam optimizer with a weight decay of 1.0e-6 to accelerate training for fast convergence. We also apply a warm-up strategy with an initial learning rate of 2.4e-5, increased by 2.4e-5 every 2 epochs until it reaches 2.4e-4 at epoch 20. We then reduce the learning rate to 2.4e-5 for the remaining 10 epochs... It has 20 attention heads, 100 word nodes for each patent, and 10 neighboring words for each word. |
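
The Research Type row above reports results in terms of P@k and nDCG@k. The paper does not release evaluation code, so the following is a minimal Python sketch of how these ranking metrics are commonly computed for multi-label classification with binary relevance; the exact nDCG normalization used by the authors may differ.

```python
import numpy as np

def precision_at_k(y_true, scores, k):
    """P@k: fraction of the top-k predicted labels that are relevant."""
    top_k = np.argsort(scores)[::-1][:k]
    return y_true[top_k].sum() / k

def ndcg_at_k(y_true, scores, k):
    """nDCG@k with binary relevance, normalized by the ideal DCG."""
    top_k = np.argsort(scores)[::-1][:k]
    gains = y_true[top_k]
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    dcg = float((gains * discounts).sum())
    ideal_hits = min(int(y_true.sum()), k)
    idcg = float(discounts[:ideal_hits].sum())
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: 5 candidate labels, 2 relevant, evaluate the top-3 predictions.
y_true = np.array([0, 1, 0, 1, 0])
scores = np.array([0.1, 0.9, 0.3, 0.2, 0.8])
print(precision_at_k(y_true, scores, k=3))  # ~0.33
print(ndcg_at_k(y_true, scores, k=3))       # ~0.61
```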
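The Dataset Splits row describes an 80/20 train/test split with 10% of the training portion held out for validation. Patent CIRCA 45k is an internal dataset, so the sketch below only illustrates that split procedure on placeholder IDs with scikit-learn; the random seed and the use of `train_test_split` are assumptions, not details from the paper.

```python
from sklearn.model_selection import train_test_split

# Placeholder IDs standing in for the 45,526 CIRCA patents (the dataset itself is internal).
patent_ids = list(range(45526))

# 80/20 train/test split, then hold out 10% of the training portion for validation.
train_ids, test_ids = train_test_split(patent_ids, test_size=0.20, random_state=42)
train_ids, val_ids = train_test_split(train_ids, test_size=0.10, random_state=42)

print(len(train_ids), len(val_ids), len(test_ids))  # roughly 32.8k / 3.6k / 9.1k
```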
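The Experiment Setup row specifies the Adam optimizer with a weight decay of 1.0e-6, a batch size of 128, and a staircase warm-up that grows the learning rate from 2.4e-5 to 2.4e-4 by epoch 20 before dropping back to 2.4e-5 for the last 10 epochs. Since no training code is released, here is a hedged PyTorch sketch of that schedule; `model` is a stand-in linear layer rather than the paper's non-local attention GCN, and the `LambdaLR` wiring is an assumption about how the described schedule could be implemented.

```python
import torch

def lr_for_epoch(epoch: int, base_lr: float = 2.4e-5) -> float:
    """Schedule described in the paper's setup (epochs are 1-indexed):
    warm up from 2.4e-5, adding 2.4e-5 every 2 epochs until 2.4e-4 at epoch 20,
    then drop back to 2.4e-5 for the remaining 10 epochs."""
    if epoch <= 20:
        step = (epoch - 1) // 2 + 1        # 1..10
        return base_lr * step              # 2.4e-5 ... 2.4e-4
    return base_lr                         # epochs 21-30

# Stand-in model: the paper's network maps document representations to 555 IPC codes.
model = torch.nn.Linear(256, 555)
optimizer = torch.optim.Adam(model.parameters(), lr=2.4e-5, weight_decay=1e-6)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda e: lr_for_epoch(e + 1) / 2.4e-5)

for epoch in range(30):
    # ... one training pass over batches of size 128 would go here ...
    scheduler.step()
```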