AutoAttend: Automated Attention Representation Search
Authors: Chaoyu Guan, Xin Wang, Wenwu Zhu
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show the superiority of our proposed AutoAttend model over previous state-of-the-arts on eight text classification tasks in NLP and four node classification tasks in GRL. |
| Researcher Affiliation | Academia | Chaoyu Guan, Xin Wang, Wenwu Zhu; Department of Computer Science and Technology, Tsinghua University. Correspondence to: Xin Wang <xin_wang@tsinghua.edu.cn>, Wenwu Zhu <wwzhu@tsinghua.edu.cn>. |
| Pseudocode | No | The paper describes its algorithms verbally (e.g., 'Monte Carlo to estimate the expectation and use Gradient Descent to find the optimal solution', 'evolutionary search') but does not present them in structured pseudocode or an algorithm block; a hedged sketch of such an evolutionary search loop is given below the table. |
| Open Source Code | Yes | Code will be published at https://github.com/THUMNLab/AutoAttend |
| Open Datasets | Yes | The tasks and datasets used in this paper are introduced in Section 5.1. The detailed information of datasets we use is shown in Table 1 (SST, SST-B, AG, DBP, YELP-B, YELP, YAHOO, AMZ-B with number of classes and train/valid/test splits) and in Table 2 (CORA, CITESEER, PUBMED, PPI with #CLASS, #FEATURE, #NODE, #EDGE). The word embeddings are initialized from pretrained GloVe (Pennington et al., 2014) and are fine-tuned during training. |
| Dataset Splits | Yes | Table 1. Detailed information of natural language processing datasets used in this paper. SST: 5 classes, 8,544 train, 1,101 valid, 2,210 test; SST-B: 2 classes, 6,920 train, 872 valid, 1,821 test; AG: 4 classes, 120,000 train, 7,600 test; DBP: 14 classes, 560,000 train, 70,000 test; YELP-B: 2 classes, 560,000 train, 38,000 test; YELP: 5 classes, 650,000 train, 50,000 test; YAHOO: 10 classes, 1,400,000 train, 60,000 test; AMZ-B: 2 classes, 3,600,000 train, 400,000 test (no validation split is listed for AG through AMZ-B). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions software components like GloVe and Adam (an optimizer often used with PyTorch), but it does not specify version numbers for any software dependencies (e.g., 'PyTorch 1.9', 'Python 3.8'). |
| Experiment Setup | Yes | For searching in NLP, we set the layer number to 24 to stay consistent with previous works. The word embeddings are initialized from pretrained GloVe (Pennington et al., 2014) and are fine-tuned during training. When searching, we use hidden size 64, batch size 128, learning rate 0.005 with Adam (Kingma & Ba, 2015), dropout 0.1, and max input sentence length 64. A hedged configuration sketch follows the table. |
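Since the paper reports no structured pseudocode, the following is a minimal, self-contained sketch of the kind of evolutionary search it describes only verbally. The architecture encoding (one operation choice per layer), the candidate operation set, and the toy fitness function are illustrative assumptions, not the authors' implementation, which evaluates candidates with shared supernet weights on validation data.

```python
import random

SEARCH_SPACE = ["conv", "rnn", "self_attention", "skip"]  # assumed operation set
NUM_LAYERS = 24  # layer number used for the NLP search in the paper


def sample_architecture():
    """Randomly choose one operation per layer."""
    return [random.choice(SEARCH_SPACE) for _ in range(NUM_LAYERS)]


def mutate(arch):
    """Re-sample the operation at one randomly chosen layer."""
    child = list(arch)
    child[random.randrange(NUM_LAYERS)] = random.choice(SEARCH_SPACE)
    return child


def fitness(arch):
    """Toy stand-in for validation accuracy scored with shared supernet weights."""
    return sum(op == "self_attention" for op in arch) + random.random()


def evolutionary_search(population_size=50, generations=200, sample_size=10):
    # Start from a population of randomly sampled architectures.
    population = [sample_architecture() for _ in range(population_size)]
    scores = [fitness(a) for a in population]

    for _ in range(generations):
        # Tournament selection: the best of a random subsample is the parent.
        idx = random.sample(range(len(population)), sample_size)
        parent = population[max(idx, key=lambda i: scores[i])]

        # Mutate the parent, evaluate the child, and age out the oldest
        # individual (regularised-evolution style replacement).
        child = mutate(parent)
        population.append(child)
        scores.append(fitness(child))
        population.pop(0)
        scores.pop(0)

    best = max(range(len(population)), key=lambda i: scores[i])
    return population[best]


if __name__ == "__main__":
    print(evolutionary_search())
```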
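For reference, a minimal sketch of the NLP search-stage configuration quoted above: 24 layers, hidden size 64, batch size 128, learning rate 0.005 with Adam, dropout 0.1, max input sentence length 64, and GloVe-initialised embeddings fine-tuned during training. The vocabulary size and the plain Transformer stack standing in for the searched encoder are assumptions for illustration; the paper's actual encoder is drawn from its attention search space.

```python
import torch
import torch.nn as nn

# Search-stage hyperparameters reported in the paper for NLP.
config = dict(
    num_layers=24,
    hidden_size=64,
    batch_size=128,
    learning_rate=0.005,
    dropout=0.1,
    max_seq_len=64,
)

vocab_size = 30_000  # assumed; not stated in the quoted setup
# The paper initialises embeddings from pretrained GloVe vectors and
# fine-tunes them; random vectors stand in for GloVe in this sketch.
pretrained = torch.randn(vocab_size, config["hidden_size"])
embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)

# Stand-in encoder: a plain Transformer stack with the stated depth and
# dropout (the paper's actual encoder comes from its search space).
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(
        d_model=config["hidden_size"],
        nhead=4,
        dropout=config["dropout"],
        batch_first=True,
    ),
    num_layers=config["num_layers"],
)

params = list(embedding.parameters()) + list(encoder.parameters())
optimizer = torch.optim.Adam(params, lr=config["learning_rate"])

# One dummy optimisation step with a batch of the stated shape.
tokens = torch.randint(0, vocab_size, (config["batch_size"], config["max_seq_len"]))
loss = encoder(embedding(tokens)).mean()
loss.backward()
optimizer.step()
```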