Continuous-Time Attention for Sequential Learning

Authors: Jen-Tzung Chien, Yi-Hsiang Chen

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiments on irregular sequence samples from human activities, dialogue sentences and medical features show the merits of the proposed continuous-time attention for activity recognition, sentiment classification and mortality prediction, respectively.
Researcher Affiliation | Academia | Jen-Tzung Chien, Yi-Hsiang Chen, Department of Electrical and Computer Engineering, National Chiao Tung University, Hsinchu, Taiwan, {jtchien, ethernet420.eed08g}@nctu.edu.tw
Pseudocode | Yes | Algorithm 1: Attentive neural differential equation
Open Source Code | No | The paper does not provide any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | The human activity dataset (Kaluza et al. 2010) was used for an action recognition task; the Multimodal EmotionLines Dataset (MELD) (Poria et al. 2019) contained dialogue instances collected from the Friends TV series; PhysioNet (Silva et al. 2012) was collected from the intensive care unit (ICU).
Dataset Splits | No | The paper does not explicitly state specific training, validation, and test dataset splits or percentages, nor does it refer to predefined splits with citations for reproducibility.
Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run its experiments, such as GPU models, CPU types, or cloud instance specifications.
Software Dependencies | No | The paper mentions software components like 'Adamax' and 'GloVe embedding' but does not provide specific version numbers for these or other key software dependencies.
Experiment Setup | Yes | The number of training epochs was 200. The learning rate was initialized to 0.01 and decayed after each iteration by a factor of 0.999. Adamax (Kingma and Ba 2014) was used as the optimizer. The hidden state size was 15. Relative and absolute solver tolerances were 1e-3 and 1e-4, respectively. A six-layer fully-connected network was configured as the ODE function. A one-layer GRU was used as the RNN cell. The classifier was a three-layer fully-connected network.
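
Although no code is released, the setup quoted above is concrete enough for a rough reconstruction. Below is a minimal PyTorch sketch, not the authors' implementation: torchdiffeq's odeint is assumed as the solver, the layer widths (64 and 32), the placeholder input/output dimensions, and the discrete dot-product attention readout (a stand-in for the paper's continuous-time attention of Algorithm 1) are all illustrative assumptions. Only the quoted numbers (hidden size 15, six-layer ODE function, one-layer GRU cell, three-layer classifier, Adamax at lr 0.01 with per-iteration 0.999 decay, tolerances 1e-3/1e-4) come from the paper.

# Hypothetical reconstruction of the reported setup; not the authors' code.
import torch
import torch.nn as nn
from torchdiffeq import odeint  # assumed solver backend; the paper names none

HIDDEN = 15  # "The hidden state size was 15."

class ODEFunc(nn.Module):
    # Six-layer fully-connected network used as the ODE dynamics f(t, h).
    def __init__(self, dim=HIDDEN, width=64):  # width is an assumption
        super().__init__()
        layers = [nn.Linear(dim, width), nn.Tanh()]
        for _ in range(4):
            layers += [nn.Linear(width, width), nn.Tanh()]
        layers.append(nn.Linear(width, dim))
        self.net = nn.Sequential(*layers)

    def forward(self, t, h):
        return self.net(h)

class AttentiveODERNN(nn.Module):
    # ODE-RNN skeleton: evolve h through the ODE between observation times,
    # update it with a one-layer GRU cell at each observation, then apply a
    # simplified attention readout over the hidden trajectory.
    def __init__(self, input_dim, n_classes):
        super().__init__()
        self.func = ODEFunc()
        self.gru = nn.GRUCell(input_dim, HIDDEN)
        # "The classifier was a three-layer fully-connected network."
        self.classifier = nn.Sequential(
            nn.Linear(HIDDEN, 32), nn.ReLU(),
            nn.Linear(32, 32), nn.ReLU(),
            nn.Linear(32, n_classes))

    def forward(self, xs, ts):
        # xs: (T, B, input_dim) observations; ts: (T,) increasing times
        h = torch.zeros(xs.size(1), HIDDEN)
        hs = []
        for i in range(xs.size(0)):
            if i > 0:
                # integrate h from t_{i-1} to t_i with the quoted tolerances
                h = odeint(self.func, h, ts[i - 1:i + 1],
                           rtol=1e-3, atol=1e-4)[-1]
            h = self.gru(xs[i], h)
            hs.append(h)
        H = torch.stack(hs)  # (T, B, HIDDEN)
        # Discrete dot-product attention over the trajectory; the paper's
        # continuous-time attention integrates over t instead of summing.
        w = torch.softmax((H * h).sum(-1), dim=0)   # (T, B)
        ctx = (w.unsqueeze(-1) * H).sum(0)          # (B, HIDDEN)
        return self.classifier(ctx)

model = AttentiveODERNN(input_dim=12, n_classes=7)  # placeholder dims
opt = torch.optim.Adamax(model.parameters(), lr=0.01)  # Kingma and Ba 2014
# "decayed after each iteration by a factor of 0.999": call sched.step()
# once per training iteration, for 200 epochs in total.
sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.999)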