Kernel Multimodal Continuous Attention

Authors: Alexander Moreno, Zhenke Wu, Supriya Nagesh, Walter Dempsey, James M. Rehg

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that kernel continuous attention often outperforms unimodal continuous attention, and the sparse variant tends to highlight time series peaks.
Researcher Affiliation | Collaboration | Alexander Moreno (Luminous Computing), Zhenke Wu (University of Michigan), Supriya Nagesh (Georgia Tech), Walter Dempsey (University of Michigan), James M. Rehg (Georgia Tech)
Pseudocode | Yes | Algorithm 1: Continuous Attention Mechanism via Kernel Deformed Exponential Families
Open Source Code | Yes | Code is in our repository (https://github.com/onenoc/kernel-continuous-attention), where we discuss the flags used to control precision on recent GPUs and PyTorch versions.
Open Datasets | Yes | We analyze uWave [19]: accelerometer time series with eight gesture classes. We follow [16]'s split into 3,582 training observations and 896 test observations; sequences have length 945. ... We extend [23]'s code for IMDB sentiment classification [20]. This uses a document representation v from a convolutional network and an LSTM attention model. ([23]'s repository for this dataset is https://github.com/deep-spin/quati.)
Dataset Splits | No | For uWave, the paper states "We follow [16]'s split into 3,582 training observations and 896 test observations." Training and test splits are specified, but no explicit validation split is given with counts or percentages.
Hardware Specification | Yes | Our UWave and ECG experiments were done on a Titan X GPU, IMDB on a 1080, and Ford A on an A40.
Software Dependencies | Yes | As an example, Figure 8 was done on a Titan X with an older version of PyTorch, while Figure 7 was done with an A40 with PyTorch 1.12.
Experiment Setup | Yes | For uWave: "All methods use 100 attention heads. Gaussian mixture uses 100 components (and thus 300 parameters per head), and kernel methods use 256 inducing points." For IMDB: "N: basis functions, I = 10 inducing points, bandwidth 0.01."
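The Pseudocode and Experiment Setup rows above refer to attention densities built from kernel expansions over inducing points. As a rough illustration of that idea only (not a reproduction of the paper's Algorithm 1 or of the repository code), the PyTorch sketch below parameterizes an unnormalized log-density with a Gaussian kernel over inducing points, normalizes it numerically on a time grid, and returns the expected value of a sampled value function; the function and variable names (`kernel_attention_context`, `gaussian_kernel`) are illustrative assumptions, and the single-head, grid-based normalization is a simplification.

```python
# Minimal sketch of kernel exponential-family continuous attention for one head,
# approximated on a discrete time grid. Assumptions: Gaussian kernel, numerical
# normalization via softmax over the grid; not the paper's actual implementation.
import torch


def gaussian_kernel(grid, inducing, bandwidth):
    # grid: (G,) time points, inducing: (I,) inducing points -> (G, I) kernel matrix
    diff = grid.unsqueeze(1) - inducing.unsqueeze(0)
    return torch.exp(-0.5 * (diff / bandwidth) ** 2)


def kernel_attention_context(values, coeffs, inducing, bandwidth=0.01):
    # values: (G, D) value function sampled on the grid
    # coeffs: (I,) learned kernel coefficients for a single attention head
    grid = torch.linspace(0.0, 1.0, values.shape[0])   # normalized time grid
    K = gaussian_kernel(grid, inducing, bandwidth)     # (G, I)
    log_p = K @ coeffs                                 # unnormalized log-density p(t) on the grid
    p = torch.softmax(log_p, dim=0)                    # numerical normalization over the grid
    return p @ values                                  # approximate E_p[V(t)], shape (D,)


# Usage with the IMDB-style settings quoted above: I = 10 inducing points, bandwidth 0.01.
inducing = torch.linspace(0.0, 1.0, 10)
coeffs = torch.randn(10)
values = torch.randn(1000, 64)   # hypothetical 64-dim value function on a 1000-point grid
context = kernel_attention_context(values, coeffs, inducing, bandwidth=0.01)
```

The sparse variant mentioned in the Research Type row would, as I understand it, replace the exponential normalization with a deformed (2-)exponential that can assign exactly zero density outside a compact region, which is what lets it highlight time series peaks.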