Diagonal State Spaces are as Effective as Structured State Spaces

Authors: Ankit Gupta, Albert Gu, Jonathan Berant

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate the performance of DSS on Long Range Arena (LRA) which is a suite of sequence-level classification tasks with diverse input lengths (1K-16K) requiring similarity, structural, and visual-spatial reasoning over a wide range of modalities such as text, natural/synthetic images, and mathematical expressions. Despite its simplicity, DSS delivers an average accuracy of 81.88 across the 6 tasks of LRA, comparable to the state-of-the-art performance of S4 (80.21)."
Researcher Affiliation | Collaboration | Ankit Gupta (IBM Research, ankitgupta.iitkanpur@gmail.com); Albert Gu (Stanford University, albertgu@stanford.edu); Jonathan Berant (Tel Aviv University, joberant@cs.tau.ac.il)
Pseudocode | Yes | Algorithm 1: DSSsoftmax Kernel (Sketch)
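The DSSsoftmax kernel named above can be sketched as follows. This is a minimal, naive NumPy version for a single channel, assuming Lambda ∈ C^N, w ∈ C^N, and a scalar step size delta > 0 (the paper's implementation handles additional numerically delicate cases that this sketch omits):

```python
import numpy as np

def dss_softmax_kernel(Lambda, w, delta, L):
    """Sketch of the DSSsoftmax kernel for one channel.

    Lambda : (N,) complex  - diagonal state matrix entries
    w      : (N,) complex  - output projection
    delta  : float > 0     - discretization step size
    L      : int           - kernel / sequence length
    Returns the length-L real convolution kernel.
    """
    k = np.arange(L)                                 # positions 0 .. L-1
    P = Lambda[:, None] * delta * k[None, :]         # (N, L): P[n, k] = lambda_n * delta * k
    P = P - P.real.max(axis=1, keepdims=True)        # shift by max real part before exp
    E = np.exp(P)
    S = E / E.sum(axis=1, keepdims=True)             # row-wise (complex) softmax over positions
    return (w @ S).real                              # (L,) real kernel
```

Because each row of the softmax sums to 1, the kernel entries sum to Re(Σ_n w_n); the output of the layer is then obtained by convolving the input sequence with this kernel (typically via FFT).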
Open Source Code | Yes | "Our code is available at https://github.com/ag1988/dss."
Open Datasets | Yes | "We evaluate the performance of DSS on Long Range Arena (LRA) which is a suite of sequence-level classification tasks with diverse input lengths (1K-16K)..."
Dataset Splits | Yes | "Long Range Arena (LRA) [TDA+21] is a standard benchmark for assessing the ability of models to process long sequences."
Hardware Specification | No | The experiments were conducted on IBM's Cognitive Computing Cluster, with additional resources from Tel Aviv University. The paper's checklist (3d) states that resources are specified in Appendix A.3, but A.3 is not included in the main text.
Software Dependencies | No | The paper mentions a "PyTorch implementation" but does not provide specific version numbers for PyTorch or any other software dependencies in the main text. Details might be in Appendix A.3, which is not provided.
Experiment Setup | Yes | "The real and imaginary parts of each element of W are initialized from N(0, 1). Each element of ∆ = exp(∆log) is initialized as e^r where r ∼ U(log(.001), log(.1)). Λ ∈ C^N is initialized using eigenvalues of the normal part of the normal-plus-low-rank form of the HiPPO matrix [GGR22]. Concretely, Λre, Λim are initialized such that the resulting Λ is the vector of those N eigenvalues of the following 2N × 2N matrix which have a positive imaginary part. In all our experiments, we used the above initialization with N = 64. The initial learning rate of all DSS parameters was 10^-3 and weight decay was not applied to them."
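The initialization quoted above can be sketched for a single DSS channel as follows. The structure of the 2N × 2N normal part of the HiPPO-LegS matrix (−1/2·I plus a skew-symmetric matrix built from q_n = √(2n+1)) is taken from the S4/S4D literature and is an assumption of this sketch, since the row quotes the matrix only as "the following 2N × 2N matrix":

```python
import numpy as np

def hippo_normal_eigenvalues(N):
    """Eigenvalues with positive imaginary part of the 2N x 2N normal
    part of the HiPPO-LegS matrix (assumed structure: -1/2*I plus a
    skew-symmetric matrix built from q_n = sqrt(2n+1))."""
    q = np.sqrt(2 * np.arange(2 * N) + 1)
    outer = np.outer(q, q)
    A = 0.5 * np.triu(outer, 1) - 0.5 * np.tril(outer, -1) - 0.5 * np.eye(2 * N)
    eigs = np.linalg.eigvals(A)                      # conjugate pairs, real part -1/2
    return eigs[eigs.imag > 0]                       # keep the N representatives

def init_dss_params(N=64, seed=0):
    """Sketch of the quoted initialization for one DSS channel:
    w's real/imag parts ~ N(0, 1); Delta = e^r with r ~ U(log .001, log .1);
    Lambda from the HiPPO normal-part eigenvalues above."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    delta = np.exp(rng.uniform(np.log(0.001), np.log(0.1)))
    Lam = hippo_normal_eigenvalues(N)
    return Lam, w, delta
```

With this construction every eigenvalue has real part −1/2, so the discretized modes decay rather than blow up, which is the motivation for initializing Λ from the normal part of HiPPO.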