It’s Raw! Audio Generation with State-Space Models

Authors: Karan Goel, Albert Gu, Chris Donahue, Christopher Ré

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5. Experiments. We evaluate SASHIMI on several benchmark audio generation and unconditional speech generation tasks in both AR and non-AR settings, validating that SASHIMI generates more globally coherent waveforms than baselines while having higher computational and sample efficiency.
Researcher Affiliation | Academia | Karan Goel 1, Albert Gu 1, Chris Donahue 1, Christopher Ré 1 ... 1 Department of Computer Science, Stanford University. Correspondence to: Karan Goel <kgoel@cs.stanford.edu>, Albert Gu <albertgu@stanford.edu>.
Pseudocode | No | The paper describes the architecture and processes in text and diagrams but does not include explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the source code for the SASHIMI methodology, nor does it provide a direct link to a code repository for SASHIMI.
Open Datasets | Yes | The datasets we used can be found on Huggingface datasets: Beethoven, YouTube Mix, SC09.
Dataset Splits | Yes | Table 1. Summary of music and speech datasets used for unconditional AR generation experiments. ... MUSIC (YOUTUBEMIX) ... 88% / 6% / 6% (see the split sketch below the table)
Hardware Specification | Yes | All methods in the AR setting were trained on single V100 GPU machines. All diffusion models were trained on 8-GPU A100 machines.
Software Dependencies | No | The paper mentions adapting a 'PyTorch implementation' for models but does not provide specific version numbers for PyTorch or any other software libraries or dependencies.
Experiment Setup | Yes | For all datasets, we use a feature expansion of 2 when pooling, and use a feedforward dimension of 2× the model dimension in all inverted bottlenecks in the model. We use a model dimension of 64. For S4 parameters, we only train Λ and C with the recommended learning rate of 0.001, and freeze all other parameters for simplicity (including p, B, dt). We train with (4, 4) pooling for all datasets, with 8 S4 blocks per tier. (See the configuration sketch below the table.)
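The Open Datasets and Dataset Splits rows quote the paper's pointers to Hugging Face datasets and an 88%/6%/6% split. Below is a minimal sketch of how such a split could be reproduced with the Hugging Face `datasets` library; the repository id `"some-namespace/beethoven"` is a hypothetical placeholder, since the exact dataset identifiers are not given in the excerpts above.

```python
# Hedged sketch: reproduce an 88% / 6% / 6% train/validation/test split with the
# Hugging Face `datasets` library. The dataset id below is a hypothetical
# placeholder, not an identifier confirmed by the paper or this page.
from datasets import load_dataset

ds = load_dataset("some-namespace/beethoven", split="train")  # hypothetical id

# First carve off 12% for validation + test, then split that portion in half.
rest = ds.train_test_split(test_size=0.12, seed=0)
val_test = rest["test"].train_test_split(test_size=0.5, seed=0)

splits = {
    "train": rest["train"],           # ~88%
    "validation": val_test["train"],  # ~6%
    "test": val_test["test"],         # ~6%
}
```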
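The Experiment Setup row reports its hyperparameters only in prose. The following is a minimal PyTorch sketch of how those numbers could be collected into a configuration and an optimizer that trains only the S4 parameters Λ and C at learning rate 0.001 while freezing p, B, and dt. The parameter-name matching, the `build_optimizer` helper, the choice of AdamW, and the base learning rate argument are all assumptions of this sketch, not the authors' code (which, per the Open Source Code row, is not released).

```python
# Hedged sketch of the reported SaShiMi hyperparameters and S4 learning-rate
# handling. Parameter names ("Lambda", ".C", ".B", ".p", "log_dt") are assumed
# naming conventions for illustration, not the authors' actual module layout.
import torch

config = {
    "model_dim": 64,          # model dimension
    "ff_expand": 2,           # feedforward dim = 2x model dim (inverted bottleneck)
    "pool_expand": 2,         # feature expansion of 2 at each pooling step
    "pooling": (4, 4),        # two tiers with pooling factor 4 each
    "s4_blocks_per_tier": 8,  # 8 S4 blocks per tier
    "s4_lr": 1e-3,            # learning rate for trainable S4 parameters (Lambda, C)
}

def build_optimizer(model: torch.nn.Module, base_lr: float) -> torch.optim.Optimizer:
    """Train Lambda and C at lr=1e-3, freeze the other S4 parameters (p, B, dt),
    and train everything else at `base_lr` (a value not specified in the excerpt)."""
    s4_trainable, other = [], []
    for name, param in model.named_parameters():
        if "Lambda" in name or name.endswith(".C"):
            s4_trainable.append(param)
        elif any(key in name for key in (".p", ".B", "log_dt")):
            param.requires_grad_(False)  # frozen for simplicity, per the paper
        else:
            other.append(param)
    return torch.optim.AdamW(
        [
            {"params": other, "lr": base_lr},
            {"params": s4_trainable, "lr": config["s4_lr"], "weight_decay": 0.0},
        ]
    )
```

Keeping the S4 state parameters in their own parameter group mirrors the quoted setup: only Λ and C receive gradient updates, at the recommended 0.001, while the remaining state-space parameters stay fixed.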