How Does it Sound?

Authors: Kun Su, Xiulong Liu, Eli Shlizerman

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate RhythmicNet on large-scale video datasets that include body movements with inherent sound association, such as dance, as well as in-the-wild internet videos of various movements and actions. We show that the method can generate plausible music that aligns with different types of human movements. (Section 4, Experiments & Results)
Researcher Affiliation | Academia | Department of Electrical & Computer Engineering, University of Washington, Seattle, USA; Department of Applied Mathematics, University of Washington, Seattle, USA. Corresponding author: shlizee@uw.edu
Pseudocode | No | The paper describes the computational steps and models in text but does not include any formally structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code: System setup and code are available in a GitHub repository. https://github.com/shlizee/RhythmicNet
Open Datasets | Yes | We use the AIST Dance Video Database, a large-scale collection of dance videos in 60fps, for training and testing of Video2Rhythm [69]. For Rhythm2Drum, we use the Groove MIDI dataset [50], which contains 1150 MIDI files and over 22,000 measures of drumming. For Drum2Music, we extract two subsets of the Lakh MIDI dataset [70] to separately train Drum2Piano and Drum2Guitar models.
Dataset Splits | Yes | We split the samples into train/validate/test sets by 0.8/0.1/0.1 based on the dance genres, dancers, and camera ids. We split the data into 0.8/0.1/0.1 train/validate/test sets. This results in 34991/1944/1944 segments for train/validate/test sets respectively. For Drum2Guitar, we perform a similar selection to obtain 12904/717/717 segments for train/validate/test sets respectively. (A hedged split sketch follows the table.)
Hardware Specification | Yes | We use PyTorch [71] to implement all models in RhythmicNet with two Titan X GPUs.
Software Dependencies | No | The paper states 'We use PyTorch [71] to implement all models' but does not provide specific version numbers for PyTorch or for other software dependencies it mentions, such as the OpenPose framework, U-Net, or Transformer-XL.
Experiment Setup | Yes | In Video2Rhythm, the network contains a 10-layer ST-GCN and a 2-layer transformer encoder with 2-head attention. ... In Drum2Music, the model consists of a recurrent transformer encoder and a recurrent transformer decoder. We set the number of encoder layers, decoder layers, encoder heads and decoder heads to 4, 8, 8, and 8 respectively. The length of the training input tokens and the length of the memory is 256. We provide additional configuration details in the supplementary materials. (A hedged configuration sketch follows the table.)
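
The 0.8/0.1/0.1 split quoted in the Dataset Splits row can be illustrated with a minimal sketch. This is not the authors' released code: the segment metadata fields (genre, dancer_id, camera_id) and the grouping strategy are assumptions, chosen only to reflect the paper's statement that the split is based on dance genres, dancers, and camera ids.

```python
# Hedged sketch of a 0.8/0.1/0.1 train/validate/test split grouped by
# (genre, dancer, camera). Field names and grouping are illustrative
# assumptions, not the paper's released implementation.
import random
from collections import defaultdict

def split_segments(segments, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split segments so each (genre, dancer, camera) group stays in one partition."""
    groups = defaultdict(list)
    for seg in segments:
        groups[(seg["genre"], seg["dancer_id"], seg["camera_id"])].append(seg)

    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    n_train = int(ratios[0] * len(keys))
    n_val = int(ratios[1] * len(keys))

    train = [s for k in keys[:n_train] for s in groups[k]]
    val = [s for k in keys[n_train:n_train + n_val] for s in groups[k]]
    test = [s for k in keys[n_train + n_val:] for s in groups[k]]
    return train, val, test
```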
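The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. Only the layer, head, and length counts come from the paper; the ST-GCN backbone and the recurrent (Transformer-XL-style) Drum2Music model are not re-implemented here, and d_model is an assumed placeholder. The plain nn.TransformerEncoder below merely stands in for the 2-layer, 2-head encoder in Video2Rhythm.

```python
# Hedged configuration sketch for the reported RhythmicNet hyperparameters.
# Values come from the Experiment Setup row; d_model is an assumption.
import torch.nn as nn

VIDEO2RHYTHM = {
    "st_gcn_layers": 10,   # 10-layer ST-GCN over body keypoints
    "encoder_layers": 2,   # 2-layer transformer encoder
    "encoder_heads": 2,    # 2-head attention
}

DRUM2MUSIC = {
    "encoder_layers": 4,
    "decoder_layers": 8,
    "encoder_heads": 8,
    "decoder_heads": 8,
    "train_seq_len": 256,  # length of training input tokens
    "memory_len": 256,     # recurrence memory length
}

def build_rhythm_encoder(d_model=128):
    """Stand-in for the Video2Rhythm transformer encoder (d_model assumed)."""
    layer = nn.TransformerEncoderLayer(
        d_model=d_model,
        nhead=VIDEO2RHYTHM["encoder_heads"],
        batch_first=True,
    )
    return nn.TransformerEncoder(layer, num_layers=VIDEO2RHYTHM["encoder_layers"])
```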