DanceAnyWay: Synthesizing Beat-Guided 3D Dances with Randomized Temporal Contrastive Learning

Authors: Aneesh Bhattacharya, Manas Paranjape, Uttaran Bhattacharya, Aniket Bera

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the performance of our approach through extensive experiments on the benchmark AIST++ dataset and observe improvements of about 7%-12% in motion quality metrics and 1.5%-4% in motion diversity metrics over the current baselines, respectively.
Researcher Affiliation | Collaboration | Aneesh Bhattacharya (1,2), Manas Paranjape (1), Uttaran Bhattacharya (3), Aniket Bera (1); (1) Purdue University, USA; (2) IIIT Naya Raipur, India; (3) Adobe Research, USA. {bhatta95, mparanja, aniketbera}@purdue.edu, ubhattac@adobe.com
Pseudocode | No | The paper includes architectural diagrams (e.g., 'Figure 2: DanceAnyWay Network Architecture') but does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our source code and project page are available at https://github.com/aneeshbhattacharya/DanceAnyWay.
Open Datasets | Yes | We use the benchmark AIST++ dataset (Li et al. 2021b), a large-scale 3D dance dataset of paired music and pose sequences spanning ten dance genres.
Dataset Splits | No | The paper states 'We use the official dataset splits for training and testing our model' but does not explicitly detail or mention a specific validation split or how it was used in the experimental setup.
Hardware Specification | Yes | Training our BPS, RPS, and trajectory predictor takes 6, 16, and 3 hours, respectively, on an NVIDIA A100 GPU.
Software Dependencies | No | The paper mentions using 'Librosa (Brian McFee et al. 2015)' for feature extraction (see the Librosa sketch after this table) but does not provide specific version numbers for Librosa or any other key software dependencies, such as deep learning frameworks (e.g., PyTorch, TensorFlow).
Experiment Setup | Yes | We train our network using 7-second dance clips sampled at 10 fps, i.e., with T = 70, and use a seed pose length TS = 20. For our RTC loss, we use segments of length m = 25 with sliding window length d = 5. We use a maximum of |B| = 20 beat frames and |BS| = 3 seed beat frames. We use DM = 32, DCB = 4, DCR = 6, and DP = 16 for ΘB, ΘR, and ΘT. ΘB and ΘT have 4 heads and 6 blocks, while ΘR has 8 heads and 6 blocks; ΦZR has 3 heads and 8 blocks, ΦUR has 3 heads and 4 blocks, and ΦT has 1 head and 8 blocks. For BPS, we use the Adam optimizer (Kingma and Ba 2014) with β1 = 0.5, β2 = 0.99, a mini-batch size of 8, a learning rate (LR) of 1e-4, and train for 500 epochs. For RPS, we use the Adam optimizer with β1 = 0.5, β2 = 0.99, a mini-batch size of 8, an LR of 1e-4 for both the generator and discriminator, and train for 250 epochs. For our trajectory predictor, we use the Adam optimizer with β1 = 0.8, β2 = 0.99, a mini-batch size of 8, an LR of 1e-5, and train for a total of 700 epochs.
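
The Experiment Setup row above reports concrete per-stage optimizer settings. The following is a minimal sketch of how those settings could be wired up, assuming PyTorch; the model variables (bps_model, rps_generator, rps_discriminator, trajectory_model) and the make_optimizer helper are placeholders for illustration, not the authors' code.

```python
# Training-setup sketch based on the hyperparameters quoted in the row above.
# Only the optimizer settings come from the paper; everything else is assumed.
import torch

# Per-stage hyperparameters reported in the paper's experiment setup.
CONFIG = {
    "bps":        {"lr": 1e-4, "betas": (0.5, 0.99), "batch_size": 8, "epochs": 500},
    "rps":        {"lr": 1e-4, "betas": (0.5, 0.99), "batch_size": 8, "epochs": 250},
    "trajectory": {"lr": 1e-5, "betas": (0.8, 0.99), "batch_size": 8, "epochs": 700},
}

def make_optimizer(model: torch.nn.Module, stage: str) -> torch.optim.Adam:
    """Build an Adam optimizer with the per-stage settings listed above."""
    cfg = CONFIG[stage]
    return torch.optim.Adam(model.parameters(), lr=cfg["lr"], betas=cfg["betas"])

# Example usage (model objects are placeholders):
# bps_opt  = make_optimizer(bps_model, "bps")
# gen_opt  = make_optimizer(rps_generator, "rps")      # generator and discriminator
# disc_opt = make_optimizer(rps_discriminator, "rps")  # use the same LR per the paper
# traj_opt = make_optimizer(trajectory_model, "trajectory")
```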
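
The paper also conditions generation on up to |B| = 20 beat frames for 7-second clips at 10 fps and cites Librosa for feature extraction, but it does not spell out the beat pipeline. Below is a minimal sketch, assuming a standard librosa beat-tracking call and a hypothetical audio path; it is an illustration of one plausible mapping from audio beats to 10 fps pose-frame indices, not the authors' implementation.

```python
# Hedged sketch: map audio beats to indices in a 10 fps pose sequence.
import librosa
import numpy as np

FPS = 10        # dance sequences are sampled at 10 fps (T = 70 for 7 s clips)
MAX_BEATS = 20  # |B| = 20 beat frames, per the experiment setup

def beat_frames_at_dance_fps(audio_path: str) -> np.ndarray:
    """Return beat positions as indices into the 10 fps pose sequence."""
    y, sr = librosa.load(audio_path, duration=7.0)      # 7-second music clip
    _, beats = librosa.beat.beat_track(y=y, sr=sr)      # beats as STFT frame indices
    beat_times = librosa.frames_to_time(beats, sr=sr)   # convert to seconds
    frames = np.unique(np.round(beat_times * FPS).astype(int))
    return frames[:MAX_BEATS]                           # cap at |B| beat frames
```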