Disentangled Sequential Autoencoder

Authors: Yingzhen Li, Stephan Mandt

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments on artificially generated cartoon video clips and voice recordings, we show that we can convert the content of a given sequence into another one by such content swapping. We carried out experiments both on video data (Section 4.1) as well as speech data (Section 4.2). In both setups, we find strong evidence that our model learns an approximately disentangled representation that allows for conditional generation and feature swapping.
Researcher Affiliation | Collaboration | University of Cambridge, UK; Disney Research, Los Angeles, CA, USA.
Pseudocode | No | The paper describes the model architecture and training process but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of the described methodology.
Open Datasets | Yes | The TIMIT data (Garofolo et al., 1993) contains broadband 16k Hz recordings of phonetically-balanced read speech. We downloaded and selected the online available sprite sheets, and organised them into 4 attribute categories (skin color, tops, pants and hairstyle) and 9 action categories (walking, casting spells and slashing, each with three viewing angles).
Dataset Splits | Yes | We used 1000 of them for training/validation and the rest of them for testing.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions tools and algorithms (e.g., Pymunk, LSTM) but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | For the deterministic LSTMs, we fix the dimensionality of z_t to 64, and set h_t and the LSTM internal states to be 512 dimensions. The latent variable dimensionality of the stochastic dynamics is dim(z_t) = 16.
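
The dimensions quoted in the Experiment Setup row can be collected into a small configuration sketch. This is only an illustration: the class and field names (`LatentSetup`, `z_dim`, `hidden_dim`) are hypothetical and do not come from the paper, and the hidden size for the stochastic-dynamics setting is left unset because the quoted text does not state it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LatentSetup:
    """Dimensions quoted in the Experiment Setup row (field names are hypothetical)."""
    name: str
    z_dim: int                 # dimensionality of the per-frame latent z_t
    hidden_dim: Optional[int]  # size of h_t and the LSTM internal states; None = not stated


# Values copied from the quoted text; nothing beyond it is filled in.
DETERMINISTIC_LSTM = LatentSetup("deterministic_lstm", z_dim=64, hidden_dim=512)
STOCHASTIC_DYNAMICS = LatentSetup("stochastic_dynamics", z_dim=16, hidden_dim=None)


def describe(setup: LatentSetup) -> str:
    """Render one setup as a single line, marking unstated values explicitly."""
    hidden = "unspecified" if setup.hidden_dim is None else str(setup.hidden_dim)
    return f"{setup.name}: dim(z_t)={setup.z_dim}, hidden={hidden}"


if __name__ == "__main__":
    for setup in (DETERMINISTIC_LSTM, STOCHASTIC_DYNAMICS):
        print(describe(setup))
```

Keeping the unstated hidden size as `None` rather than guessing a value makes it visible which parts of the setup a reproduction would still have to choose.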