Learning Interpretable Low-dimensional Representation via Physical Symmetry

Authors: Xuanjie Liu, Daniel Chin, Yichen Huang, Gus Xia

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test SPS under two modalities of temporal signals: music (section 4.1) and video (section 4.2). Each model is executed with 10 random initialisations and evaluated on the test set. The highlight of this section is that SPS effectively learns interpretable low-dimensional factors that align with human perception. Also, by utilizing small training sets, we show the high sampling efficiency of our model.
Researcher Affiliation | Academia | ¹Mohamed bin Zayed University of Artificial Intelligence, ²New York University Shanghai; {Xuanjie.Liu, Nanfeng.Qin, Yichen.Huang, Gus.Xia}@mbzuai.ac.ae
Pseudocode | No | The paper describes its methods in detail with mathematical formulations and diagrams, but it does not include any formal pseudocode or algorithm blocks.
Open Source Code | Yes | The source code is publicly available at https://github.com/XuanjieLiu/Self-supervised-learning-via-Physical-Symmetry. The demo page is available at https://xuanjieliu.github.io/SPS_demo/
Open Datasets | Yes | We utilize the melodies from the Nottingham Dataset [Foxley, 2011], a collection of 1200 American and British folk songs. [...] In this task, we evaluate our method's capability on a real-world dataset, KITTI-Masks [Klindt et al., 2021].
Dataset Splits | No | The paper specifies training and test/evaluation sets (e.g., 'We utilize 512 trajectories for training, and an additional 512 trajectories for evaluation.'), but it does not explicitly mention a separate validation set or provide details on its split.
Hardware Specification | No | The paper describes the model architecture and training process but does not provide specific details on the hardware (e.g., GPU models, CPU types, or cloud resources) used for running the experiments.
Software Dependencies | No | The paper mentions using the 'Adam optimiser' but does not specify any software libraries, frameworks, or their version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | For both tasks, we use the Adam optimiser with learning rate = 10^-3. The training batch size is 32 across all of our experiments. For all VAE-based models, including SPSVAE (ours/ablation) and β-VAE (baseline), we set β (i.e., λ_3 in Equation (1)) to 0.01, with λ_1 = 1 and λ_2 = 2. All BCE and MSE loss functions are calculated as sums instead of means. K = 4 for all SPS models except for those discussed in section 5, where we analyse the influence of different K. The RNN predicts z_{n+1:T} given the first n embeddings z_{1:n}. We choose n = 3 for the audio task and n = 5 for the vision task. We adopt scheduled sampling [Bengio et al., 2015] during the training stage, where we gradually reduce the guidance from teacher forcing. After around 50,000 batch iterations, the RNN relies solely on the given z_{1:n} and predicts auto-regressively.
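
To make the reported setup concrete, here is a minimal PyTorch sketch of the configuration described above. Only the numeric values (learning rate, batch size, loss weights, K, n, the roughly 50,000-iteration decay horizon, and the 10 random initialisations) come from the paper; the function names (`sps_loss`, `rollout`, `teacher_forcing_prob`), the linear teacher-forcing decay, and the assumed `rnn` interface are illustrative, since the paper provides no pseudocode.

```python
import random
import torch
import torch.nn.functional as F

# Values reported in the paper's experiment setup.
LR = 1e-3             # Adam learning rate (10^-3)
BATCH_SIZE = 32       # used across all experiments
LAMBDA_1 = 1.0        # reconstruction-loss weight
LAMBDA_2 = 2.0        # prediction-loss weight
LAMBDA_3 = 0.01       # KL weight, i.e. the beta of the beta-VAE baseline
K = 4                 # for all SPS models (varied only in section 5)
N_GIVEN = 3           # prefix length: 3 for the audio task, 5 for the vision task
DECAY_ITERS = 50_000  # teacher forcing fully removed after ~50k batch iterations
NUM_SEEDS = 10        # each model is executed with 10 random initialisations

def teacher_forcing_prob(iteration: int) -> float:
    """Scheduled sampling (Bengio et al., 2015). The paper does not state
    the decay curve, so linear decay is an assumption."""
    return max(0.0, 1.0 - iteration / DECAY_ITERS)

def sps_loss(x, x_recon, z_pred, z_target, mu, logvar):
    """Weighted sum of reconstruction, prediction, and KL terms.
    BCE and MSE use reduction='sum', as stated in the paper."""
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    pred = F.mse_loss(z_pred, z_target, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return LAMBDA_1 * recon + LAMBDA_2 * pred + LAMBDA_3 * kl

def rollout(rnn, z_seq, iteration):
    """Predict z_{n+1:T} from the first N_GIVEN embeddings z_{1:n}.
    `rnn` is assumed to map (batch, steps, z_dim) -> (batch, steps, z_dim)
    next-step predictions, e.g. a GRU followed by a linear head."""
    T = z_seq.size(1)
    out, h = rnn(z_seq[:, :N_GIVEN], None)   # warm up on the given prefix
    z_t, preds = out[:, -1], []
    for t in range(N_GIVEN, T):
        preds.append(z_t)                    # prediction for position t
        if random.random() < teacher_forcing_prob(iteration):
            z_in = z_seq[:, t]               # teacher forcing: ground truth
        else:
            z_in = z_t                       # scheduled sampling: own output
        out, h = rnn(z_in.unsqueeze(1), h)
        z_t = out[:, -1]
    return torch.stack(preds, dim=1)         # shape (batch, T - N_GIVEN, z_dim)

# optimiser = torch.optim.Adam(model.parameters(), lr=LR)
```

One consequence of the sum reduction worth noting: the absolute loss scale grows with batch and sequence size, so the β = 0.01 reported here is not directly comparable to β values in mean-reduced β-VAE implementations.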