Causal Temporal Representation Learning with Nonstationary Sparse Transition

Authors: Xiangchen Song, Zijian Li, Guangyi Chen, Yujia Zheng, Yewen Fan, Xinshuai Dong, Kun Zhang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experimental evaluations on synthetic and real-world datasets demonstrate significant improvements over existing baselines, highlighting the effectiveness of our approach."
Researcher Affiliation | Academia | "¹Carnegie Mellon University, ²Mohamed bin Zayed University of Artificial Intelligence"
Pseudocode | No | "Our framework builds on the VAE [34, 35] architecture, incorporating dedicated modules to handle nonstationarity. It enforces the conditions discussed in Sec. 3 as constraints. As shown in Fig. 2, the framework consists of three primary components: (1) Sparse Transition, (2) Prior Network, and (3) Encoder-Decoder."
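Since the paper provides no pseudocode, a minimal structural sketch may help orient readers. This is an assumption-laden skeleton, not the authors' implementation: the class name, layer choices, and dimensions are all hypothetical; only the three component names come from the quote above, and the sparsity penalty and KL terms that the real framework would enforce are omitted.

```python
import torch
from torch import nn

class CtrlNSSketch(nn.Module):
    """Hypothetical skeleton mirroring the three named components:
    (1) sparse transition, (2) prior network, (3) encoder-decoder."""

    def __init__(self, obs_dim: int = 10, latent_dim: int = 4):
        super().__init__()
        # (3) Encoder-Decoder: encoder outputs mean and log-variance.
        self.encoder = nn.Linear(obs_dim, 2 * latent_dim)
        self.decoder = nn.Linear(latent_dim, obs_dim)
        # (2) Prior Network: models the latent prior over time.
        self.prior = nn.GRU(latent_dim, latent_dim, batch_first=True)
        # (1) Sparse Transition: latent dynamics (sparsity penalty omitted here).
        self.transition = nn.Linear(latent_dim, latent_dim)

    def forward(self, x: torch.Tensor):
        mean, log_var = self.encoder(x).chunk(2, dim=-1)
        # Reparameterization trick: sample z = mean + sigma * eps.
        z = mean + torch.randn_like(mean) * (0.5 * log_var).exp()
        return self.decoder(z), mean, log_var

# A full training objective would add a reconstruction loss, a KL term
# against the prior network, and a sparsity constraint on the transition.
```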
Open Source Code | Yes | "Our code is also available via https://github.com/xiangchensong/ctrlns."
Open Datasets | Yes | "Our evaluation used two datasets: Hollywood Extended [38], which includes 937 videos with 16 daily action categories, and CrossTask [39], focusing on 14 of the 18 primary tasks related to cooking [40], comprising 2552 videos across 80 action categories."
Dataset Splits | Yes | "In the Hollywood dataset, we used the default 10-fold dataset split setting and calculated the mean and standard deviation from those 10 runs."
Hardware Specification | Yes | "All experiments are performed on a GPU server with 128 CPU cores, 1TB memory, and one NVIDIA L40 GPU."
Software Dependencies | Yes | "For synthetic experiments, the models were implemented in PyTorch 2.2.2."
Experiment Setup | Yes | "We trained the VAE network using the AdamW optimizer with a learning rate of 5×10⁻⁴ and a mini-batch size of 64."
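The quoted setup translates directly into a few lines of PyTorch. This is a sketch under stated assumptions: the model here is an arbitrary stand-in (the paper's actual VAE is not reproduced), and only the optimizer choice, learning rate, and batch size come from the quote.

```python
import torch
from torch import nn

# Hypothetical stand-in model; any nn.Module would do for this sketch.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 8))

# Settings quoted above: AdamW optimizer, learning rate 5e-4.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)

# One illustrative update with a mini-batch of 64 samples.
x = torch.randn(64, 8)
loss = nn.functional.mse_loss(model(x), x)  # placeholder reconstruction loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
```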