Causal Temporal Representation Learning with Nonstationary Sparse Transition
Authors: Xiangchen Song, Zijian Li, Guangyi Chen, Yujia Zheng, Yewen Fan, Xinshuai Dong, Kun Zhang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluations on synthetic and real-world datasets demonstrate significant improvements over existing baselines, highlighting the effectiveness of our approach. |
| Researcher Affiliation | Academia | (1) Carnegie Mellon University, (2) Mohamed bin Zayed University of Artificial Intelligence |
| Pseudocode | No | Our framework builds on VAE [34, 35] architecture, incorporating dedicated modules to handle nonstationarity. It enforces the conditions discussed in Sec. 3 as constraints. As shown in Fig. 2, the framework consists of three primary components: (1) Sparse Transition, (2) Prior Network, and (3) Encoder-Decoder. |
| Open Source Code | Yes | Our code is also available via https://github.com/xiangchensong/ctrlns. |
| Open Datasets | Yes | Our evaluation used two datasets: Hollywood Extended [38], which includes 937 videos with 16 daily action categories, and Cross Task [39], focusing on 14 of 18 primary tasks related to cooking [40], comprising 2552 videos across 80 action categories. |
| Dataset Splits | Yes | In the Hollywood dataset, we used the default 10-fold dataset split setting and calculated the mean and standard deviation from those 10 runs. |
| Hardware Specification | Yes | All experiments are performed on a GPU server with 128 CPU cores, 1TB memory, and one NVIDIA L40 GPU. |
| Software Dependencies | Yes | For synthetic experiments, the models were implemented in PyTorch 2.2.2. |
| Experiment Setup | Yes | We trained the VAE network using the AdamW optimizer with a learning rate of 5 × 10⁻⁴ and a mini-batch size of 64. |
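The "Pseudocode" row above describes the architecture only in prose (a VAE with a Sparse Transition module, a Prior Network, and an Encoder-Decoder), since the paper provides no pseudocode. The sketch below illustrates how those three components could fit together in PyTorch; all module and variable names (`SequenceVAE`, `latent_dim`, the single-linear-layer components) are illustrative assumptions and do not reflect the authors' released code at https://github.com/xiangchensong/ctrlns.

```python
# Illustrative sketch only: component names and layer choices are assumptions,
# not taken from the authors' implementation.
import torch
import torch.nn as nn


class SequenceVAE(nn.Module):
    """Minimal skeleton with the three components named in the paper:
    (1) Sparse Transition, (2) Prior Network, (3) Encoder-Decoder."""

    def __init__(self, obs_dim: int = 8, latent_dim: int = 8):
        super().__init__()
        # (1) Sparse Transition: maps the previous latent z_{t-1} toward z_t.
        self.transition = nn.Linear(latent_dim, latent_dim)
        # (2) Prior Network: prior mean/log-variance for z_t given the transitioned z_{t-1}.
        self.prior = nn.Linear(latent_dim, 2 * latent_dim)
        # (3) Encoder-Decoder: amortized posterior over z_t and reconstruction of x_t.
        self.encoder = nn.Linear(obs_dim, 2 * latent_dim)
        self.decoder = nn.Linear(latent_dim, obs_dim)

    def forward(self, x):  # x: (batch, time, obs_dim)
        post_mu, post_logvar = self.encoder(x).chunk(2, dim=-1)
        z = post_mu + torch.randn_like(post_mu) * (0.5 * post_logvar).exp()
        x_hat = self.decoder(z)
        # The prior for z_t is conditioned on the transitioned previous latent.
        prior_mu, prior_logvar = self.prior(self.transition(z[:, :-1])).chunk(2, dim=-1)
        recon = ((x_hat - x) ** 2).mean()
        # Gaussian KL between posterior q(z_t | x) and prior p(z_t | z_{t-1}).
        kl = 0.5 * (
            prior_logvar - post_logvar[:, 1:]
            + (post_logvar[:, 1:].exp() + (post_mu[:, 1:] - prior_mu) ** 2)
            / prior_logvar.exp()
            - 1
        ).mean()
        return recon + kl
```

The "Experiment Setup" row reports the AdamW optimizer, a learning rate of 5 × 10⁻⁴, and a mini-batch size of 64. A minimal training-loop sketch with those reported values follows; it reuses the `SequenceVAE` sketch above, and `train_loader` is fed dummy random sequences purely for illustration.

```python
# Hedged sketch of the reported training configuration; the dataset here is a
# placeholder, not the authors' synthetic or video data.
optimizer = torch.optim.AdamW(SequenceVAE().parameters(), lr=5e-4)  # reported lr 5e-4
model = SequenceVAE()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)
train_loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(640, 10, 8)),  # dummy sequences
    batch_size=64,  # reported mini-batch size
    shuffle=True,
)

for (x,) in train_loader:
    optimizer.zero_grad()
    loss = model(x)
    loss.backward()
    optimizer.step()
```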
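Together, the two sketches cover the architecture and optimization details that the table extracts from the paper; for the exact module definitions, nonstationarity handling, and sparsity constraints, consult the released repository linked in the "Open Source Code" row.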