Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Towards Identifiability of Hierarchical Temporal Causal Representation Learning

Authors: Zijian Li, Minghao Fu, Junxian Huang, Yifan Shen, Ruichu Cai, Yuewen Sun, Guangyi Chen, Kun Zhang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical evaluations on both synthetic and real-world datasets validate our theoretical claims and demonstrate the effectiveness of CHiLD in modeling hierarchical latent dynamics.
Researcher Affiliation	Academia	1 Carnegie Mellon University; 2 Mohamed bin Zayed University of Artificial Intelligence; 3 Guangdong University of Technology; 4 University of California San Diego
Pseudocode	No	The paper describes implementation details in Appendix F and G, but does not present any structured pseudocode or algorithm blocks labeled as such.
Open Source Code	Yes	https://github.com/MinghaoFu/CHiLD
Open Datasets	Yes	We consider the task of unconditional time series generation and use the following datasets: Stock, ETTh1, fMRI, MuJoCo, and two human motion datasets, Human3.6M and HumanEva-I. For Human3.6M [32], we choose 3 motions: Discussion, Purchases, and Walking Dog. For HumanEva-I [65], we choose Box, Gesture, and Throwcatch. We further consider three climate datasets: Weather, WeatherBench, and CESM2. Please refer to Appendix G.2.1 and D for the dataset description and the connection between time series generation and modeling hierarchical temporal latent dynamics. The Weather dataset offers 10-minute summaries from an automated rooftop station at the Max Planck Institute for Biogeochemistry in Jena, Germany. WeatherBench is a benchmark dataset for data-driven medium-range weather forecasting. It repackages forty years (1979-2018) of ERA5 global reanalysis into machine-learning-ready NetCDF tensors sampled every six hours. CESM2 delivers 100 fully coupled Earth system simulations at 1° resolution spanning 1850-2100 under CMIP6 historical and SSP3-7.0 forcing.
Dataset Splits	Yes	The total size of the dataset is 100,000, with 1,024 samples designated as the validation set. The remaining samples are the training set.
Hardware Specification	No	The paper does not explicitly mention specific hardware specifications such as GPU models, CPU types, or memory amounts used for the experiments. Although the NeurIPS checklist states that the information is provided in implementation details, the corresponding sections (Appendix F and G) do not contain these details.
Software Dependencies	No	The paper mentions utilizing publicly released code for TDRL and CaRiNG, and implementing IDOL based on TDRL. It does not provide specific version numbers for any software libraries or packages.
Experiment Setup	Yes	Table A5: Architecture details. T: length of time series. |xt|: input dimension. n: latent dimension. LeakyReLU: Leaky Rectified Linear Unit. Tanh: hyperbolic tangent function. Columns: Configuration / Description / Output. ϕ (Latent Variable Encoder): Input: x1:t / Observed time series / Batch Size × t × x dimension; Convolution neural networks / |xt| neurons / Batch Size × t × |xt|; Concat zero / concatenation / Batch Size × T × |xt|; Dense / n neurons / Batch Size × T × n; Input: z1:T / Latent Variable / Batch Size × T × n; Dense / |xt| neurons, Tanh / Batch Size × T × |xt|. r (Modular Prior Networks): Input: z1:T / Latent Variable / Batch Size × (n+1); Dense / 128 neurons, LeakyReLU / (n+1) × 128; Dense / 128 neurons, LeakyReLU / 128 × 128; Dense / 128 neurons, LeakyReLU / 128 × 128; Dense / 1 neuron / Batch Size × 1; Jacobian Compute / Compute log(det(J)) / Batch Size. Finally, we train the proposed model by optimizing the Evidence Lower Bound (ELBO) as follows. We repeat each experiment over 3 random seeds and publish the mean and standard deviation.
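
The Dataset Splits and Experiment Setup rows quote concrete numbers: 100,000 samples in total, 1,024 held out for validation, and 3 random seeds per experiment. The sketch below shows one way such a protocol could be reproduced. It is illustrative only and is not taken from the authors' code: the function names, the random hold-out, the seed value, and the use of the sample standard deviation are all assumptions.

```python
import random
import statistics

TOTAL_SAMPLES = 100_000  # quoted in the Dataset Splits row
VAL_SAMPLES = 1_024      # quoted in the Dataset Splits row


def split_indices(total, n_val, seed):
    """Shuffle indices 0..total-1 and hold out n_val of them for validation.

    Assumption: the paper does not say how validation samples are chosen;
    a seeded random hold-out is used here purely for illustration.
    """
    rng = random.Random(seed)
    indices = list(range(total))
    rng.shuffle(indices)
    return indices[n_val:], indices[:n_val]  # (train, validation)


def summarize(scores):
    """Return (mean, sample standard deviation) over per-seed scores.

    Assumption: sample (n-1) standard deviation; the paper does not
    specify which estimator it reports.
    """
    return statistics.mean(scores), statistics.stdev(scores)


train_idx, val_idx = split_indices(TOTAL_SAMPLES, VAL_SAMPLES, seed=0)
print(len(train_idx), len(val_idx))  # 98976 1024
```

With three per-seed scores collected from training runs, summarize(scores) would give the mean and standard deviation in the form the paper says it reports.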