Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Partial Information Decomposition via Normalizing Flows in Latent Gaussian Distributions

Authors: Wenyuan Zhao, Adithya Balachandran, Chao Tian, Paul Liang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical validation in diverse synthetic examples demonstrates that our proposed method provides more accurate and efficient PID estimates than existing baselines. We further evaluate a series of large-scale multimodal benchmarks to show its utility in real-world applications of quantifying PID in multimodal datasets and selecting high-performing models.
Researcher Affiliation	Academia	Wenyuan Zhao¹, Adithya Balachandran², Chao Tian¹, Paul Pu Liang²; ¹Texas A&M University, ²Massachusetts Institute of Technology; ¹EMAIL, ²EMAIL
Pseudocode	Yes	Algorithm 1: Thin-PID algorithm. Algorithm 2: Flow-PID algorithm.
Open Source Code	Yes	Finally, we release the data and code for Thin-PID and Flow-PID to encourage further studies of multimodal information and modeling at https://github.com/warrenzha/flow-pid.
Open Datasets	Yes	Empirical validation in diverse synthetic examples demonstrates that our proposed method provides more accurate and efficient PID estimates than existing baselines. We further evaluate a series of large-scale multimodal benchmarks to show its utility in real-world applications of quantifying PID in multimodal datasets and selecting high-performing models. We use a collection of real-world multimodal datasets in MultiBench [30], which spans 10 diverse modalities (images, video, audio, text, time-series), 15 prediction tasks, and 5 research areas.
Dataset Splits	No	The paper mentions using MultiBench datasets, which are benchmarks that typically have standard splits. However, within the provided text, the paper does not explicitly state the training/validation/test splits (e.g., percentages, sample counts, or references to predefined splits) used for its experiments on either the MultiBench or the synthetic datasets.
Hardware Specification	Yes	All experiments with synthetic datasets are performed on a Linux machine, equipped with 48GB RAM and NVIDIA GeForce RTX 4080.
Software Dependencies	No	The paper does not explicitly list any specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific library versions) used in the experiments.
Experiment Setup	Yes	Table C.3: Training recipe. Table D.3: NN architectures for multi-modal fusion models. Table D.4: Table of hyperparameters for affective computing datasets. Table D.5: Table of hyperparameters for AV-MNIST encoders. Table D.6: Table of hyperparameters for ENRICO dataset in the HCI domain.