Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Partial Information Decomposition via Normalizing Flows in Latent Gaussian Distributions
Authors: Wenyuan Zhao, Adithya Balachandran, Chao Tian, Paul Liang
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical validation in diverse synthetic examples demonstrates that our proposed method provides more accurate and efficient PID estimates than existing baselines. We further evaluate a series of large-scale multimodal benchmarks to show its utility in real-world applications of quantifying PID in multimodal datasets and selecting high-performing models. |
| Researcher Affiliation | Academia | Wenyuan Zhao¹, Adithya Balachandran², Chao Tian¹, Paul Pu Liang²; ¹Texas A&M University, ²Massachusetts Institute of Technology |
| Pseudocode | Yes | Algorithm 1: Thin-PID algorithm. Algorithm 2: Flow-PID algorithm. |
| Open Source Code | Yes | Finally, we release the data and code for Thin-PID and Flow-PID to encourage further studies of multimodal information and modeling at https://github.com/warrenzha/flow-pid. |
| Open Datasets | Yes | Empirical validation in diverse synthetic examples demonstrates that our proposed method provides more accurate and efficient PID estimates than existing baselines. We further evaluate a series of large-scale multimodal benchmarks to show its utility in real-world applications of quantifying PID in multimodal datasets and selecting high-performing models. We use a collection of real-world multimodal datasets in MultiBench [30], which spans 10 diverse modalities (images, video, audio, text, time-series), 15 prediction tasks, and 5 research areas. |
| Dataset Splits | No | The paper mentions using MultiBench datasets, which are benchmarks that typically have standard splits. However, the provided text does not explicitly state the training/validation/test splits (e.g., percentages, sample counts, or references to predefined splits) used for its experiments on either the MultiBench or the synthetic datasets. |
| Hardware Specification | Yes | All experiments with synthetic datasets are performed on a Linux machine, equipped with 48GB RAM and NVIDIA GeForce RTX 4080. |
| Software Dependencies | No | The paper does not explicitly list any specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific library versions) used in the experiments. |
| Experiment Setup | Yes | Table C.3: Training recipe. Table D.3: NN architectures for multi-modal fusion models. Table D.4: Table of hyperparameters for affective computing datasets. Table D.5: Table of hyperparameters for AV-MNIST encoders. Table D.6: Table of hyperparameters for ENRICO dataset in the HCI domain. |