Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Estimating Mutual Information for Discrete-Continuous Mixtures
Authors: Weihao Gao, Sreeram Kannan, Sewoong Oh, Pramod Viswanath
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove the consistency of this estimator theoretically as well as demonstrate its excellent empirical performance. This problem is relevant in a wide array of applications, where some variables are discrete, some continuous, and others are a mixture between continuous and discrete components. ... Section 5 contains the results of our detailed synthetic and real-world experiments testing the efficacy of the proposed estimator. |
| Researcher Affiliation | Academia | Weihao Gao: Department of ECE, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign; Sreeram Kannan: Department of Electrical Engineering, University of Washington; Sewoong Oh: Department of IESE, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign; Pramod Viswanath: Department of ECE, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign |
| Pseudocode | Yes | Algorithm 1 Mixed Random Variable Mutual Information Estimator |
| Open Source Code | No | The paper does not provide any statements about open-sourcing the code for the described methodology, nor does it include any links to a code repository. |
| Open Datasets | Yes | Gene regulatory network inference. ... Instead we resorted to a challenge dataset for reconstructing regulatory networks, called the DREAM5 challenge [30]. The simulated (in silico) version of this dataset contains gene expression for 20 genes with 660 data points containing various perturbations. |
| Dataset Splits | No | The paper describes synthetic data generation and real-world datasets but does not specify training, validation, or test splits for any of its experiments. Its evaluation instead reports mean squared error as a function of sample size. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., CPU, GPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, or other libraries). |
| Experiment Setup | No | The paper describes the characteristics of the synthetic data generated for experiments (e.g., 'X is uniformly distributed over integers {0, 1, ..., m − 1} and Y is uniformly distributed over the range [X, X + 2] for a given X'). However, it does not provide specific hyperparameter values or system-level training settings used for its estimator in these experiments. |
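The synthetic setup quoted in the Experiment Setup row has a closed-form mutual information, which is the reference value an MSE-versus-sample-size evaluation needs. The short derivation below is our own working (natural logarithms), not quoted from the paper. Each $X = x$ spreads its mass $1/m$ uniformly over $[x, x+2]$, so the marginal of $Y$ has half density on the two outer unit intervals and full density where two neighbouring values of $X$ overlap:

```latex
\begin{aligned}
p_Y(y) &= \begin{cases} \tfrac{1}{2m}, & y \in [0,1) \cup [m, m+1], \\ \tfrac{1}{m}, & y \in [1, m), \end{cases} \\
h(Y) &= 2 \cdot \tfrac{1}{2m}\log(2m) + (m-1)\cdot\tfrac{1}{m}\log m = \log m + \tfrac{\log 2}{m}, \\
h(Y \mid X) &= \log 2, \\
I(X;Y) &= h(Y) - h(Y \mid X) = \log m - \tfrac{m-1}{m}\log 2 .
\end{aligned}
```

As a sanity check, $m = 1$ gives $I(X;Y) = 0$, as it should when $X$ is constant.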
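The report names Algorithm 1 (Mixed Random Variable Mutual Information Estimator) but does not reproduce it. As a rough illustration only, the sketch below implements a KSG-style k-nearest-neighbour estimator with zero-distance tie handling, the general kind of estimator used for discrete-continuous mixtures, and runs it on the synthetic setup quoted in the Experiment Setup row. The names `mixed_knn_mi`, `sample_uniform_shift`, and `true_mi`, the default `k = 5`, and the counting conventions (strict versus inclusive neighbour counts, digamma versus log on the marginal counts, whether sample i counts itself) are all our assumptions and may differ from the authors' Algorithm 1. The closed form in `true_mi`, log m − ((m − 1)/m) log 2, is our own derivation for this setup.

```python
import math
import random
from functools import lru_cache

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


@lru_cache(maxsize=None)
def digamma_int(n: int) -> float:
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def mixed_knn_mi(xs, ys, k=5):
    """Hypothetical KSG-style estimate of I(X;Y) in nats for scalar samples that
    may be discrete, continuous, or a mixture (O(N^2) brute force)."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Joint-space distances under the max (l-infinity) norm, excluding i.
        dists = sorted(
            max(abs(xs[i] - xs[j]), abs(ys[i] - ys[j]))
            for j in range(n)
            if j != i
        )
        rho = dists[k - 1]  # distance to the k-th nearest neighbour
        if rho == 0.0:
            # Tie case (assumed convention): replace k by the number of exact
            # duplicates of sample i, counting i itself.
            k_i = 1 + sum(1 for d in dists if d == 0.0)
            n_x = sum(1 for x in xs if x == xs[i])
            n_y = sum(1 for y in ys if y == ys[i])
        else:
            # Continuous case (assumed convention): count points strictly
            # closer than rho in each marginal, including i itself.
            k_i = k
            n_x = sum(1 for x in xs if abs(x - xs[i]) < rho)
            n_y = sum(1 for y in ys if abs(y - ys[i]) < rho)
        total += digamma_int(k_i) + math.log(n) - digamma_int(n_x) - digamma_int(n_y)
    return total / n


def sample_uniform_shift(m, n, rng):
    """X uniform on {0, ..., m-1}; given X, Y uniform on [X, X + 2]."""
    xs = [float(rng.randrange(m)) for _ in range(n)]
    ys = [x + 2.0 * rng.random() for x in xs]
    return xs, ys


def true_mi(m):
    """Closed-form I(X;Y) in nats for the setup above (our derivation)."""
    return math.log(m) - (m - 1) / m * math.log(2)


if __name__ == "__main__":
    rng = random.Random(0)
    xs, ys = sample_uniform_shift(m=5, n=600, rng=rng)
    est = mixed_knn_mi(xs, ys, k=5)
    print(f"estimate = {est:.3f} nats, closed form = {true_mi(5):.3f} nats")
```

The brute-force neighbour search keeps the sketch dependency-free; for larger sample sizes a k-d tree (for example SciPy's `cKDTree` with the max norm) would be the usual choice. Repeating the comparison over several sample sizes and seeds gives an MSE-versus-sample-size curve of the kind the Dataset Splits row mentions.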