Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Conditional Common Entropy for Instrumental Variable Testing and Partial Identification
Authors: Ziwei Jiang, Murat Kocaoglu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the utility of the proposed method with simulated and real-world datasets. In this section, we first demonstrate the proposed method with simulated data and then provide some case studies with real-world data with instrumental variables. |
| Researcher Affiliation | Academia | Ziwei Jiang¹, Murat Kocaoglu¹. ¹Elmore Family School of Electrical and Computer Engineering, Purdue University. Correspondence to: Ziwei Jiang <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 IV Latent Search. Input: joint distribution $P(X, Y, Z)$; number of iterations $N$; initialization $q(W \mid X, Y, Z)$; $\beta_0, \beta_1 \ge 0$. For $i = 1$ to $N$ do: Form the joint: $q_i(X, Y, Z, W) = q_i(W \mid X, Y, Z)\,P(X, Y, Z)$. Get posteriors: $q_i(W) = \sum_{x,y,z} q_i(X, Y, Z, W)$; $q_i(W \mid X, Y) = \sum_{z} q_i(X, Y, Z, W) \big/ \sum_{z,w} q_i(X, Y, Z, W)$; $q_i(W \mid X, Z) = \sum_{y} q_i(X, Y, Z, W) \big/ \sum_{y,w} q_i(X, Y, Z, W)$. Update: $q_{i+1}(X, Y, Z, W) = \frac{q_i(W \mid X, Z)\,q_i(W \mid X, Y)\,q_i(U)^{\beta_0+\beta_1}}{f(X, Y, Z)\,q_i(W \mid X)\,q(W \mid Z)^{\beta_1}}$, where $f(X, Y, Z) = \sum_{u} \frac{q_i(W \mid X, Z)\,q_i(W \mid X, Y)\,q_i(U)^{\beta_0+\beta_1}}{q_i(W \mid X)\,q(W \mid Z)^{\beta_1}}$. End for. Return: $q_N(W \mid X, Y, Z)\,P(X, Y, Z)$ |
| Open Source Code | Yes | Our code is available at https://github.com/ziwei-jiang/Conditional-Common-Entropy |
| Open Datasets | Yes | We first demonstrate the proposed method with simulated data and then provide some case studies with real-world data with instrumental variables. In this section, we demonstrate our result in a more realistic setting with a synthetic dataset introduced by Lauritzen & Spiegelhalter (1988). We provide another example with the Pima Indians Diabetes dataset (Smith & Dickson, 1988). |
| Dataset Splits | No | The paper describes the use of synthetic and real-world datasets (Lung Cancer Dataset, Pima Indians Diabetes dataset) but does not provide specific details on how these datasets were split into training, validation, or testing sets. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for conducting the experiments, such as GPU models, CPU types, or cloud computing resources. |
| Software Dependencies | No | The paper does not provide specific software dependency versions (e.g., programming language versions, library versions, or specific solver versions) used in the experiments. |
| Experiment Setup | Yes | The algorithm converges around 200 iterations. To approximate the CCE, we iteratively search with 100 values of $\beta_0 \in [0, 1]$ and $\beta_1 \in [0, 0.5]$. The result is shown in Figure 8. Then we take the CCE as the minimum entropy $H(W)$ such that both $I(Y; Z \mid X, W)$ and $I(Z; W)$ are smaller than the threshold 1e-5. |
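
The Pseudocode and Experiment Setup rows above can be read together as a small procedure: run the multiplicative update for fixed β0 and β1, then keep the lowest-entropy solution that passes both threshold checks. The following is a minimal pure-Python sketch of that reading, not the authors' released implementation. It rests on several assumptions: the symbol U in the quoted update is taken to mean the latent W, the update is applied to the conditional q(W | X, Y, Z) and normalized by f(X, Y, Z), distributions are stored as dictionaries keyed by (x, y, z) or (x, y, z, w), and the function names (`marginal`, `cond_mutual_info`, `entropy`, `iv_latent_search`, `approximate_cce`) as well as the `n_w`, grid, and `seed` parameters are hypothetical.

```python
import math
import random


def marginal(q, keep):
    """Sum a table {index tuple: prob} down to the axes listed in `keep`."""
    out = {}
    for key, p in q.items():
        sub = tuple(key[a] for a in keep)
        out[sub] = out.get(sub, 0.0) + p
    return out


def cond_mutual_info(q, a, b, c=()):
    """I(A; B | C) in nats; a, b, c are tuples of axis indices into q's keys."""
    q_abc, q_c = marginal(q, a + b + c), marginal(q, c)
    q_ac, q_bc = marginal(q, a + c), marginal(q, b + c)
    na, nb = len(a), len(b)
    total = 0.0
    for key, p in q_abc.items():
        if p > 0:
            ka, kb, kc = key[:na], key[na:na + nb], key[na + nb:]
            total += p * math.log(p * q_c[kc] / (q_ac[ka + kc] * q_bc[kb + kc]))
    return total


def entropy(q, axes):
    """Shannon entropy (nats) of the marginal of q over `axes`."""
    return -sum(p * math.log(p) for p in marginal(q, axes).values() if p > 0)


def iv_latent_search(p_xyz, n_w, beta0, beta1, n_iters=200, seed=0):
    """Return the joint q(x, y, z, w) after n_iters multiplicative updates."""
    rng = random.Random(seed)
    p_xyz = {k: p for k, p in p_xyz.items() if p > 0}  # work on the support only
    cond = {}
    for xyz in p_xyz:  # random, strictly positive start for q(w | x, y, z)
        raw = [rng.random() + 1e-3 for _ in range(n_w)]
        cond[xyz] = [r / sum(raw) for r in raw]

    def form_joint():
        return {xyz + (w,): cond[xyz][w] * p
                for xyz, p in p_xyz.items() for w in range(n_w)}

    for _ in range(n_iters):
        q = form_joint()
        q_w, q_x, q_z = marginal(q, (3,)), marginal(q, (0,)), marginal(q, (2,))
        q_xw, q_zw = marginal(q, (0, 3)), marginal(q, (2, 3))
        q_xy, q_xyw = marginal(q, (0, 1)), marginal(q, (0, 1, 3))
        q_xz, q_xzw = marginal(q, (0, 2)), marginal(q, (0, 2, 3))
        new_cond = {}
        for (x, y, z) in p_xyz:
            scores = []
            for w in range(n_w):
                w_xy = q_xyw[(x, y, w)] / q_xy[(x, y)]   # q(w | x, y)
                w_xz = q_xzw[(x, z, w)] / q_xz[(x, z)]   # q(w | x, z)
                w_x = q_xw[(x, w)] / q_x[(x,)]           # q(w | x)
                w_z = q_zw[(z, w)] / q_z[(z,)]           # q(w | z)
                num = w_xz * w_xy * q_w[(w,)] ** (beta0 + beta1)
                den = w_x * w_z ** beta1
                scores.append(num / den if den > 0 else 0.0)
            f = sum(scores)  # normalizer f(x, y, z)
            new_cond[(x, y, z)] = [s / f for s in scores] if f > 0 else cond[(x, y, z)]
        cond = new_cond
    return form_joint()


def approximate_cce(p_xyz, n_w, beta0_grid, beta1_grid, threshold=1e-5, n_iters=200):
    """Smallest H(W) over the grid whose solution passes both threshold checks."""
    best = None
    for b0 in beta0_grid:
        for b1 in beta1_grid:
            q = iv_latent_search(p_xyz, n_w, b0, b1, n_iters)
            if (cond_mutual_info(q, (1,), (2,), (0, 3)) < threshold
                    and cond_mutual_info(q, (2,), (3,)) < threshold):
                h = entropy(q, (3,))
                best = h if best is None else min(best, h)
    return best
```

A call such as `approximate_cce(p_xyz, n_w=4, beta0_grid=[i / 9 for i in range(10)], beta1_grid=[i / 18 for i in range(10)])` would mimic a 100-point search over β0 ∈ [0, 1] and β1 ∈ [0, 0.5]; whether the paper uses 100 grid points in total or per parameter is not stated in the quoted text, so the grid is left to the caller. The function returns `None` when no grid point satisfies both thresholds.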