Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Learning Causal Alignment for Reliable Disease Diagnosis

Authors: Mingzhou Liu, Ching-Wen Lee, Xinwei Sun, Xueqing Yu, Yu Qiao, Yizhou Wang

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the effectiveness of our method on two medical diagnosis applications, showcasing faithful alignment to radiologists. Code is publicly available at https://github.com/lmz123321/Causal_alignment. [...] In this section, we evaluate our method on two medical diagnosis tasks: the benign/malignant classification of lung nodules and breast masses. [...] We repeat 3 different seeds to remove the effect of randomness. [...] Table 1: Comparison with baseline methods on LIDC-IDRI and CBIS-DDSM datasets. [...] Table 2: Ablation study on LIDC-IDRI and CBIS-DDSM datasets. [...] Figure 4: CAM visualization. Each row denotes different cases.
Researcher Affiliation	Academia	Mingzhou Liu (1), Ching-Wen Lee (1), Xinwei Sun (2), Xueqing Yu (1), Yu Qiao (3), Yizhou Wang (4, 1, 5, 6, 7). (1) School of Computer Science, Peking University; (2) School of Data Science, Fudan University; (3) School of Automation and Intelligent Sensing, Shanghai Jiao Tong University; (4) Center on Frontiers of Computing Studies, Peking University; (5) Institute for Artificial Intelligence, Peking University; (6) Nat'l Eng. Research Center of Visual Technology, Peking University; (7) State Key Lab. of General Artificial Intelligence, Peking University. Corresponding authors EMAIL EMAIL, EMAIL, EMAIL
Pseudocode	Yes	Algorithm 1 Causal alignment training. Input: Data D. Output: Decision model fθ. Hyperparameters: Sparsity regularization α, weight of alignment loss λ, learning rate η. 1: while not converged do 2: Forward pass. 3: Compute L_ce. 4: Optimize (2) to obtain x and compute L_align using (3). 5: Compute L ← L_ce + λ L_align. 6: Back propagation. 7: Estimate ∇θ L_align with conjugate gradient. 8: Update θ: θ ← θ − η ∇θ L. // or Adam 9: end while
Open Source Code	Yes	Code is publicly available at https://github.com/lmz123321/Causal_alignment.
Open Datasets	Yes	We consider the LIDC-IDRI dataset Armato III et al. (2011) for lung nodule classification and the CBIS-DDSM dataset Lee et al. (2017) for breast mass classification.
Dataset Splits	Yes	We split the dataset into training (n = 731), validation (n = 238), and test (n = 244) sets. The CBIS-DDSM dataset ... We follow the official dataset split, with 691 masses in the training set and 200 masses in the test set.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies	No	We use the Adam optimizer and set the learning rate as 0.001. We adopt the TorchOpt Ren et al. (2022) package to implement the conjugate gradient estimator.
Experiment Setup	Yes	We use the Adam optimizer and set the learning rate as 0.001. We parameterize the attributes prediction network fθ1 with a seven-layer Convolutional Neural Network (CNN), and train it for 100 epochs with a batch size of 128 for each iteration. For the classification network fθ2, we parameterize it with a two-layer Multi-Layer Perceptron (MLP), and train it for 30 epochs with a batch size of 128. Please refer to Appx. B for details of the network architectures. For the hyperparameters α1 in (7) and α2 in (6), we set them to α1 = 0.01, α2 = 0.0005 for LIDC-IDRI and α1 = 0.07, α2 = 0.0005 for CBIS-DDSM, respectively. For both datasets, we set λ1 = λ2 = 1 in (5).
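
For readers who want to reuse the reported settings, the split sizes and hyperparameters quoted in the table above can be collected into a small configuration object. The following is only a minimal sketch, not the authors' code: the class name DatasetSetup, its field names, and the constant names are hypothetical, and attributing the 731/238/244 split to LIDC-IDRI is an assumption based on the order in which the two datasets are quoted in the Dataset Splits row.

```python
from dataclasses import dataclass
from typing import Dict

# Settings shared by both datasets, as quoted in the Experiment Setup row.
LEARNING_RATE = 1e-3          # Adam learning rate
BATCH_SIZE = 128              # used for both networks
EPOCHS_ATTRIBUTE_NET = 100    # seven-layer CNN f_theta1
EPOCHS_CLASSIFIER = 30        # two-layer MLP f_theta2
LAMBDA_1 = LAMBDA_2 = 1.0     # loss weights in (5)


@dataclass(frozen=True)
class DatasetSetup:
    """Hypothetical container for the per-dataset values quoted in the table."""
    name: str
    alpha1: float   # alpha_1 in (7)
    alpha2: float   # alpha_2 in (6)
    n_train: int
    n_test: int
    n_val: int = 0  # 0 when no validation split is quoted

    @property
    def n_total(self) -> int:
        return self.n_train + self.n_val + self.n_test

    def fractions(self) -> Dict[str, float]:
        """Share of samples in each split."""
        total = self.n_total
        return {
            "train": self.n_train / total,
            "val": self.n_val / total,
            "test": self.n_test / total,
        }


# Assumption: the three-way split quoted first belongs to LIDC-IDRI.
LIDC_IDRI = DatasetSetup("LIDC-IDRI", alpha1=0.01, alpha2=0.0005,
                         n_train=731, n_val=238, n_test=244)
# Official split as quoted: 691 training masses and 200 test masses.
CBIS_DDSM = DatasetSetup("CBIS-DDSM", alpha1=0.07, alpha2=0.0005,
                         n_train=691, n_test=200)

if __name__ == "__main__":
    for setup in (LIDC_IDRI, CBIS_DDSM):
        shares = ", ".join(f"{k}={v:.3f}" for k, v in setup.fractions().items())
        print(f"{setup.name}: n={setup.n_total} ({shares})")
```

Keeping the split counts next to the hyperparameters makes it easy to check a local copy of the data against the quoted sizes before training. Hardware and software versions are not included because the table reports none.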