Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Learning to search efficiently for causally near-optimal treatments
Authors: Samuel Håkansson, Viktor Lindblom, Omer Gottesman, Fredrik D. Johansson
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The methods are evaluated on synthetic and real-world healthcare data and compared to model-free reinforcement learning. We find that our methods compare favorably to the model-free baseline while offering a more transparent trade-off between search time and treatment efficacy. |
| Researcher Affiliation | Academia | Samuel Håkansson University of Gothenburg EMAIL Viktor Lindblom Chalmers University of Technology EMAIL Omer Gottesman Brown University EMAIL Fredrik D. Johansson Chalmers University of Technology EMAIL |
| Pseudocode | No | The paper describes dynamic programming and greedy approximation algorithms but does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Implementations can be found at: https://github.com/Healthy-AI/TreatmentExploration |
| Open Datasets | Yes | We evaluate our proposed methods using synthetic and real-world healthcare data... MIMIC-III database (Johnson et al., 2016). We consider training sets in a low-data regime with 50 samples and a high-data regime of 75000, with fixed test set size of 3000 samples. The cohort was split randomly into a training and test set with a 70/30 ratio and experiments were repeated over five such splits. |
| Dataset Splits | Yes | We consider training sets in a low-data regime with 50 samples and a high-data regime of 75000, with fixed test set size of 3000 samples. The cohort was split randomly into a training and test set with a 70/30 ratio and experiments were repeated over five such splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions methods like "function approximation using random forests" and "historical kernel-smoothing" but does not specify software dependencies with version numbers (e.g., Python version, library names and their versions). |
| Experiment Setup | Yes | Here, CDP and CG use δ = 0.4, ε = 0 and the upper bound of (10) and NDP λ = 0.35. We sweep all hyperparameters uniformly over 10 values; for CDP, CG, δ ∈ [0, 1], for NDP H, λ ∈ [0, 0.5] and for NDP F, λ ∈ [0, 1]. |
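
The setup quoted in the table (hyperparameters swept uniformly over 10 values, and a random 70/30 train/test split repeated five times) might be sketched as follows. This is a minimal illustration and not the authors' code: the function names `sweep_grid` and `random_splits`, the seed, the sample count of 1000, and the choice to include both interval endpoints in the grid are all assumptions made here.

```python
import numpy as np


def sweep_grid(low, high, num=10):
    """Return `num` evenly spaced values in [low, high].

    Assumption: both endpoints are included; the source only says the
    sweep is uniform over 10 values.
    """
    return np.linspace(low, high, num)


def random_splits(n_samples, train_frac=0.7, n_repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated random splits.

    Each repeat draws a fresh permutation, so the splits differ from one
    another but are reproducible for a fixed (hypothetical) seed.
    """
    rng = np.random.default_rng(seed)
    n_train = int(round(train_frac * n_samples))
    for _ in range(n_repeats):
        perm = rng.permutation(n_samples)
        yield perm[:n_train], perm[n_train:]


# Grids matching the ranges quoted in the table.
delta_grid = sweep_grid(0.0, 1.0)       # CDP, CG: delta in [0, 1]
lambda_grid_h = sweep_grid(0.0, 0.5)    # NDP H: lambda in [0, 0.5]
lambda_grid_f = sweep_grid(0.0, 1.0)    # NDP F: lambda in [0, 1]

# Five random 70/30 splits over a hypothetical cohort of 1000 samples.
splits = list(random_splits(n_samples=1000))
```

Each element of `splits` is a pair of disjoint index arrays, so an experiment loop would train on the first and evaluate on the second, once per grid value and per split.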