Decision-Making Under Selective Labels: Optimal Finite-Domain Policies and Beyond
Authors: Dennis Wei
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed policies are evaluated in experiments (reported in Section 6) on synthetic data and two real-world datasets featuring high-stakes decisions. |
| Researcher Affiliation | Industry | IBM Research, Yorktown Heights, NY, USA. Correspondence to: <dwei@us.ibm.com>. |
| Pseudocode | No | The paper describes algorithms in text but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide access to source code for its methodology. It mentions using the Vowpal Wabbit (VW) library, which is a third-party tool, and links to a dataset. |
| Open Datasets | Yes | We now turn to two real-world datasets, the FICO Challenge dataset (FICO, 2018) and the COMPAS recidivism dataset (Angwin et al., 2016), for evaluating the policy of Section 5. ... The COMPAS dataset, also used by Kilbertus et al. (2020), contains demographics and criminal histories of offenders, a recidivism risk score produced by the COMPAS tool, and an outcome variable indicating whether the offender was re-arrested within two years. Acceptance corresponds to releasing an offender on bail. ... dataset available at https://github.com/propublica/compas-analysis/blob/master/compas-scores-two-years.csv (a loading sketch appears below the table). |
| Dataset Splits | No | The paper describes an initial training (exploration) set of the first B_0 individuals, who are always accepted, but it does not specify explicit train/validation/test splits, in percentages or absolute counts, for the datasets used in the experiments. |
| Hardware Specification | Yes | The continued use of the homogeneous policy is motivated by two reasons: first, its optimality for finite domains, which might be used to approximate an infinite or continuous domain, and second, the ease of computing the approximation V_N(µ, ν) (taking milliseconds on a MacBook Pro for N = 1000 in Figure 1). (An illustrative recursion sketch appears below the table.) |
| Software Dependencies | No | The paper mentions using the 'Vowpal Wabbit (VW) library' but does not specify a version number for it or any other software dependencies. |
| Experiment Setup | Yes | For evaluation, rewards are summed using the discount factor γ = 0.999. The number of rounds T is set to 5/(1 − γ) so that the sum of truncated discount weights, Σ_{t=T}^∞ γ^t, is less than 1% of the total sum Σ_{t=0}^∞ γ^t. ... In addition, to provide an initial training (i.e. exploration) set, the first B_0 individuals are always accepted and their outcomes are observed. ... The CL algorithm ... is re-implemented for the case of no fairness penalty (λ = 0) and policy updates after every acceptance/observation (N = 1). ... Parameter settings and tuning are discussed in Appendix D.3. (A numeric check of the truncation claim appears below the table.) |
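
For the Open Datasets row, here is a minimal loading sketch for the linked ProPublica CSV. The raw.githubusercontent.com URL is simply the direct-download form of the repository link quoted above, and `two_year_recid` is the outcome column in that CSV; the paper publishes no loading or preprocessing code, so treat this as a convenience sketch only.

```python
# Convenience sketch (not from the paper): load the COMPAS CSV linked
# in the Open Datasets row directly from the ProPublica repository.
import pandas as pd

URL = ("https://raw.githubusercontent.com/propublica/compas-analysis/"
       "master/compas-scores-two-years.csv")

df = pd.read_csv(URL)
# "two_year_recid" encodes re-arrest within two years, the outcome the
# paper describes; acceptance corresponds to release on bail.
print(df.shape)
print(df["two_year_recid"].mean())
```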
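The Hardware Specification row quotes the approximation V_N(µ, ν) without reproducing its recursion. As a rough illustration of what an N-step value approximation for a Beta–Bernoulli acceptance problem looks like, the sketch below assumes a reward of y − c for acceptance and an absorbing reject action that yields no label (the selective-labels feature); the (α, β) state parametrization, the cost c, and all names here are assumptions of this sketch, not the paper's definitions.

```python
# Illustrative N-step backward-induction value approximation for a
# Beta-Bernoulli acceptance problem under selective labels. Assumptions
# (not the paper's code): accepting pays y - c and reveals y; rejecting
# pays 0 and reveals nothing, so rejecting forever is worth 0.
from functools import lru_cache

def value_approx(alpha: int, beta: int, N: int, c: float = 0.5,
                 gamma: float = 0.999) -> float:
    """Value of the Beta(alpha, beta) posterior state, truncated at depth N."""
    @lru_cache(maxsize=None)
    def V(a: int, b: int, n: int) -> float:
        if n == 0:
            return 0.0              # truncation: assume zero value beyond depth N
        p = a / (a + b)             # posterior mean success probability
        # Accept: expected immediate reward (p - c), then a Bayesian update.
        accept = (p - c) + gamma * (
            p * V(a + 1, b, n - 1) + (1 - p) * V(a, b + 1, n - 1))
        return max(0.0, accept)     # reject (value 0) if accepting is worse
    return V(alpha, beta, N)

print(value_approx(1, 1, 200))  # uniform prior; modest N keeps pure Python fast
```

The memoized recursion touches O(N²) states, which is consistent with the quoted "milliseconds for N = 1000" once implemented in vectorized or compiled form.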
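The truncation claim in the Experiment Setup row is straightforward to verify: for a geometric series, the truncated fraction Σ_{t=T}^∞ γ^t / Σ_{t=0}^∞ γ^t equals exactly γ^T, so with γ = 0.999 and T = 5/(1 − γ) = 5000 the omitted weight is about 0.67%, under the stated 1%. A one-off check (not from the paper's code):

```python
# Check that truncating at T = 5 / (1 - gamma) drops less than 1% of the
# total discounted weight; the truncated fraction is exactly gamma**T.
gamma = 0.999
T = round(5 / (1 - gamma))      # T = 5000 rounds
tail_fraction = gamma ** T      # geometric tail / geometric total
print(f"T = {T}, truncated fraction = {tail_fraction:.4%}")  # ~0.6721% < 1%
```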