Decision-Making Under Selective Labels: Optimal Finite-Domain Policies and Beyond
Authors: Dennis Wei
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed policies are evaluated in experiments (reported in Section 6) on synthetic data and two real-world datasets featuring high-stakes decisions. |
| Researcher Affiliation | Industry | IBM Research, Yorktown Heights, NY, USA. Correspondence to: <dwei@us.ibm.com>. |
| Pseudocode | No | The paper describes algorithms in text but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide access to source code for its methodology. It mentions using the Vowpal Wabbit (VW) library, which is a third-party tool, and links to a dataset. |
| Open Datasets | Yes | We now turn to two real-world datasets, the FICO Challenge dataset (FICO, 2018) and the COMPAS recidivism dataset (Angwin et al., 2016), for evaluating the policy of Section 5. ... The COMPAS dataset, also used by Kilbertus et al. (2020), contains demographics and criminal histories of offenders, a recidivism risk score produced by the COMPAS tool, and an outcome variable indicating whether the offender was re-arrested within two years. Acceptance corresponds to releasing an offender on bail. ... dataset available at https://github.com/propublica/compas-analysis/blob/master/compas-scores-two-years.csv (a loading sketch appears below the table). |
| Dataset Splits | No | The paper describes an initial training (exploration) set of the first B_0 individuals, who are always accepted, but it does not specify explicit train/validation/test splits, in percentages or absolute counts, for the datasets used in the experiments. |
| Hardware Specification | Yes | The continued use of the homogeneous policy is motivated by two reasons: first, its optimality for finite domains, which might be used to approximate an infinite or continuous domain, and second, the ease of computing the approximation V_N(µ, ν) (taking milliseconds on a MacBook Pro for N = 1000 in Figure 1). (An illustrative recursion sketch appears below the table.) |
| Software Dependencies | No | The paper mentions using the 'Vowpal Wabbit (VW) library' but does not specify a version number for it or any other software dependencies. |
| Experiment Setup | Yes | For evaluation, rewards are summed using the discount factor γ = 0.999. The number of rounds T is set to 5/(1 − γ) so that the sum of truncated discount weights, Σ_{t=T}^∞ γ^t, is less than 1% of the total sum Σ_{t=0}^∞ γ^t. ... In addition, to provide an initial training (i.e. exploration) set, the first B_0 individuals are always accepted and their outcomes are observed. ... The CL algorithm ... is re-implemented for the case of no fairness penalty (λ = 0) and policy updates after every acceptance/observation (N = 1). ... Parameter settings and tuning are discussed in Appendix D.3. (A numeric check of the truncation claim appears below the table.) |
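
For the Open Datasets row, here is a minimal loading sketch for the linked ProPublica CSV. The raw.githubusercontent.com URL is simply the direct-download form of the repository link quoted above, and `two_year_recid` is the outcome column in that CSV; the paper publishes no loading or preprocessing code, so treat this as a convenience sketch only.

```python
# Convenience sketch (not from the paper): load the COMPAS CSV linked
# in the Open Datasets row directly from the ProPublica repository.
import pandas as pd

URL = ("https://raw.githubusercontent.com/propublica/compas-analysis/"
       "master/compas-scores-two-years.csv")

df = pd.read_csv(URL)
# "two_year_recid" encodes re-arrest within two years, the outcome the
# paper describes; acceptance corresponds to release on bail.
print(df.shape)
print(df["two_year_recid"].mean())
```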
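The Hardware Specification row quotes the approximation V_N(µ, ν) without reproducing its recursion. As a rough illustration of what an N-step value approximation for a Beta–Bernoulli acceptance problem looks like, the sketch below assumes a reward of y − c for acceptance and an absorbing reject action that yields no label (the selective-labels feature); the (α, β) state parametrization, the cost c, and all names here are assumptions of this sketch, not the paper's definitions.

```python
# Illustrative N-step backward-induction value approximation for a
# Beta-Bernoulli acceptance problem under selective labels. Assumptions
# (not the paper's code): accepting pays y - c and reveals y; rejecting
# pays 0 and reveals nothing, so rejecting forever is worth 0.
from functools import lru_cache

def value_approx(alpha: int, beta: int, N: int, c: float = 0.5,
                 gamma: float = 0.999) -> float:
    """Value of the Beta(alpha, beta) posterior state, truncated at depth N."""
    @lru_cache(maxsize=None)
    def V(a: int, b: int, n: int) -> float:
        if n == 0:
            return 0.0              # truncation: assume zero value beyond depth N
        p = a / (a + b)             # posterior mean success probability
        # Accept: expected immediate reward (p - c), then a Bayesian update.
        accept = (p - c) + gamma * (
            p * V(a + 1, b, n - 1) + (1 - p) * V(a, b + 1, n - 1))
        return max(0.0, accept)     # reject (value 0) if accepting is worse
    return V(alpha, beta, N)

print(value_approx(1, 1, 200))  # uniform prior; modest N keeps pure Python fast
```

The memoized recursion touches O(N²) states, which is consistent with the quoted "milliseconds for N = 1000" once implemented in vectorized or compiled form.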
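The truncation claim in the Experiment Setup row is straightforward to verify: for a geometric series, the truncated fraction Σ_{t=T}^∞ γ^t / Σ_{t=0}^∞ γ^t equals exactly γ^T, so with γ = 0.999 and T = 5/(1 − γ) = 5000 the omitted weight is about 0.67%, under the stated 1%. A one-off check (not from the paper's code):

```python
# Check that truncating at T = 5 / (1 - gamma) drops less than 1% of the
# total discounted weight; the truncated fraction is exactly gamma**T.
gamma = 0.999
T = round(5 / (1 - gamma))      # T = 5000 rounds
tail_fraction = gamma ** T      # geometric tail / geometric total
print(f"T = {T}, truncated fraction = {tail_fraction:.4%}")  # ~0.6721% < 1%
```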