reproducibilityindex.ai

Fair Performance Metric Elicitation

Authors: Gaurush Hiranandani, Harikrishna Narasimhan, Sanmi Koyejo

NeurIPS 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We first empirically validate the FPME procedure and recovery guarantees of Section 5. Recall that there exists a sphere Sρ ⊂ R^1 ∩ ⋯ ∩ R^m as long as there is a non-trivial classification signal within each group (Assumption 2). Thus for experiments, we assume access to a feasible sphere Sρ with ρ = 0.2. We randomly generate 100 oracle metrics each for k, m ∈ {2, 3, 4, 5} parametrized by {a, B, λ}. This specifies the query outputs by the oracle for each metric in Algorithm 1. We then use Algorithm 1 with tolerance ϵ = 10⁻³ to elicit corresponding metrics parametrized by {â, B̂, λ̂}. Algorithm 1 makes 1 + 2M subroutine calls to LPME procedure and 1 call to Algorithm 4. LPME subroutine requires exactly 16(q − 1) log(π/2ϵ) queries, where we use 4 queries to shrink the interval in the binary search loop and fix 4 cycles for the coordinate-wise search. Also, Algorithm 4 requires 4 log(1/ϵ) queries. In Figure 4, we report the mean of the ℓ2-norm between the oracle's metric and the elicited metric. Clearly, we elicit metrics that are close to the true metrics. Moreover, this holds true across a range of m and k values demonstrating the robustness of the proposed approach.
Researcher Affiliation	Collaboration	Gaurush Hiranandani, UIUC, gaurush2@illinois.edu; Harikrishna Narasimhan, Google Research USA, hnarasimhan@google.com; Oluwasanmi Koyejo, UIUC & Google Research Accra, sanmi@illinois.edu
Pseudocode	Yes	Algorithm 1: FPM Elicitation. Input: Query spaces Sρ, S⁺ϱ, search tolerance ϵ > 0, and oracle Ω. 1: â ← LPME(Sρ, ϵ, Ω^class) 2: If m == 2 3: f ← LPME(Sρ, ϵ, Ω^viol_1) 4: f ← LPME(Sρ, ϵ, Ω^viol_2) 5: b̂^12 ← normalized solution from (11) 6: Else Let L ← ∅ 7: For σ ∈ M do 8: f^σ ← LPME(Sρ, ϵ, Ω^viol_{σ,1}) 9: f^σ ← LPME(Sρ, ϵ, Ω^viol_{σ,k}) 10: Let ℓ^σ be Eq. (13), extend L ← L ∪ {ℓ^σ} 11: B̂ ← normalized solution from (14) using L 12: λ̂ ← Algorithm 4 (S⁺ϱ, ϵ, Ω^trade-off). Output: â, B̂, λ̂
Open Source Code	No	The paper does not provide any statement or link indicating the availability of open-source code for the methodology described.
Open Datasets	No	The paper mentions generating synthetic 'oracle metrics' for experiments and refers to a 'dataset' in its theoretical setup, but does not provide concrete access information (link, citation with authors/year) for any publicly available or open dataset used for the main empirical validation.
Dataset Splits	No	The paper mentions using 'randomly generated' data for experiments but does not provide specific details on training, validation, or test dataset splits, such as percentages, sample counts, or citations to predefined splits.
Hardware Specification	No	The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies	No	The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment.
Experiment Setup	Yes	We then use Algorithm 1 with tolerance ϵ = 10⁻³ to elicit corresponding metrics parametrized by {â, B̂, λ̂}. Algorithm 1 makes 1 + 2M subroutine calls to LPME procedure and 1 call to Algorithm 4. LPME subroutine requires exactly 16(q − 1) log(π/2ϵ) queries, where we use 4 queries to shrink the interval in the binary search loop and fix 4 cycles for the coordinate-wise search. Also, Algorithm 4 requires 4 log(1/ϵ) queries.
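
The query counts quoted in the Experiment Setup row can be tallied with a short calculation. The following is a minimal sketch, not code from the paper: the function names are hypothetical, q is assumed to be the dimension of the query space passed to LPME, M is assumed to be the number of iterations of the loop in Algorithm 1, and the logarithm base is assumed to be 2 (binary search), which the quoted text does not state.

```python
import math


def lpme_queries(q, eps, log_base=2.0):
    """Queries for one LPME call: 16 (q - 1) log(pi / (2 eps)).

    The log base is an assumption; the quoted text leaves it unspecified.
    """
    return 16 * (q - 1) * math.log(math.pi / (2 * eps), log_base)


def alg4_queries(eps, log_base=2.0):
    """Queries for the single call to Algorithm 4: 4 log(1 / eps)."""
    return 4 * math.log(1 / eps, log_base)


def fpme_queries(q, num_partitions, eps, log_base=2.0):
    """Total for Algorithm 1: (1 + 2M) LPME calls plus one Algorithm 4 call.

    num_partitions plays the role of M (assumed to be a count).
    """
    lpme_calls = 1 + 2 * num_partitions
    return (lpme_calls * lpme_queries(q, eps, log_base)
            + alg4_queries(eps, log_base))


if __name__ == "__main__":
    # Tolerance from the quoted setup; q and M are illustrative values only.
    print(fpme_queries(q=4, num_partitions=3, eps=1e-3))
```

Because the total is a product of the LPME cost and 1 + 2M, it grows linearly in both q and M but only logarithmically in 1/ϵ, so tightening the tolerance is comparatively cheap under these assumptions.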