Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Calibration and Consistency of Adversarial Surrogate Losses
Authors: Pranjal Awasthi, Natalie Frank, Anqi Mao, Mehryar Mohri, Yutao Zhong
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also report a series of empirical results which show that many H-calibrated surrogate losses are indeed not H-consistent, and validate our theoretical assumptions. In Section 6, we further report a series of empirical results on simulated data, which show that many H-calibrated surrogate losses are indeed not H-consistent, and justify our conditions for consistency. |
| Researcher Affiliation | Collaboration | Pranjal Awasthi, Google Research, New York, NY 10011, EMAIL; Natalie S. Frank, Courant Institute, New York, NY 10012, EMAIL; Anqi Mao, Courant Institute, New York, NY 10012, EMAIL; Mehryar Mohri, Google Research & Courant Institute, New York, NY 10011, EMAIL; Yutao Zhong, Courant Institute, New York, NY 10012, EMAIL |
| Pseudocode | No | The paper describes concepts and proofs using mathematical notation and natural language, but it does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statements about releasing code, nor does it provide any links to source code repositories. |
| Open Datasets | No | The paper states: 'We generate data points x ∈ ℝ² on the unit circle and consider H to be linear models Hlin.' and 'We generate data points x from the uniform distribution on the unit circle.' This indicates simulated or custom-generated data for which no public access information is provided. |
| Dataset Splits | No | The paper states: 'All risks are approximated by their empirical counterparts computed over 10^7 i.i.d. samples.' This indicates a single set of samples for approximating risks, not distinct train/validation/test splits for model training or evaluation. |
| Hardware Specification | No | The paper describes experiments in Section 6 but does not provide any specific details about the hardware used, such as exact GPU/CPU models or processor types. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific solver versions) that would be needed to replicate the experiments. |
| Experiment Setup | Yes | All risks are approximated by their empirical counterparts computed over 10^7 i.i.d. samples. Set the label of a point x as follows: if θ ∈ (-π/2, π), then y = 1 with probability 3/4 and y = -1 with probability 1/4; if θ ∈ (0, π/2) or (3π/2, 2π), then y = 1; if θ ∈ (π, 3π/2), then y = -1. Set γ = √2/2. We choose γ = 0.1 and set w = (1, 0)^T. |
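The setup quoted above can be sketched as a small simulation. This is a hedged illustration, not the authors' code: the `label` rule follows the quoted intervals (deterministic labels on two arcs, y = 1 with probability 3/4 elsewhere), the sample size is reduced from the paper's 10^7 for speed, and the 0-1 empirical risk of a fixed linear predictor `w = (1, 0)^T` stands in for the surrogate risks studied in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_circle(n):
    # Sample angles uniformly and map them to points on the unit circle.
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    x = np.stack([np.cos(theta), np.sin(theta)], axis=1)
    return theta, x

def label(theta):
    # Hypothetical labeling rule following the quoted setup:
    # start with noisy labels (P[y = 1] = 3/4), then overwrite the
    # two deterministic arcs.
    y = np.where(rng.uniform(size=theta.shape) < 0.75, 1, -1)
    y = np.where((theta < np.pi / 2) | (theta > 3 * np.pi / 2), 1, y)
    y = np.where((theta > np.pi) & (theta < 3 * np.pi / 2), -1, y)
    return y

def empirical_risk(w, x, y):
    # Empirical 0-1 risk of the linear predictor h(x) = sign(w . x),
    # approximating the true risk by a Monte Carlo average.
    preds = np.sign(x @ w)
    return float(np.mean(preds != y))

theta, x = sample_circle(10**5)  # the paper uses 10^7 samples
y = label(theta)
w = np.array([1.0, 0.0])
risk = empirical_risk(w, x, y)
```

Under this labeling, `w = (1, 0)^T` misclassifies only the noisy quarter of the circle, so the empirical risk concentrates near 3/16.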