Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Robust Logistic Regression and Classification
Authors: Jiashi Feng, Huan Xu, Shie Mannor, Shuicheng Yan
NeurIPS 2014 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct simulations to verify the robustness of RoLR along with its applicability for robust binary classification. We compare RoLR with standard logistic regression, which estimates the model parameter by maximizing the log-likelihood function. |
| Researcher Affiliation | Academia | Jiashi Feng, EECS Department & ICSI, UC Berkeley; Huan Xu, ME Department, National University of Singapore; Shie Mannor, EE Department, Technion; Shuicheng Yan, ECE Department, National University of Singapore |
| Pseudocode | Yes | Algorithm 1 RoLR. Input: contaminated training samples {(x₁, y₁), …, (x_{n+n₁}, y_{n+n₁})}, an upper bound n₁ on the number of outliers, the number of inliers n, and the sample dimension p. Initialization: set T = 4√(p log p/n + log n/n). Preprocessing: remove samples (xᵢ, yᵢ) whose magnitude satisfies ‖xᵢ‖ ≥ T. Solve the following linear programming problem (see Eqn. (3)): β̂ = arg max_{β ∈ B₂ᵖ} Σᵢ [y⟨β, x⟩]₍ᵢ₎. Output: β̂. |
| Open Source Code | No | The paper does not provide any concrete statement or link regarding the availability of source code for the described methodology. |
| Open Datasets | No | We randomly generated the samples according to the model in Eqn. (1) for the logistic regression problem. |
| Dataset Splits | No | The paper mentions 'training samples' and that 'additional n_t = 1,000 authentic samples are generated for testing', but does not specify explicit training/validation/test splits or their percentages/counts for the main dataset used for training/validation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU or GPU models, or memory specifications. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers needed to replicate the experiment. |
| Experiment Setup | Yes | In particular, we first sample the model parameter β ~ N(0, I_p) and normalize it as β := β/‖β‖₂. Here p is the dimension of the parameter, which is also the dimension of the samples. The samples are drawn i.i.d. from xᵢ ~ N(0, Σₓ) with Σₓ = I_p, and the Gaussian noise is sampled as vᵢ ~ N(0, σ_e). Then, the sample label yᵢ is generated according to P{yᵢ = +1} = τ(⟨β, xᵢ⟩ + vᵢ) for the LR case. For the classification case, the sample labels are generated by yᵢ = sign(⟨β, xᵢ⟩ + vᵢ), and an additional n_t = 1,000 authentic samples are generated for testing. The entries of the outliers x_o are i.i.d. random variables from the uniform distribution on [−σ_o, σ_o] with σ_o = 10. ... We fix n = 1,000 and vary λ from 0 to 1.2 with a step of 0.1. |
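The synthetic protocol quoted in the Experiment Setup row can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' code: the function name `generate_lr_data`, the seed, and the choice `p = 20` are assumptions; only the sampling scheme (normalized Gaussian β, i.i.d. standard Gaussian samples, Gaussian noise, logistic label probabilities) comes from the quoted text.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an arbitrary choice for reproducibility

def sigmoid(z):
    """Logistic function tau(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def generate_lr_data(n, p, sigma_e=1.0, rng=rng):
    """Sketch of the paper's synthetic data generation (Eqn. (1)).

    sigma_e defaults to 1.0 here; the paper's exact value is an assumption.
    """
    beta = rng.standard_normal(p)
    beta /= np.linalg.norm(beta)            # normalize: beta := beta / ||beta||_2
    X = rng.standard_normal((n, p))         # x_i ~ N(0, Sigma_x) with Sigma_x = I_p
    v = rng.normal(0.0, sigma_e, size=n)    # Gaussian noise v_i ~ N(0, sigma_e)
    probs = sigmoid(X @ beta + v)           # P{y_i = +1} = tau(<beta, x_i> + v_i)
    y = np.where(rng.random(n) < probs, 1, -1)
    return X, y, beta

X, y, beta = generate_lr_data(n=1000, p=20)
```

For the classification variant, one would instead set `y = np.sign(X @ beta + v)`, and the outliers described in the quote would have entries drawn from `rng.uniform(-10, 10, size=(n1, p))`.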
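Two pieces of the RoLR pseudocode quoted above are concrete enough to render in code: the magnitude-based preprocessing threshold T and the sorted-statistic objective that the linear program in Eqn. (3) maximizes. The sketch below only evaluates that objective for a given β rather than solving the LP over the unit ball B₂ᵖ, and the assumption that the sum runs over the n smallest order statistics is ours, since the summation limits are not visible in the extracted text.

```python
import numpy as np

def rolr_preprocess(X, y, n):
    """Remove samples whose magnitude exceeds T = 4*sqrt(p log p / n + log n / n)."""
    p = X.shape[1]
    T = 4.0 * np.sqrt(p * np.log(p) / n + np.log(n) / n)
    keep = np.linalg.norm(X, axis=1) < T
    return X[keep], y[keep]

def rolr_objective(beta, X, y, n):
    """Sum of sorted statistics [y<beta, x>]_(i).

    Assumption: the n smallest values are summed; the paper's exact limits
    are not recoverable from the extracted pseudocode.
    """
    stats = np.sort(y * (X @ beta))
    return stats[: min(n, stats.size)].sum()

# Tiny demo on random data (not the paper's experiment).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = np.where(rng.random(200) < 0.5, 1, -1)
X_kept, y_kept = rolr_preprocess(X, y, n=200)
val = rolr_objective(np.ones(5) / np.sqrt(5), X, y, n=200)
```

Actually maximizing the objective over ‖β‖₂ ≤ 1 requires an LP solver (e.g. `scipy.optimize.linprog` after a standard reformulation); that step is omitted here.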