Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

How to Learn a Star: Binary Classification with Starshaped Polyhedral Sets

Authors: Marie-Charlotte Brandenburg, Katharina Jochemko

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conducted small-scale experiments where we tested Algorithm 1, implemented in SageMath 10.5 [The Sage Developers, 2024], on two-dimensional synthetic data. The computations were done on a MacBook Pro equipped with an M2 Pro chip and 32 GB of RAM. For comparison, we also applied several standard binary classification methods leading to convex optimization problems on the same dataset, as well as a ReLU neural network. The computation running time ranged from few seconds to one hour.
Researcher Affiliation	Academia	Marie-Charlotte Brandenburg Ruhr Universität Bochum Universitätsstr. 150, 44801 Bochum, Germany EMAIL Katharina Jochemko KTH Royal Institute of Technology 100 44 Stockholm, Sweden EMAIL
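Pseudocode	Yes	Algorithm 1: Computation of the maximum likelihood estimator. Input: , $X = \{(x^{(i)}, y^{(i)})\}_{i=1}^{m}$, $\lambda$. Output: $a$. 1: determine $A_X$. 2: solve $\operatorname{argmax}_{a > 0} \sum_{i=1}^{m} \Bigl[ y^{(i)} \log\bigl(1 - e^{-\lambda (A_X a)_i}\bigr) + (1 - y^{(i)})(-\lambda)(A_X a)_i \Bigr]$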
Open Source Code	No	Answer: [No] Justification: The small experiments in this article are not central to the contribution and easily reproducible with the description provided in the article.
Open Datasets	No	Figure 3a illustrates 500 data points sampled from a given star-shaped region (in green) defined on eight rays. The data was generated as follows: we randomly selected the x- and y-coordinates of all points from the interval [-1, 1] using a uniform distribution and discarded any resulting points (x, y) lying outside the unit circle. This was done to achieve a near rotational symmetry of the data set. For each remaining point, we then checked whether it lies inside or outside the star-shaped region. The corresponding label was assigned accordingly, with a 90% probability of being correct.
Dataset Splits	No	The data was generated as follows: we randomly selected the x- and y-coordinates of all points from the interval [-1, 1] using a uniform distribution and discarded any resulting points (x, y) lying outside the unit circle. This was done to achieve a near rotational symmetry of the data set. For each remaining point, we then checked whether it lies inside or outside the star-shaped region. The corresponding label was assigned accordingly, with a 90% probability of being correct.
Hardware Specification	Yes	The computations were done on a MacBook Pro equipped with an M2 Pro chip and 32 GB of RAM.
Software Dependencies	Yes	We conducted small-scale experiments where we tested Algorithm 1, implemented in SageMath 10.5 [The Sage Developers, 2024], on two-dimensional synthetic data.
Experiment Setup	Yes	Running Algorithm 1 on the synthetic data set, the optimal value of the regularization parameter was found to be approximately λ = 0.83, yielding an accuracy of 0.852. The resulting optimal star classifier is shown in Figure 10a. For comparison, we also tested standard implementations of SVMs (with linear, polynomial, RBF, and sigmoid kernels), logistic regression, and a ReLU neural network with two hidden layers of sizes 5 and 2, respectively.
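
Since no code or dataset is released, the data generation quoted in the table can only be approximated. The following is a minimal Python sketch of that procedure (uniform sampling in [-1, 1]^2, rejection outside the unit circle, labels correct with probability 0.9). It is not the authors' SageMath implementation. The paper's star-shaped region is not specified here, so `HYPOTHETICAL_RADII`, the equally spaced rays, and the function names are all assumptions made for illustration. It is also an assumption that `n_candidates` counts points before rejection; the text does not say whether the 500 points are counted before or after discarding.

```python
import numpy as np

# Hypothetical star: boundary vertices at these radii on eight equally spaced rays.
# The actual region used in the paper is not given in this summary.
HYPOTHETICAL_RADII = np.array([0.9, 0.4, 0.9, 0.4, 0.9, 0.4, 0.9, 0.4])


def sample_disk(n_candidates, rng):
    """Draw points uniformly from [-1, 1]^2 and discard those outside the unit circle."""
    pts = rng.uniform(-1.0, 1.0, size=(n_candidates, 2))
    return pts[np.linalg.norm(pts, axis=1) <= 1.0]


def in_star(points, radii):
    """Test membership in the star-shaped polygon with vertices radii[k] * u_k."""
    k = len(radii)
    angles = 2.0 * np.pi * np.arange(k) / k
    verts = radii[:, None] * np.column_stack([np.cos(angles), np.sin(angles)])

    # Find the cone (pair of adjacent rays) containing each point.
    theta = np.mod(np.arctan2(points[:, 1], points[:, 0]), 2.0 * np.pi)
    idx = np.floor(theta / (2.0 * np.pi / k)).astype(int) % k
    v0, v1 = verts[idx], verts[(idx + 1) % k]

    # Solve p = alpha * v0 + beta * v1 with Cramer's rule.
    det = v0[:, 0] * v1[:, 1] - v0[:, 1] * v1[:, 0]
    alpha = (points[:, 0] * v1[:, 1] - points[:, 1] * v1[:, 0]) / det
    beta = (v0[:, 0] * points[:, 1] - v0[:, 1] * points[:, 0]) / det

    # Inside the triangle (0, v0, v1) exactly when alpha + beta <= 1.
    return alpha + beta <= 1.0


def make_dataset(n_candidates, radii, p_correct=0.9, seed=0):
    """Return points, noisy labels, and true labels (1 = inside the star)."""
    rng = np.random.default_rng(seed)
    pts = sample_disk(n_candidates, rng)
    true = in_star(pts, radii)
    keep = rng.random(len(pts)) < p_correct  # label stays correct with prob. p_correct
    noisy = np.where(keep, true, ~true)
    return pts, noisy.astype(int), true.astype(int)


if __name__ == "__main__":
    pts, labels, true = make_dataset(500, HYPOTHETICAL_RADII)
    print(len(pts), "points kept;", (labels == true).mean(), "of labels correct")
```

Fixing the seed makes the sampled set repeatable, which matters here because the reported accuracy of 0.852 depends on the particular noisy sample.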