Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Additive Approximations in High Dimensional Nonparametric Regression via the SALSA
Authors: Kirthevasan Kandasamy, Yaoliang Yu
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Via a comparison on 15 real datasets, we show that our method is competitive against 21 other alternatives. |
| Researcher Affiliation | Academia | Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA |
| Pseudocode | No | The paper describes the algorithm mathematically and in text but does not present a formal pseudocode block. |
| Open Source Code | Yes | Our software and datasets are available at github.com/kirthevasank/salsa. Our implementation of locally polynomial regression is also released as part of this paper and is made available at github.com/kirthevasank/local-poly-reg. |
| Open Datasets | Yes | The datasets were taken from the UCI repository, Bristol Multilevel Modeling and the following sources: (Guillame-Bert et al., 2014; Just et al., 2010; Paschou, 2007; Tegmark et al., 2006; Tu, 2012; Wehbe et al., 2014). |
| Dataset Splits | Yes | For a given d we solve (1) for different λ and pick the best one via cross validation. To choose the optimal d we cross validate on d. |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, memory, or cloud instance types used for experiments. |
| Software Dependencies | No | We used software from (Chang & Lin, 2011; Hara & Chellappa, 2013; Jakabsons, 2015; Lin & Zhang, 2006; Rasmussen & Williams, 2006) or from Matlab. |
| Experiment Setup | Yes | In our experiments we set each $k_i$ to be a Gaussian kernel $k_i(x_i, x_i') = \sigma_Y \exp\left(-\frac{(x_i - x_i')^2}{2h_i^2}\right)$ with bandwidth $h_i = c\,\sigma_i n^{-1/5}$. Here $\sigma_i$ is the standard deviation of the $i$th covariate and $\sigma_Y$ is the standard deviation of $Y$. The choice of bandwidth was inspired by several other kernel methods which use bandwidths on the order $\sigma_i n^{-1/5}$ (Ravikumar et al., 2009; Tsybakov, 2008). The constant $c$ was hand-tuned; we found that performance was robust to choices between 5 and 60. In our experiments we use $c = 20$. |
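The per-covariate kernel and bandwidth rule quoted in the Experiment Setup row can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code (their release is at github.com/kirthevasank/salsa). The function names are hypothetical, `X` is assumed to be a list of samples (each a list of covariates), and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the row does not say which estimator was used. The row also does not say how the individual $k_i$ are combined, so the sketch stops at the per-covariate kernels.

```python
import math
import statistics


def bandwidths(X, c=20.0):
    """Hypothetical helper: h_i = c * sigma_i * n^(-1/5) for each covariate i.

    Assumes X is a list of n samples, each a list of D covariates, and uses
    the population standard deviation for sigma_i (an assumption).
    """
    n = len(X)
    D = len(X[0])
    hs = []
    for i in range(D):
        column = [row[i] for row in X]
        sigma_i = statistics.pstdev(column)
        hs.append(c * sigma_i * n ** (-1 / 5))
    return hs


def gaussian_kernel_1d(a, b, h, sigma_y):
    """k_i(x_i, x_i') = sigma_Y * exp(-(x_i - x_i')^2 / (2 h_i^2))."""
    return sigma_y * math.exp(-((a - b) ** 2) / (2 * h ** 2))


def make_covariate_kernels(X, y, c=20.0):
    """Hypothetical helper returning one kernel function k_i per covariate.

    Each k_i looks only at coordinate i of its two input samples; default
    arguments pin i and h_i to avoid Python's late-binding closure pitfall.
    """
    sigma_y = statistics.pstdev(y)
    return [
        (lambda a, b, i=i, h=h: gaussian_kernel_1d(a[i], b[i], h, sigma_y))
        for i, h in enumerate(bandwidths(X, c))
    ]
```

With $c = 20$ the bandwidths are wide relative to each covariate's spread, which is consistent with the reported robustness to values of $c$ between 5 and 60.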
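The Dataset Splits row describes a two-level selection: for each candidate $d$, pick $\lambda$ by cross validation, then cross validate over $d$. The sketch below illustrates that search under loud assumptions. Problem (1) is not reproduced in this report, so plain kernel ridge regression with coefficients $\alpha = (K + \lambda I)^{-1} y$ stands in for it; the interleaved k-fold split, the mean-squared-error criterion, and every function name are hypothetical. Candidate values of $d$ are represented as entries in a dict of kernel functions, so one grid search covers both $d$ and $\lambda$.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def krr_fit(X, y, kernel, lam):
    """Kernel ridge regression (stand-in for problem (1)): alpha = (K + lam I)^-1 y."""
    K = [[kernel(a, b) for b in X] for a in X]
    for i in range(len(X)):
        K[i][i] += lam
    return solve(K, y)


def krr_predict(X_train, alpha, kernel, x):
    """Predict f(x) = sum_j alpha_j k(x_j, x)."""
    return sum(a * kernel(xt, x) for a, xt in zip(alpha, X_train))


def cv_error(X, y, kernel, lam, n_folds):
    """Mean squared error over interleaved folds (sample i goes to fold i % n_folds)."""
    n = len(X)
    total = 0.0
    for f in range(n_folds):
        train = [i for i in range(n) if i % n_folds != f]
        Xtr = [X[i] for i in train]
        ytr = [y[i] for i in train]
        alpha = krr_fit(Xtr, ytr, kernel, lam)
        for i in range(f, n, n_folds):
            total += (krr_predict(Xtr, alpha, kernel, X[i]) - y[i]) ** 2
    return total / n


def select_by_cv(X, y, kernels, lambdas, n_folds=5):
    """Return (cv_error, kernel_label, lambda) with the lowest CV error.

    `kernels` maps a label (e.g. a candidate d) to a kernel function.
    """
    best = None
    for label, kernel in kernels.items():
        for lam in lambdas:
            err = cv_error(X, y, kernel, lam, n_folds)
            if best is None or err < best[0]:
                best = (err, label, lam)
    return best
```

In a setup like the one described, each entry of `kernels` would correspond to one candidate value of $d$; the nested loop then realises "pick the best $\lambda$ for each $d$, then the best $d$" as a single joint minimisation over the grid.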