Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Risk Bounds For Distributional Regression

Authors: Carlos Misael Madrid Padilla, Oscar Hernan Madrid Padilla, Sabyasachi Chatterjee

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments on both simulated and real data validate the theoretical contributions, demonstrating their practical effectiveness. ... 4 Simulated data analysis ... 5 Real data application
Researcher Affiliation | Academia | Carlos Misael Madrid Padilla, Department of Statistics and Data Science, Washington University in St Louis, St Louis, MO 63130, EMAIL; Oscar Hernan Madrid Padilla, Department of Statistics, University of California, Los Angeles, Los Angeles, CA 90095, EMAIL; Sabyasachi Chatterjee, Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL 61820, EMAIL
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. The methods are described using mathematical formulations and descriptive text.
Open Source Code | Yes | The implementation of all experiments is available at https://github.com/cmadridp/UnifDR.
Open Datasets | Yes | We analyze the 2015 Chicago crime dataset, available at https://data.gov/open-gov/ ... We evaluate the effectiveness of the proposed UnifDR method by analyzing the 1990 California housing dataset... publicly available via the Carnegie Mellon StatLib repository at http://lib.stat.cmu.edu/datasets/, as well as the https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html portal. ... ozone concentration data collected from the Environmental Protection Agency (EPA) Regional dataset. ... The data are publicly accessible via the EPA's Air Data portal at https://aqs.epa.gov/aqsweb/airdata/download_files.html
Dataset Splits | Yes | Each dataset is randomly split into 75% training and 25% test sets. Competing models undergo 5-fold cross-validation on the training data for hyperparameter tuning, with performance assessed on the test set.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory amounts, or processor types used for running its experiments.
Software Dependencies | No | The isotonic method introduced in Section 3.2.1 is implemented in R using the pool adjacent violators algorithm (PAVA) from Robertson [51]. For this approach there are no direct competitors in the distributional regression problem. The Trend Filtering estimator in Section 3.2.2 is implemented using the trendfilter function from the glmgen package in R, and we compare it with additive smoothing splines (Add SS) via the smooth.spline function in R. For the Dense ReLU Networks method in Section 3.3.2, we use a fully connected feedforward architecture with an input layer, two hidden layers (64 units each), and an output layer. The network is implemented in Python and trained using the Adam optimizer with a learning rate of 0.001.
Experiment Setup | Yes | For the Dense ReLU Networks method in Section 3.3.2, we use a fully connected feedforward architecture with an input layer, two hidden layers (64 units each), and an output layer. The network is implemented in Python and trained using the Adam optimizer with a learning rate of 0.001. ... The Dense ReLU network approach employs a fully connected feedforward architecture with five hidden layers of 64 neurons each, using ReLU activations. The model is trained using the Adam optimizer with a learning rate of 0.001 over 1,000 epochs, minimizing the Binary Cross-Entropy (BCE) loss function for improved CDF estimation. ... The Dense ReLU neural network consists of three hidden layers, each containing 30 neurons followed by a ReLU activation function. ... The Dense ReLU network consists of two hidden layers, each containing 100 neurons followed by a ReLU activation function.
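
The Dataset Splits entry describes a random 75%/25% train/test split followed by 5-fold cross-validation on the training portion. A minimal Python sketch of that protocol is given below, using only the standard library. The function names, the seed, and the sample size of 100 are hypothetical choices for illustration; they are not taken from the paper or its repository, and the authors' own splitting code may differ.

```python
import random


def train_test_split_indices(n, test_fraction=0.25, seed=0):
    """Shuffle the indices 0..n-1 and split them into (train, test) lists.

    The seed and function name are illustrative assumptions.
    """
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(train_indices, k=5):
    """Yield (fit, validation) index lists for k-fold CV over the training set."""
    # Assign every k-th training index to the same fold.
    folds = [train_indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validation


# Hypothetical dataset of 100 observations.
train, test = train_test_split_indices(n=100)
cv_folds = list(k_fold_indices(train, k=5))
```

In this sketch the test indices are held out entirely: hyperparameters would be tuned on the `(fit, validation)` pairs, and the test set would be used only for the final performance assessment, as the entry describes.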
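
The Software Dependencies entry mentions that the isotonic method relies on the pool adjacent violators algorithm (PAVA). For readers unfamiliar with it, the following is a generic Python sketch of PAVA for a least-squares non-decreasing fit. It is not the authors' R implementation; the function name `pava`, the optional weights argument, and the example input are assumptions made for illustration.

```python
def pava(y, weights=None):
    """Least-squares non-decreasing fit to y via pool adjacent violators.

    Generic illustration only; not the R code used in the paper.
    """
    if weights is None:
        weights = [1.0] * len(y)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for value, w in zip(y, weights):
        blocks.append([float(value), float(w), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, w2, n2 = blocks.pop()
            mean1, w1, n1 = blocks.pop()
            total = w1 + w2
            pooled = (mean1 * w1 + mean2 * w2) / total
            blocks.append([pooled, total, n1 + n2])
    # Expand each block back to one fitted value per original point.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Hypothetical input: the pairs (3.0, 2.0) and (4.0, 3.5) violate monotonicity
# and are each pooled to their average.
fit = pava([1.0, 3.0, 2.0, 4.0, 3.5])
print(fit)  # [1.0, 2.5, 2.5, 3.75, 3.75]
```

Each incoming point starts as its own block, and adjacent blocks are pooled into their weighted mean whenever an earlier block exceeds a later one, so the final sequence is non-decreasing.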