Distribution-Free Statistical Dispersion Control for Societal Applications
Authors: Zhun Deng, Thomas Zollo, Jake Snell, Toniann Pitassi, Richard Zemel
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our methods are verified through experiments in toxic comment detection, medical imaging, and film recommendation. |
| Researcher Affiliation | Academia | Zhun Deng (zhun.d@columbia.edu), Columbia University; Thomas P. Zollo (tpz2105@columbia.edu), Columbia University; Jake C. Snell (js2523@princeton.edu), Princeton University; Toniann Pitassi (toni@cs.columbia.edu), Columbia University; Richard Zemel (zemel@cs.columbia.edu), Columbia University |
| Pseudocode | No | No pseudocode or algorithm blocks are included in the paper. |
| Open Source Code | No | Our code will be released publicly upon the publication of this article. |
| Open Datasets | Yes | Using the Civil Comments dataset [6]... Using a model trained on the train split of the RxRx1 dataset [31]... Using the MovieLens dataset [12]... |
| Dataset Splits | Yes | We use 100 and 200 samples from each group... We randomly sample 100,000 test points for calculating the empirical values in Table 1, and draw our validation points from the remaining data. ... We randomly sample 2500 items for use in validation (bounding and model selection)... We randomly sample 1500 users for validation... (A minimal sketch of this style of split appears below the table.) |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types) used for running the experiments are mentioned in the paper. |
| Software Dependencies | No | The paper mentions software libraries like "Detoxify [10]", a "Python library released by [18]", and "LightFM [9]", but does not provide specific version numbers for these or for the programming language used. |
| Experiment Setup | Yes | Our set of hypotheses is a toxicity model combined with a Platt scaler [22], where the model is fixed and we vary the scaling parameter in the range [0.25, 2]... Training is performed in two stages, where the network is first trained to approximate a Berk-Jones bound, and then optimized for some specified objective O. In both stages of training we aim to push the training error to zero or as close as possible (i.e., overfit). The model is first trained for 100,000 epochs to output the Berk-Jones bound using a mean-squared error loss. Then optimization on O is performed for a maximum of 10,000 epochs, and validation is performed every 25 epochs, where we choose the best model according to the bound on O. Both stages of optimization use the Adam optimizer with a learning rate of 0.00005, and for the second stage the constraint weight is set to λ = 0.00005. (Both the hypothesis sweep and the two-stage training loop are sketched below the table.) |
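
The "Dataset Splits" row reports the validation sampling only in prose. A minimal sketch of that style of split is shown below; the pool sizes, catalogue/user counts, and random seed are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption; the paper does not specify one

# Civil Comments / RxRx1-style split: 100,000 random test points,
# with validation points drawn from the remaining data (pool size is illustrative).
all_indices = rng.permutation(500_000)
test_idx, remaining = all_indices[:100_000], all_indices[100_000:]
val_idx = rng.choice(remaining, size=200, replace=False)  # e.g. 100 or 200 samples per group

# MovieLens-style split: 2,500 items and 1,500 users held out for validation.
item_ids = np.arange(10_000)   # illustrative catalogue size
user_ids = np.arange(50_000)   # illustrative user count
val_items = rng.choice(item_ids, size=2_500, replace=False)
val_users = rng.choice(user_ids, size=1_500, replace=False)
```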
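
The hypothesis set in the "Experiment Setup" row (a fixed toxicity model composed with a Platt scaler whose parameter is swept over [0.25, 2]) can be pictured roughly as follows. The stand-in logits, the grid resolution, and the `platt_scale` helper are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def platt_scale(logits: np.ndarray, w: float) -> np.ndarray:
    """One-parameter Platt-style rescaling of raw logits into probabilities."""
    return 1.0 / (1.0 + np.exp(-w * logits))

# Illustrative stand-in for scores from the fixed, pretrained toxicity model.
rng = np.random.default_rng(0)
raw_logits = rng.standard_normal(1_000)

# Each scaling parameter in [0.25, 2] defines one hypothesis in the candidate set;
# the grid resolution (50 points) is an assumption.
scales = np.linspace(0.25, 2.0, 50)
hypotheses = {float(w): platt_scale(raw_logits, w) for w in scales}
```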
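
The two-stage optimization described in the same row (overfitting to a Berk-Jones bound with an MSE loss, then optimizing a specified objective O with constraint weight λ = 0.00005, Adam at learning rate 0.00005, and validation every 25 epochs) might look roughly like the PyTorch sketch below. The network architecture, the data tensors, and the `objective_loss`, `constraint_violation`, and `validation_bound_on_O` helpers are placeholders chosen only to make the sketch runnable; this is not the authors' released code.

```python
import torch
import torch.nn as nn

# Placeholder data and helpers, not from the paper.
inputs = torch.rand(256, 1)                 # stand-in features
berk_jones_targets = torch.rand(256, 1)     # stand-in precomputed Berk-Jones bound values

def objective_loss(pred):                   # stand-in for the specified objective O
    return pred.mean()

def constraint_violation(pred):             # stand-in soft-constraint term
    return torch.relu(pred - 1.0).mean()

def validation_bound_on_O(model):           # stand-in bound on O used for model selection
    with torch.no_grad():
        return objective_loss(model(inputs)).item()

net = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))  # placeholder architecture
opt = torch.optim.Adam(net.parameters(), lr=5e-5)  # learning rate 0.00005, as reported

# Stage 1: overfit the network to the Berk-Jones bound with a mean-squared-error loss.
# (100,000 epochs is the reported count; reduce it for a quick run.)
mse = nn.MSELoss()
for epoch in range(100_000):
    loss = mse(net(inputs), berk_jones_targets)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: optimize the objective O with constraint weight λ = 0.00005,
# validating every 25 epochs and keeping the model with the best bound on O.
lam = 5e-5
best_bound, best_state = float("inf"), None
for epoch in range(10_000):
    pred = net(inputs)
    loss = objective_loss(pred) + lam * constraint_violation(pred)
    opt.zero_grad(); loss.backward(); opt.step()
    if epoch % 25 == 0:
        bound = validation_bound_on_O(net)
        if bound < best_bound:
            best_bound, best_state = bound, net.state_dict()
```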