Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

The Ladder: A Reliable Leaderboard for Machine Learning Competitions

Authors: Avrim Blum, Moritz Hardt

ICML 2015 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response

Research Type: Experimental
  "we conduct two opposing experiments. The first is an adversarial yet practical attack on the leaderboard... In a second experiment, we evaluate our algorithm on real submission files from a Kaggle competition."

Researcher Affiliation: Academia
  "Moritz Hardt EMAIL Avrim Blum EMAIL Carnegie Mellon University"

Pseudocode: Yes
  Algorithm 1 (Ladder mechanism):
    Input: data set S, step size η > 0
    Assign initial estimate R0 ← ∞
    for round t = 1, 2, ... do
      Receive function f_t : X → Y
      if R_S(f_t) < R_{t−1} − η then
        Assign R_t ← [R_S(f_t)]_η
      else
        Assign R_t ← R_{t−1}
      end if
    end for
  Here [x]_η denotes x rounded to the nearest integer multiple of η.

Open Source Code: Yes
  "Our code is available at https://github.com/mrtzh/Ladder.jl."

Open Datasets: Yes
  "To demonstrate the utility of the Ladder mechanism we turn to real submission data from Kaggle's Photo Quality Prediction challenge. The holdout set contained 12000 samples of which Kaggle used 8400 for the private leaderboard and 3600 for the public leaderboard."

Dataset Splits: Yes
  "The holdout set contained 12000 samples of which Kaggle used 8400 for the private leaderboard and 3600 for the public leaderboard."

Hardware Specification: No
  The paper does not specify the hardware (GPU/CPU models, memory, etc.) used to run its experiments.

Software Dependencies: No
  The paper does not name specific software dependencies with version numbers, such as libraries or frameworks.

Experiment Setup: No
  The paper defines the parameters of the Ladder algorithm (e.g., step size η) and of the attack (N, n, rounding error), but does not report typical machine-learning training hyperparameters such as learning rates, batch sizes, or optimizer settings.
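The Ladder pseudocode quoted above can be sketched in a few lines of Python. This is a minimal illustration written for this report, not the authors' official Julia implementation (Ladder.jl); the function and variable names are my own.

```python
import math

def ladder(losses, eta):
    """Run the Ladder mechanism over a sequence of submissions.

    losses: empirical losses R_S(f_t) of successive submissions f_1, ..., f_T
    eta:    step size (eta > 0)

    Returns the released scores R_1, ..., R_T. A new score is released only
    when a submission beats the current best by more than eta, and released
    scores are rounded to the nearest integer multiple of eta ([x]_eta).
    """
    released = []
    best = math.inf  # initial estimate R_0 = infinity
    for loss in losses:
        if loss < best - eta:
            best = round(loss / eta) * eta  # [R_S(f_t)]_eta
        released.append(best)
    return released

# Usage: small fluctuations (0.49 after 0.50, 0.31 after 0.30) do not
# change the released score, so the leaderboard leaks less information.
print(ladder([0.50, 0.49, 0.30, 0.31, 0.20], eta=0.05))
```

The key design point, per the pseudocode, is that the mechanism is stateful: the released score only moves on improvements exceeding the step size η, which limits how much an adaptive submitter can learn about the holdout set from repeated queries.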