Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning
Authors: Kaiwen Wang, Kevin Zhou, Runzhe Wu, Nathan Kallus, Wen Sun
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now compare DISTCB with SquareCB [Foster and Rakhlin, 2020] and the state-of-the-art CB method FastCB [Foster and Krishnamurthy, 2021]... We consider three challenging tasks that are all derived from real-world datasets... Table 1: Avg cost over all episodes and last 100 episodes (lower is better). We report mean (sem) over 10 seeds. Reproducible code is available at https://github.com/kevinzhou497/distcb. |
| Researcher Affiliation | Academia | Kaiwen Wang, Kevin Zhou, Runzhe Wu, Nathan Kallus, Wen Sun (Cornell University) |
| Pseudocode | Yes | Algorithm 1 Distributional CB (DISTCB), Algorithm 2 Optimistic Distributional Confidence set Optimization (O-DISCO), Algorithm 3 Pessimistic Distributional Confidence set Optimization (P-DISCO). |
| Open Source Code | Yes | Reproducible code is available at https://github.com/kevinzhou497/distcb. |
| Open Datasets | Yes | King County Housing... Prudential Life Insurance... CIFAR-100 This popular image dataset contains 100 classes... [Krizhevsky, 2009]... [Montoya et al., 2015]... [Vanschoren et al., 2013]... Table 3: Overview of the three datasets and their experimental setups |
| Dataset Splits | No | The paper does not specify dataset splits (e.g., percentages or counts for training, validation, or test sets). It describes the number of episodes and batch sizes for online learning. |
| Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., specific GPU or CPU models, memory specifications, or cloud instances) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch' and the 'WandB (Weights and Biases) library' but does not provide version numbers for either. |
| Experiment Setup | Yes | our γ learning rate at each time step $t$ is set to $\gamma_t = \gamma_0 t^{p}$, where $\gamma_0$ and $p$ are hyperparameters. We use batch sizes of 32 samples per episode. the King County and Prudential experiments run for 5,000 episodes while the CIFAR-100 experiment runs for 15,000. For our regression oracles, we use ResNet18... and a simple 2 hidden-layer neural network... |
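The Experiment Setup row reports a polynomial schedule $\gamma_t = \gamma_0 t^{p}$. The sketch below evaluates that schedule and, to show what γ controls, plugs it into the inverse gap weighting rule of the SquareCB baseline [Foster and Rakhlin, 2020]. This is a minimal sketch under stated assumptions: the function names `gamma_schedule` and `inverse_gap_weighting` are hypothetical, the γ₀ and p values are illustrative rather than the paper's tuned settings, and the weighting shown is SquareCB's rule, not necessarily the exact action-selection rule DistCB uses.

```python
def gamma_schedule(t: int, gamma0: float, p: float) -> float:
    """Polynomial exploration schedule gamma_t = gamma0 * t**p (hypothetical helper)."""
    if t < 1:
        raise ValueError("time step t must be >= 1")
    return gamma0 * t ** p


def inverse_gap_weighting(pred_costs: list[float], gamma: float) -> list[float]:
    """SquareCB-style inverse gap weighting over predicted costs (lower is better).

    Each non-greedy action a gets probability 1 / (K + gamma * gap_a), where
    gap_a is its predicted cost minus the greedy action's predicted cost; the
    greedy action receives the remaining mass. Larger gamma means less exploration.
    """
    k = len(pred_costs)
    best = min(range(k), key=lambda a: pred_costs[a])
    probs = [0.0] * k
    for a in range(k):
        if a != best:
            gap = pred_costs[a] - pred_costs[best]  # non-negative by choice of best
            probs[a] = 1.0 / (k + gamma * gap)
    # Each non-greedy probability is at most 1/K, so this is at least 1/K.
    probs[best] = 1.0 - sum(probs)
    return probs


if __name__ == "__main__":
    # Illustrative values only (assumed, not taken from the paper).
    for t in (1, 4, 100):
        g = gamma_schedule(t, gamma0=1.0, p=0.5)
        print(t, g, inverse_gap_weighting([0.25, 0.5, 1.0], g))
```

Because γ grows with t when p > 0, the probability assigned to actions with larger predicted cost shrinks over time, so the learner gradually shifts from exploration toward the greedy action.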