Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Off-Policy Confidence Sequences
Authors: Nikos Karampatziakis, Paul Mineiro, Aaditya Ramdas
ICML 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 7. Experiments: Code to reproduce all experiment results is available at https://github.com/n17s/mope |
| Researcher Affiliation | Collaboration | ¹Microsoft Azure AI, ²Microsoft Research, ³Carnegie Mellon University. |
| Pseudocode | Yes | Algorithm 1 Solve λ = argmaxλ C ψλ Aλ + λ b ... Algorithm 2 MOPE: Martingale Off-Policy Evaluation |
| Open Source Code | Yes | Code to reproduce all experiment results is available at https://github.com/n17s/mope |
| Open Datasets | Yes | We use the first 1 million samples from the mnist8m dataset |
| Dataset Splits | No | The paper describes using data from the 'mnist8m dataset' and processing it, but it does not explicitly specify dataset splits (e.g., percentages or counts for training, validation, or testing sets). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions training functions like 'linear multinomial logistic regression (MLR)' but does not specify any software libraries or their version numbers that were used. |
| Experiment Setup | Yes | We use the first 1 million samples from the mnist8m dataset which has 10 classes and train the following functions: h using linear multinomial logistic regression (MLR), π again using MLR but now on 1000 random Fourier features (RFF) (Rahimi and Recht, 2007) that approximate a Gaussian kernel machine, and finally q which uses the same RFF representation as π but instead its i-th output is independently trained to predict whether the input is the i-th class using 10 binary logistic regressions. We used the rest of the data with the following protocol: for each input/label pair (x_i, y_i), we sample action a_i with probability 0.9h(a_i; x_i) + 0.01 (so that we can safely set w_max = 100), we set r_i = 1 if a_i = y_i, otherwise r_i = 0, and record w_i and c_i. |
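
The logging protocol quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code (see the linked repository for that). The function names, the toy probability vectors, and the definition of the importance weight as π(a; x) divided by the sampling probability are assumptions. The quoted text does not define c_i, so it is omitted here.

```python
import random


def behavior_probs(h_probs, mix=0.9, floor=0.01):
    """Sampling distribution 0.9 * h(a; x) + 0.01 over the actions.

    With 10 classes the floor contributes 10 * 0.01 = 0.1, so the
    probabilities sum to 1 and every action has probability >= 0.01.
    """
    return [mix * p + floor for p in h_probs]


def log_sample(h_probs, pi_probs, label, rng):
    """Draw one logged interaction (hypothetical helper).

    h_probs  -- class probabilities from the logging model h for input x
    pi_probs -- class probabilities from the target policy pi for input x
    label    -- true class y
    Returns (action, reward, weight).
    """
    probs = behavior_probs(h_probs)
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    reward = 1 if action == label else 0
    # Assumed definition: importance weight pi(a; x) / sampling probability.
    # Because the sampling probability is at least 0.01, weight <= 100.
    weight = pi_probs[action] / probs[action]
    return action, reward, weight


# Toy inputs (made up for illustration): 10 classes, true label 3.
rng = random.Random(0)
h_example = [0.82] + [0.02] * 9
pi_example = [0.05] * 3 + [0.55] + [0.05] * 6
logged = [log_sample(h_example, pi_example, 3, rng) for _ in range(1000)]
max_weight = max(w for _, _, w in logged)
```

The 0.01 floor is what bounds the weights: even when h assigns an action almost zero probability, that action is still sampled with probability at least 0.01, which is consistent with the quoted choice of w_max = 100.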