Off-Policy Evaluation via Off-Policy Classification
Authors: Alexander Irpan, Kanishka Rao, Konstantinos Bousmalis, Chris Harris, Julian Ibarz, Sergey Levine
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally show that this metric outperforms baselines on a number of tasks. |
| Researcher Affiliation | Collaboration | ¹Google Brain, Mountain View, USA; ²DeepMind, London, UK; ³University of California, Berkeley, USA |
| Pseudocode | Yes | Pseudocode is in Appendix B. |
| Open Source Code | No | The paper states "Code for the binary tree environment is available at https://bit.ly/2Qx6TJ7.", but does not explicitly state that code for the main methodology (OPC/SoftOPC) is open-source or provided (a hedged sketch of a SoftOPC-style scorer follows the table). |
| Open Datasets | No | The paper describes collecting its own datasets for the robotic grasping task ('data collected by a hand-crafted policy... with two different datasets') and for the Binary Tree and Pong experiments ('generated 1,000 episodes from a uniformly random policy', 'generated 30 episodes from each'), but does not provide concrete access information (link, DOI, or explicit statement of public availability) for these collected datasets. |
| Dataset Splits | Yes | The validation dataset D was collected by generating 1,000 episodes from a uniformly random policy (Binary Tree). For the validation dataset we used 38 Q-functions that were partially trained with DDQN and generated 30 episodes from each, for a total of 1,140 episodes (Pong). ... based on held-out validation sets of 50,000 episodes from the training environment and 10,000 episodes from the test one (Robotic Grasping). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions algorithms like DQN, DDQN, and QT-Opt, but does not provide specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks with their versions) used in the experiments. |
| Experiment Setup | Yes | We learned Q-functions using DQN [25] and DDQN [38], varying hyperparameters such as the learning rate, the discount factor γ, and the batch size, as discussed in detail in Appendix E.2. Appendix E.2 reports a learning rate of 0.0000625, a discount factor γ of 0.99, and a batch size of 32 (see the configuration sketch after this table). |
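The quoted hyperparameters above (Appendix E.2) are the only training details this report captures. Below is a minimal sketch of how those values might slot into a DDQN-style target computation; the function name, array shapes, and random example inputs are hypothetical illustrations, not the authors' released code.

```python
import numpy as np

# Hyperparameters quoted from Appendix E.2 of the paper; everything else
# in this sketch (names, shapes, example data) is a hypothetical illustration.
CONFIG = {
    "learning_rate": 6.25e-5,   # 0.0000625, as reported
    "discount_gamma": 0.99,     # discount factor γ
    "batch_size": 32,
}

def ddqn_targets(rewards, next_q_online, next_q_target, dones,
                 gamma=CONFIG["discount_gamma"]):
    """Double-DQN bootstrap targets for one minibatch.

    rewards, dones:              shape (batch,)
    next_q_online, next_q_target: shape (batch, num_actions)
    The online network selects the greedy action; the target network
    evaluates it (the standard DDQN decoupling).
    """
    greedy_actions = np.argmax(next_q_online, axis=1)
    bootstrap = next_q_target[np.arange(len(rewards)), greedy_actions]
    return rewards + gamma * (1.0 - dones) * bootstrap

# Tiny usage example with a batch of the reported size.
rng = np.random.default_rng(0)
b, n_actions = CONFIG["batch_size"], 4
targets = ddqn_targets(
    rewards=rng.normal(size=b),
    next_q_online=rng.normal(size=(b, n_actions)),
    next_q_target=rng.normal(size=(b, n_actions)),
    dones=rng.integers(0, 2, size=b).astype(float),
)
print(targets.shape)  # (32,)
```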
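For context on the metric named in the title, here is a minimal sketch of a SoftOPC-style scorer. It assumes SoftOPC compares the mean Q-value on (state, action) pairs labeled effective against the mean Q-value over all validation pairs; the function name, labels, and example data are hypothetical, and the authors' own pseudocode is the Appendix B version noted in the table.

```python
import numpy as np

def soft_opc(q_values: np.ndarray, effective: np.ndarray) -> float:
    """Hedged sketch of a SoftOPC-style score, under the assumption stated above.

    q_values:  Q(s, a) for each validation (s, a) pair, shape (n,).
    effective: boolean mask marking pairs labeled effective, shape (n,).
    """
    return float(q_values[effective].mean() - q_values.mean())

# Usage: a higher score suggests the Q-function better separates
# effective from catastrophic (state, action) pairs.
q = np.array([0.9, 0.8, 0.2, 0.1])
labels = np.array([True, True, False, False])
print(soft_opc(q, labels))  # 0.35
```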