Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Minimax Weight and Q-Function Learning for Off-Policy Evaluation
Authors: Masatoshi Uehara, Jiawei Huang, Nan Jiang
ICML 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate the effectiveness of our methods and compare them to baseline algorithms in Cart Pole with function approximation. We compare MWL & MQL to MSWL (Liu et al., 2018, with estimated behavior policy) and DualDICE (Nachum et al., 2019a). We use neural networks with 2 hidden layers as function approximators for the main function classes for all methods, and use an RBF kernel for the discriminator classes (except for DualDICE); due to space limit we defer the detailed settings to Appendix E. Figure 1 shows the log MSE of relative errors of different methods, where MQL appears to be the best among all methods. |
| Researcher Affiliation | Academia | 1Harvard University, Massachusetts, Boston, USA 2University of Illinois at Urbana-Champaign, Champaign, Illinois, USA. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (e.g., clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, such as a specific repository link or an explicit code release statement. |
| Open Datasets | Yes | We empirically demonstrate the effectiveness of our methods and compare them to baseline algorithms in Cart Pole with function approximation. (...) To back up this theoretical finding, we also conduct experiments in the Taxi environment (Dietterich, 2000) following Liu et al. (2018, Section 5) |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using "neural networks" and "RBF kernel" but does not provide specific ancillary software details, such as library or solver names with version numbers. |
| Experiment Setup | No | We use neural networks with 2 hidden layers as function approximators for the main function classes for all methods, and use an RBF kernel for the discriminator classes (except for DualDICE); due to space limit we defer the detailed settings to Appendix E. |
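The Experiment Setup row quotes the paper's only stated configuration: 2-hidden-layer neural networks for the main function classes and an RBF kernel for the discriminator classes, with the remaining details deferred to the paper's Appendix E. As a minimal sketch of what that configuration looks like, the NumPy snippet below builds a 2-hidden-layer MLP forward pass and an RBF kernel Gram matrix; the layer width (32), kernel bandwidth (1.0), and 4-dimensional state (Cart Pole's observation size) are illustrative assumptions, not values reported in the paper.

```python
import numpy as np

def init_mlp(d_in, hidden, d_out, rng):
    """Initialize a 2-hidden-layer MLP as a list of (W, b) pairs."""
    sizes = [d_in, hidden, hidden, d_out]
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(x, params):
    """Forward pass: two ReLU hidden layers, then a linear output
    (e.g., a scalar Q-value estimate per state)."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)  # hidden layers with ReLU
    W, b = params[-1]
    return h @ W + b

def rbf_kernel(X, Y, bandwidth=1.0):
    """Gaussian RBF kernel matrix K[i, j] = exp(-||x_i - y_j||^2 / (2 s^2)),
    the kind of discriminator-class kernel the paper mentions."""
    sq = (np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-sq / (2.0 * bandwidth**2))

rng = np.random.default_rng(0)
params = init_mlp(d_in=4, hidden=32, d_out=1, rng=rng)  # 4-dim Cart Pole state (assumed)
states = rng.standard_normal((8, 4))
q_values = mlp_forward(states, params)           # shape (8, 1)
K = rbf_kernel(states, states, bandwidth=1.0)    # shape (8, 8) Gram matrix
```

Because none of these hyperparameters appear in the main text, the "No" classification above stands: a reader would have to consult Appendix E (and guess anything it omits) to reproduce the experiments.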