Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
# Model Selection in Batch Policy Optimization
Authors: Jonathan Lee, George Tucker, Ofir Nachum, Bo Dai
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conclude with experiments demonstrating the efficacy of these algorithms. |
| Researcher Affiliation | Collaboration | 1Department of Computer Science, Stanford University, USA 2Google Research, Mountain View, USA. |
| Pseudocode | Yes | Algorithm 1 Pessimistic Linear Learner; Algorithm 2 Complexity-Coverage Selection; Algorithm 3 SLOPE Method |
| Open Source Code | No | The paper does not contain any explicit statement about providing open-source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | No | To complement our primarily theoretical results, we study the utility of the above model selection algorithms in synthetic experiments and empirically compare them... For both the batch dataset and the test set, noise was artificially generated on rewards by sampling from a standard normal distribution N(0, 1). |
| Dataset Splits | No | The paper mentions generating a 'batch dataset' and a 'test set' but does not specify a separate validation split or the percentages of data used for training, validation, and testing. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions that random quantities were generated by sampling multivariate normal distributions, but it does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific libraries). |
| Experiment Setup | Yes | For the algorithms, penalization terms (i.e. the estimation error) typically depends on constants being chosen sufficiently large to ensure a confidence interval is valid. However, choosing large values in practice can lead to unnecessarily poor convergence. We found that multiplying by C = 0.1 yielded good performance in most settings. |
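The Open Datasets and Experiment Setup rows describe two concrete details of the paper's synthetic experiments: rewards were corrupted with N(0, 1) noise, and the pessimistic penalization (estimation-error) term was scaled by a constant C = 0.1 rather than a conservative theoretical bound. A minimal sketch of how such a setup might look, assuming a linear reward model and an elliptical confidence width (all variable names and the specific estimator are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic batch: features X, linear noiseless rewards r_true.
n, d = 200, 5
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
r_true = X @ theta_true

# As described in the setup: reward noise sampled from N(0, 1),
# applied to both the batch dataset and the test set.
rewards = r_true + rng.standard_normal(n)

# Ridge estimate of the reward parameters from the noisy batch.
Sigma = X.T @ X + np.eye(d)               # regularized covariance
theta_hat = np.linalg.solve(Sigma, X.T @ rewards)

# Penalization scaled by C = 0.1, per the quoted setup: large
# theoretically valid constants can hurt convergence in practice.
C = 0.1

def pessimistic_value(x):
    """Estimated reward minus a C-scaled elliptical confidence width."""
    width = np.sqrt(x @ np.linalg.solve(Sigma, x))
    return x @ theta_hat - C * width
```

Because the confidence width is nonnegative, the pessimistic estimate never exceeds the plain point estimate `x @ theta_hat`; the constant C only controls how aggressively the estimate is discounted.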