Model Selection in Batch Policy Optimization
Authors: Jonathan Lee, George Tucker, Ofir Nachum, Bo Dai
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conclude with experiments demonstrating the efficacy of these algorithms. |
| Researcher Affiliation | Collaboration | ¹Department of Computer Science, Stanford University, USA; ²Google Research, Mountain View, USA. |
| Pseudocode | Yes | Algorithm 1 Pessimistic Linear Learner; Algorithm 2 Complexity-Coverage Selection; Algorithm 3 SLOPE Method (hedged sketches in the spirit of Algorithms 1 and 3 appear below the table) |
| Open Source Code | No | The paper does not contain any explicit statement about providing open-source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | No | To complement our primarily theoretical results, we study the utility of the above model selection algorithms in synthetic experiments and empirically compare them... For both the batch dataset and the test set, noise was artificially generated on rewards by sampling from a standard normal distribution N(0, 1). (A sketch of this synthetic generation appears below the table.) |
| Dataset Splits | No | The paper mentions generating a 'batch dataset' and a 'test set' but does not specify a separate validation split or the percentages of data used for training, validation, and testing. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions that random quantities were generated by sampling multivariate normal distributions, but it does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific libraries). |
| Experiment Setup | Yes | For the algorithms, penalization terms (i.e., the estimation error) typically depend on constants being chosen sufficiently large to ensure a confidence interval is valid. However, choosing large values in practice can lead to unnecessarily poor convergence. We found that multiplying by C = 0.1 yielded good performance in most settings. (See the penalty sketch below the table.) |
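
The synthetic setup quoted above draws random quantities from multivariate normal distributions and corrupts rewards with N(0, 1) noise. Below is a minimal sketch of such a generator, assuming a linear reward model; `make_synthetic_batch`, `theta_star`, the identity covariance, and the sizes are illustrative choices, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def make_synthetic_batch(n=1000, d=10):
    # Hypothetical linear-reward setup: theta_star and the identity
    # covariance are illustrative, not the paper's exact values.
    theta_star = rng.normal(size=d)
    Phi = rng.multivariate_normal(np.zeros(d), np.eye(d), size=n)
    r = Phi @ theta_star + rng.standard_normal(n)  # additive N(0, 1) reward noise
    return Phi, r, theta_star
```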
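Algorithm 1 (Pessimistic Linear Learner) appears only as pseudocode in the paper. The sketch below shows one common way such a learner can be realized: a ridge estimate penalized by an elliptical confidence width, scaled by the C = 0.1 constant the paper reports working well. The penalty form sqrt(phi^T Lambda^{-1} phi), the function name, and the ridge parameter `reg` are assumptions rather than the paper's exact construction.

```python
import numpy as np

def pessimistic_linear_value(Phi, r, phi_query, C=0.1, reg=1.0):
    # Ridge-regression value estimate, penalized pessimistically.
    # The elliptical penalty and `reg` are assumed details; C = 0.1 is
    # the scaling the paper reports working well in most settings.
    d = Phi.shape[1]
    Lambda = Phi.T @ Phi + reg * np.eye(d)           # regularized feature covariance
    theta_hat = np.linalg.solve(Lambda, Phi.T @ r)   # ridge estimate
    bonus = np.sqrt(phi_query @ np.linalg.solve(Lambda, phi_query))
    return phi_query @ theta_hat - C * bonus         # pessimistic (lower-bound) value
```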
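Algorithm 3 (SLOPE Method) performs estimator selection via confidence-interval intersection. The following is a hedged sketch of interval-intersection selection in that spirit; the paper's exact widening constants and ordering conventions may differ, and `slope_style_select` is an illustrative name.

```python
import numpy as np

def slope_style_select(estimates, widths):
    # Scan estimators in order of decreasing confidence width, keep a
    # running intersection of their intervals, and return the last index
    # whose interval still intersects all earlier ones.
    lo, hi = -np.inf, np.inf
    selected = 0
    for i, (e, w) in enumerate(zip(estimates, widths)):
        lo, hi = max(lo, e - w), min(hi, e + w)
        if lo > hi:      # intersection became empty; stop
            break
        selected = i
    return selected
```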