Model Selection in Batch Policy Optimization

Authors: Jonathan Lee, George Tucker, Ofir Nachum, Bo Dai

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conclude with experiments demonstrating the efficacy of these algorithms.
Researcher Affiliation | Collaboration | 1 Department of Computer Science, Stanford University, USA; 2 Google Research, Mountain View, USA.
Pseudocode | Yes | Algorithm 1: Pessimistic Linear Learner; Algorithm 2: Complexity-Coverage Selection; Algorithm 3: SLOPE Method
Open Source Code | No | The paper does not contain any explicit statement about providing open-source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | No | To complement our primarily theoretical results, we study the utility of the above model selection algorithms in synthetic experiments and empirically compare them... For both the batch dataset and the test set, noise was artificially generated on rewards by sampling from a standard normal distribution N(0, 1).
Dataset Splits | No | The paper mentions generating a 'batch dataset' and a 'test set' but does not specify a separate validation split or the percentages of data used for training, validation, and testing.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions that random quantities were generated by sampling multivariate normal distributions, but it does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific libraries).
Experiment Setup | Yes | For the algorithms, penalization terms (i.e., the estimation error) typically depend on constants being chosen sufficiently large to ensure a confidence interval is valid. However, choosing large values in practice can lead to unnecessarily poor convergence. We found that multiplying by C = 0.1 yielded good performance in most settings.
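
To make the synthetic setup quoted in the Open Datasets row concrete, here is a minimal sketch of how such data could be generated. Only the multivariate-normal sampling of random quantities and the artificial N(0, 1) reward noise on both the batch dataset and the test set come from the paper; the linear reward model, dimension, and sample sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_batch, n_test = 10, 1000, 1000  # assumed dimension and sample sizes

# Random quantities drawn from multivariate normals, as the paper describes.
theta = rng.multivariate_normal(np.zeros(d), np.eye(d))  # unknown reward parameter (assumed linear model)
X_batch = rng.multivariate_normal(np.zeros(d), np.eye(d), n_batch)
X_test = rng.multivariate_normal(np.zeros(d), np.eye(d), n_test)

# Rewards with artificially generated N(0, 1) noise, as stated for both splits.
y_batch = X_batch @ theta + rng.standard_normal(n_batch)
y_test = X_test @ theta + rng.standard_normal(n_test)
```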
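
The Experiment Setup row notes that the theoretically prescribed penalization constants are shrunk by a factor C = 0.1. The sketch below, under assumed names, shows one way such a scaled estimation-error penalty can enter a pessimistic linear value estimate; the ridge estimator and the elliptical-norm width are standard stand-ins rather than the paper's exact bonus, and only the multiplier C = 0.1 is taken from the paper.

```python
import numpy as np

def pessimistic_value(X, y, phi, C=0.1, reg=1.0):
    """Lower-confidence (pessimistic) value of a policy with feature vector phi.

    C = 0.1 shrinks the estimation-error penalty, per the paper's observation
    that constants large enough for valid confidence intervals can be
    unnecessarily conservative in practice. All other choices are assumptions.
    """
    d = X.shape[1]
    Sigma = X.T @ X + reg * np.eye(d)            # regularized design covariance
    theta_hat = np.linalg.solve(Sigma, X.T @ y)  # ridge estimate of the reward parameter
    width = np.sqrt(phi @ np.linalg.solve(Sigma, phi))  # ||phi||_{Sigma^{-1}}, a standard error width
    return phi @ theta_hat - C * width           # estimated value minus scaled penalty
```

Setting C = 1 in the same code recovers the full-size penalty; the paper's choice of C = 0.1 trades the formal validity of the interval for faster convergence in practice.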