Efficient Optimistic Exploration in Linear-Quadratic Regulators via Lagrangian Relaxation

Authors: Marc Abeille, Alessandro Lazaric

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. "We conclude with a simple numerical simulation (details in App. J). We compare CECCE with the variance parameter σ² set as suggested in the original paper and a tuned version where we shrink it by a factor √2, and LAGLQ where the confidence interval is set according to (1). Both algorithms receive the same set Θ0 obtained from an initial system identification phase. In Fig. 3 we see that LAGLQ performs better than both the original and tuned versions of CECCE."
Researcher Affiliation: Industry. ¹Criteo AI Lab, ²Facebook AI Research.
Pseudocode: Yes. "Figure 2. The DS-OFU algorithm to solve (21)."
Open Source Code: No. The paper does not contain any statement about making the source code available, nor does it provide a link to a code repository.
Open Datasets: No. The paper describes a "simple numerical simulation" and states that "both algorithms receive the same set Θ0 obtained from an initial system identification phase," but it does not provide any access information (link, DOI, citation) for a publicly available dataset.
Dataset Splits: No. The paper describes a "simple numerical simulation" but does not provide specific details on training, validation, or test dataset splits.
Hardware Specification: No. The paper describes a "simple numerical simulation" but does not report any hardware details (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies: No. The paper describes its methods and a numerical simulation but does not list any software components with version numbers required for reproduction.
Experiment Setup: No. The paper states "We conclude with a simple numerical simulation (details in App. J)," but the main text itself does not contain concrete experimental setup details such as hyperparameter values or training configurations.
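The "simple numerical simulation" assessed above compares controllers for a linear-quadratic regulator. As a rough illustration of that setting only (not the paper's actual experiment: the scalar system A, B, the cost weights Q, R, the noise level, and the horizon below are all hypothetical choices, since App. J's configuration is not reproduced in the main text), a discrete-time LQR rollout can be sketched as:

```python
import numpy as np

# Hypothetical scalar LQR instance -- illustrative values, not from the paper.
A, B, Q, R = 1.1, 1.0, 1.0, 1.0
SIGMA = 0.1  # process-noise standard deviation (assumption)

def riccati_gain(a, b, q, r, iters=200):
    """Solve the scalar discrete-time Riccati equation by fixed-point
    iteration and return the optimal feedback gain K (so u_t = -K x_t)."""
    p = q
    for _ in range(iters):
        p = q + a * p * a - (a * p * b) ** 2 / (r + b * p * b)
    return (a * p * b) / (r + b * p * b)

def rollout_avg_cost(gain, horizon=1000, seed=0):
    """Simulate the closed loop x_{t+1} = A x_t + B u_t + w_t and
    return the average quadratic cost over the horizon."""
    rng = np.random.default_rng(seed)
    x, total = 1.0, 0.0
    for _ in range(horizon):
        u = -gain * x
        total += Q * x * x + R * u * u
        x = A * x + B * u + SIGMA * rng.standard_normal()
    return total / horizon

gain = riccati_gain(A, B, Q, R)
print(gain, rollout_avg_cost(gain))
```

A regret comparison of the kind reported in Fig. 3 would run such rollouts for each algorithm's control policy and subtract the cost of the optimal gain computed above.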