Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach
Authors: Riccardo Poiani, Nicole Nobili, Alberto Maria Metelli, Marcello Restelli
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we propose numerical validations that aim at assessing the empirical performance of RIDO. More specifically, we focus on the comparison between our approach, the classical uniform-in-the-horizon strategy, and the robust DCS by Poiani et al. [2023]. We report the results across multiple domains, values of budget Λ, and discount factor γ. As a performance index, all experiments measure the empirical variance of the estimator in Equation (1) at the end of the data collection process. |
| Researcher Affiliation | Academia | Riccardo Poiani (DEIB, Politecnico di Milano, riccardo.poiani@polimi.it); Nicole Nobili (DEIB, Politecnico di Milano, nicole.nobili@mail.polimi.it); Alberto Maria Metelli (DEIB, Politecnico di Milano, albertomaria.metelli@polimi.it); Marcello Restelli (DEIB, Politecnico di Milano, marcello.restelli@polimi.it) |
| Pseudocode | Yes | Algorithm 1 Robust and Iterative DCS Optimization (RIDO). |
| Open Source Code | No | The paper mentions using 'pre-trained deep RL agents made publicly available by Raffin [2020]' (a third-party resource) but does not state that the code for RIDO is open-source or provide a link to it. |
| Open Datasets | Yes | In our experiments, we consider the following four domains. We start with the Inverted Pendulum [Brockman et al., 2016], a classic continuous control benchmark, where the agent's goal is to swing up a suspended body and keep it in the vertical direction. We then continue with the Linear Quadratic Gaussian Regulator [LQG, Curtain, 1997]... Finally, we consider the Ant environment from the MuJoCo [Todorov et al., 2012] suite... |
| Dataset Splits | No | The paper describes the number of simulations (100) and runs (100) used for evaluating empirical variance, but it does not specify training, validation, or test data splits for any dataset used. |
| Hardware Specification | No | The paper describes the experimental settings and domains used for evaluation but does not specify any hardware details such as GPU/CPU models or memory. |
| Software Dependencies | No | The paper mentions using specific environments like 'MuJoCo' and refers to 'pre-trained deep RL agents made publicly available by Raffin [2020]' (RL Baselines3 Zoo), but it does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | To conclude, we refer the reader to Appendix B for further details on the experiments (e.g., ablations, additional results, experiments with γ = 1, hyper-parameters, visualizations of the resulting DCSs). |