Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach
Authors: Riccardo Poiani, Nicole Nobili, Alberto Maria Metelli, Marcello Restelli
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we propose numerical validations that aim at assessing the empirical performance of RIDO. More specifically, we focus on the comparison between our approach, the classical uniform-in-the-horizon strategy, and the robust DCS by Poiani et al. [2023]. We report the results across multiple domains, values of budget Λ, and discount factor γ. As a performance index, all experiments measure the empirical variance of the estimator in Equation (1) at the end of the data collection process. |
| Researcher Affiliation | Academia | Riccardo Poiani (DEIB, Politecnico di Milano); Nicole Nobili (DEIB, Politecnico di Milano); Alberto Maria Metelli (DEIB, Politecnico di Milano); Marcello Restelli (DEIB, Politecnico di Milano) |
| Pseudocode | Yes | Algorithm 1 Robust and Iterative DCS Optimization (RIDO). |
| Open Source Code | No | The paper mentions using 'pre-trained deep RL agents made publicly available by Raffin [2020]' (a third-party resource) but does not state that the code for RIDO is open-source or provide a link to it. |
| Open Datasets | Yes | In our experiments, we consider the following four domains. We start with the Inverted Pendulum [Brockman et al., 2016], a classic continuous control benchmark, where the agent's goal is to swing up a suspended body and keep it in the vertical direction. We then continue with the Linear Quadratic Gaussian Regulator [LQG, Curtain, 1997]... Finally, we consider the Ant environment from the MuJoCo [Todorov et al., 2012] suite... |
| Dataset Splits | No | The paper describes the number of simulations (100) and runs (100) used for evaluating empirical variance, but it does not specify training, validation, or test data splits for any dataset used. |
| Hardware Specification | No | The paper describes the experimental settings and domains used for evaluation but does not specify any hardware details such as GPU/CPU models or memory. |
| Software Dependencies | No | The paper mentions using specific environments like 'MuJoCo' and refers to 'pre-trained deep RL agents made publicly available by Raffin [2020]' (RL Baselines3 Zoo), but it does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | To conclude, we refer the reader to Appendix B for further details on the experiments (e.g., ablations, additional results, experiments with γ = 1, hyper-parameters, visualizations of the resulting DCSs). |
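The performance index quoted above, the empirical variance of a truncated Monte Carlo return estimator, can be illustrated with a minimal sketch. This is not the paper's RIDO implementation; the reward model, horizon `T`, and helper names below are illustrative assumptions only.

```python
import random

def truncated_return(rewards, gamma, T):
    """Discounted return of one trajectory, truncated at horizon T."""
    return sum(gamma**t * r for t, r in enumerate(rewards[:T]))

def empirical_variance(estimates):
    """Unbiased sample variance over a batch of scalar return estimates."""
    n = len(estimates)
    mean = sum(estimates) / n
    return sum((x - mean) ** 2 for x in estimates) / (n - 1)

random.seed(0)
gamma = 0.99
# Hypothetical batch: 100 trajectories of noisy rewards, each truncated at T=50.
trajectories = [[random.gauss(1.0, 0.5) for _ in range(100)] for _ in range(100)]
estimates = [truncated_return(traj, gamma, T=50) for traj in trajectories]
print(empirical_variance(estimates))
```

In the paper's setting, the budget Λ constrains the total number of collected transitions, so shortening some trajectories frees budget for collecting more of them; the variance of the resulting estimator is what the experiments compare across allocation strategies.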