Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach
Authors: Riccardo Poiani, Nicole Nobili, Alberto Maria Metelli, Marcello Restelli
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we propose numerical validations that aim at assessing the empirical performance of RIDO. More specifically, we focus on the comparison between our approach, the classical uniform-in-the-horizon strategy, and the robust DCS by Poiani et al. [2023]. We report the results across multiple domains, values of budget Λ, and discount factor γ. As a performance index, all experiments measure the empirical variance of the estimator in Equation (1) at the end of the data collection process. |
| Researcher Affiliation | Academia | Riccardo Poiani (DEIB, Politecnico di Milano); Nicole Nobili (DEIB, Politecnico di Milano); Alberto Maria Metelli (DEIB, Politecnico di Milano); Marcello Restelli (DEIB, Politecnico di Milano) |
| Pseudocode | Yes | Algorithm 1 Robust and Iterative DCS Optimization (RIDO). |
| Open Source Code | No | The paper mentions using 'pre-trained deep RL agents made publicly available by Raffin [2020]' (a third-party resource) but does not state that the code for RIDO is open-source or provide a link to it. |
| Open Datasets | Yes | In our experiments, we consider the following four domains. We start with the Inverted Pendulum [Brockman et al., 2016], a classic continuous control benchmark, where the agent's goal is to swing up a suspended body and keep it in the vertical direction. We then continue with the Linear Quadratic Gaussian Regulator [LQG, Curtain, 1997]... Finally, we consider the Ant environment from the MuJoCo [Todorov et al., 2012] suite... |
| Dataset Splits | No | The paper describes the number of simulations (100) and runs (100) used for evaluating empirical variance, but it does not specify training, validation, or test data splits for any dataset used. |
| Hardware Specification | No | The paper describes the experimental settings and domains used for evaluation but does not specify any hardware details such as GPU/CPU models or memory. |
| Software Dependencies | No | The paper mentions using specific environments like 'MuJoCo' and refers to 'pre-trained deep RL agents made publicly available by Raffin [2020]' (RL Baselines3 Zoo), but it does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | To conclude, we refer the reader to Appendix B for further details on the experiments (e.g., ablations, additional results, experiments with γ = 1, hyper-parameters, visualizations of the resulting DCSs). |
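The performance index quoted above, the empirical variance of a truncated Monte Carlo return estimator, can be illustrated with a minimal sketch. This is not the paper's RIDO implementation; the reward model, horizon `T`, and helper names below are illustrative assumptions only.

```python
import random

def truncated_return(rewards, gamma, T):
    """Discounted return of one trajectory, truncated at horizon T."""
    return sum(gamma**t * r for t, r in enumerate(rewards[:T]))

def empirical_variance(estimates):
    """Unbiased sample variance over a batch of scalar return estimates."""
    n = len(estimates)
    mean = sum(estimates) / n
    return sum((x - mean) ** 2 for x in estimates) / (n - 1)

random.seed(0)
gamma = 0.99
# Hypothetical batch: 100 trajectories of noisy rewards, each truncated at T=50.
trajectories = [[random.gauss(1.0, 0.5) for _ in range(100)] for _ in range(100)]
estimates = [truncated_return(traj, gamma, T=50) for traj in trajectories]
print(empirical_variance(estimates))
```

In the paper's setting, the budget Λ constrains the total number of collected transitions, so shortening some trajectories frees budget for collecting more of them; the variance of the resulting estimator is what the experiments compare across allocation strategies.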