reproducibilityindex.ai

Truncating Trajectories in Monte Carlo Reinforcement Learning

Authors: Riccardo Poiani, Alberto Maria Metelli, Marcello Restelli

ICML 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we conduct a numerical comparison between our algorithm and POIS: the results are consistent with our theory and show that an appropriate truncation of the trajectories can succeed in improving performance.
Researcher Affiliation	Academia	Riccardo Poiani¹, Alberto Maria Metelli¹, Marcello Restelli¹. ¹Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy. Correspondence to: Riccardo Poiani <riccardo.poiani@polimi.it>.
Pseudocode	Yes	The pseudo-code for our algorithm, TT-POIS, can be found in Algorithm 1. As one can notice, by replacing m with the uniform-in-the-horizon DCS, we recover the original pseudo-code of POIS (Metelli et al., 2018).
Open Source Code	No	Further details can be found in the code base we provide.
Open Datasets	Yes	Dam Control: In our first experimental domain, we consider a water resource management scenario (Castelletti et al., 2010; Parisi et al., 2014; Tirinzoni et al., 2018; Liotet et al., 2022). Reacher: In the second experiment, we consider the standard continuous control problem of a two-jointed robot arm (Todorov et al., 2012). Multi-Echelon Supply Chain: Finally, we consider the problem of managing the complex inventory of a 4 stage supply chain (Hubbs et al., 2020). Specifically for Supply Chain: we rely on their publicly available repository for our experiments.
Dataset Splits	No	The paper does not explicitly provide training/validation/test splits. It refers to a "training process" and "training iterations" but gives no details on how data is partitioned into validation or test sets.
Hardware Specification	Yes	We have run the experiments using 88 Intel(R) Xeon(R) CPU E7-8880 v4 @ 2.20GHz cpus and 94 GB of RAM.
Software Dependencies	No	The paper mentions "Neural Network Size", "Weight Initialization", "Activation Function", "Confidence δ", "Number of offline iterations", and "RMIN-MAX" in tables, but does not specify software components (e.g., Python, PyTorch, TensorFlow) with version numbers. Importance Weight Clipping is also mentioned.
Experiment Setup	Yes	Table 1 (Corridor Sparse Rewards), Table 2 (Corridor Dense Rewards), Table 3 (Dam), Table 4 (Supply Chain), and Table 5 (Reacher) give the hyper-parameters for POIS and TT-POIS. They list specific values such as Neural Network Size [64, 32], Confidence δ 0.9, and Number of offline iterations 10.