Truncating Trajectories in Monte Carlo Reinforcement Learning

Authors: Riccardo Poiani, Alberto Maria Metelli, Marcello Restelli

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we conduct a numerical comparison between our algorithm and POIS: the results are consistent with our theory and show that an appropriate truncation of the trajectories can succeed in improving performance.
Researcher Affiliation | Academia | Riccardo Poiani¹, Alberto Maria Metelli¹, Marcello Restelli¹. ¹Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy. Correspondence to: Riccardo Poiani <riccardo.poiani@polimi.it>.
Pseudocode | Yes | The pseudo-code for our algorithm, TT-POIS, can be found in Algorithm 1. As one can notice, by replacing m with the uniform-in-the-horizon DCS, we recover the original pseudo-code of POIS (Metelli et al., 2018).
Open Source Code | No | Further details can be found in the code base we provide.
Open Datasets | Yes | Dam Control: In our first experimental domain, we consider a water resource management scenario (Castelletti et al., 2010; Parisi et al., 2014; Tirinzoni et al., 2018; Liotet et al., 2022). Reacher: In the second experiment, we consider the standard continuous control problem of a two-jointed robot arm (Todorov et al., 2012). Multi-Echelon Supply Chain: Finally, we consider the problem of managing the complex inventory of a 4 stage supply chain (Hubbs et al., 2020). Specifically for Supply Chain: we rely on their publicly available repository for our experiments.
Dataset Splits | No | The paper does not explicitly provide training/validation/test splits. It refers to "training process" and "training iterations" but gives no details on data partitioning for validation or test sets.
Hardware Specification | Yes | We have run the experiments using 88 Intel(R) Xeon(R) CPU E7-8880 v4 @ 2.20GHz cpus and 94 GB of RAM.
Software Dependencies | No | The paper mentions "Neural Network Size", "Weight Initialization", "Activation Function", "Confidence δ", "Number of offline iterations", and "RMIN-MAX" in tables, and also mentions Importance Weight Clipping, but does not specify software components (e.g., Python, PyTorch, TensorFlow) with version numbers.
Experiment Setup | Yes | Tables 1 to 5 give the hyper-parameters for POIS and TT-POIS in each domain (Table 1: Corridor Sparse Rewards; Table 2: Corridor Dense Rewards; Table 3: Dam; Table 4: Supply Chain; Table 5: Reacher), listing specific hyperparameters such as Neural Network Size [64, 32], Confidence δ 0.9, and Number of offline iterations 10.