Truncating Trajectories in Monte Carlo Reinforcement Learning
Authors: Riccardo Poiani, Alberto Maria Metelli, Marcello Restelli
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct a numerical comparison between our algorithm and POIS: the results are consistent with our theory and show that an appropriate truncation of the trajectories can succeed in improving performance. |
| Researcher Affiliation | Academia | Riccardo Poiani¹, Alberto Maria Metelli¹, Marcello Restelli¹. ¹Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy. Correspondence to: Riccardo Poiani <riccardo.poiani@polimi.it>. |
| Pseudocode | Yes | The pseudo-code for our algorithm, TT-POIS, can be found in Algorithm 1. As one can notice, by replacing m with the uniform-in-the-horizon DCS, we recover the original pseudo-code of POIS (Metelli et al., 2018). |
| Open Source Code | No | Further details can be found in the code base we provide. |
| Open Datasets | Yes | Dam Control: In our first experimental domain, we consider a water resource management scenario (Castelletti et al., 2010; Parisi et al., 2014; Tirinzoni et al., 2018; Liotet et al., 2022). Reacher: In the second experiment, we consider the standard continuous control problem of a two-jointed robot arm (Todorov et al., 2012). Multi-Echelon Supply Chain: Finally, we consider the problem of managing the complex inventory of a 4 stage supply chain (Hubbs et al., 2020). Specifically for Supply Chain: we rely on their publicly available repository for our experiments. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test splits. It refers to "training process" and "training iterations" but no details on data partitioning for validation or test sets. |
| Hardware Specification | Yes | We have run the experiments using 88 Intel(R) Xeon(R) CPU E7-8880 v4 @ 2.20GHz cpus and 94 GB of RAM. |
| Software Dependencies | No | The paper mentions "Neural Network Size", "Weight Initialization", "Activation Function", "Confidence δ", "Number of offline iterations", and "RMIN-MAX" in tables, but does not specify software components (e.g., Python, PyTorch, TensorFlow) with version numbers. Importance Weight Clipping is also mentioned. |
| Experiment Setup | Yes | Table 1. Corridor Sparse Rewards Hyper-parameters for POIS and TT-POIS; Table 2. Corridor Dense Rewards Hyper-parameters for POIS and TT-POIS; Table 3. Dam Hyper-parameters for POIS and TT-POIS; Table 4. Supply Chain Hyper-parameters for POIS and TT-POIS; Table 5. Reacher Hyper-parameters for POIS and TT-POIS. These tables list specific hyperparameters, such as Neural Network Size [64, 32], Confidence δ 0.9, and Number of offline iterations 10. |
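
The Pseudocode row notes that replacing m with the uniform-in-the-horizon DCS recovers POIS. The sketch below is only a rough illustration of that idea and is not taken from the authors' code base. It assumes that a DCS (read here as a data collection strategy) is a vector m whose entry for horizon h counts the trajectories collected with length h, that the budget is measured in total interaction steps, and that the return is estimated by averaging each step's reward over the trajectories that reach that step. The function names `uniform_dcs`, `budget_used`, and `per_step_mc_estimate` are hypothetical.

```python
from typing import List, Sequence


def uniform_dcs(budget: int, horizon: int) -> List[int]:
    """Assumed uniform-in-the-horizon DCS: every trajectory runs for the
    full horizon, so only the last entry of m is non-zero."""
    m = [0] * horizon
    m[-1] = budget // horizon  # m[h - 1] = number of trajectories of length h
    return m


def budget_used(m: Sequence[int]) -> int:
    """Total interaction steps consumed by a DCS (assumed budget definition)."""
    return sum((h + 1) * count for h, count in enumerate(m))


def per_step_mc_estimate(trajectories: Sequence[Sequence[float]], gamma: float) -> float:
    """Monte Carlo return estimate from trajectories of different lengths.

    Each trajectory is a list of rewards. The reward at step t is averaged
    only over the trajectories that are long enough to contain step t.
    """
    horizon = max(len(traj) for traj in trajectories)
    estimate = 0.0
    for t in range(horizon):
        step_rewards = [traj[t] for traj in trajectories if len(traj) > t]
        estimate += (gamma ** t) * sum(step_rewards) / len(step_rewards)
    return estimate


if __name__ == "__main__":
    m = uniform_dcs(budget=12, horizon=4)
    print(m, budget_used(m))
    # One trajectory truncated after a single step, one of length two.
    print(per_step_mc_estimate([[1.0, 1.0], [3.0]], gamma=0.5))
```

A non-uniform DCS would put non-zero counts on shorter horizons as well, spending fewer interaction steps on late time steps; how TT-POIS chooses those counts is specified in the paper's Algorithm 1 and is not reproduced here.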