Lifelong Hyper-Policy Optimization with Multiple Importance Sampling Regularization
Authors: Pierre Liotet, Francesco Vidaich, Alberto Maria Metelli, Marcello Restelli
AAAI 2022, pp. 7525-7533
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Finally, we empirically validate our approach, in comparison with state-of-the-art algorithms, on realistic environments, including water resource management and trading." and "After having revised the literature (Section 5), we provide an experimental evaluation on realistic domains, including a trading and water resource management, in comparison with state-of-the-art baselines (Section 6)." |
| Researcher Affiliation | Academia | Pierre Liotet1, Francesco Vidaich2, Alberto Maria Metelli1, Marcello Restelli1 1Politecnico di Milano 2University of Padova |
| Pseudocode | Yes | Algorithm 1: Lifelong learning with POLIS |
| Open Source Code | Yes | The code is available at https://github.com/pierresdr/polis. |
| Open Datasets | No | "We consider three datasets of historical data, 2009-2012, 2013-2016, and 2017-2020; each period having a little more than 1000 data points." and "The inflow (e.g., rain) is the non-stationary process and the agent has obviously no impact on it, thus satisfying assumption 6.1. The mean inflow follows one of either 3 profiles given in Appendix C.2." (The paper describes the data sources but does not provide concrete access information such as a direct link, DOI, or formal citation to a publicly available version of the exact datasets used.) |
| Dataset Splits | Yes | In the first, we select the best performing hyperparameters from the dataset 2009-2012 and evaluate the selection on the other two datasets. In the second approach, we both select the hyperparameters and evaluate on the last two datasets. (A sketch of these two selection schemes is given after the table.) |
| Hardware Specification | No | The paper does not provide any specific hardware details for the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | "For all tasks, we set γ = ω = 1." and "We consider a particular subclass of non-stationary environments, frequently encountered in practice." and "α is set to 500 and we consider a target period of 500 steps." and "α is set to 1000 in order to include enough years of past data in the estimator. We provide results for a target period of 500 steps." and "... but is now training its hyper-policy every few steps (50 in all experiments) for a given number of gradient steps (100 in all experiments)." (A sketch of this training schedule is given after the table.) |
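
The two hyperparameter-selection schemes quoted under "Dataset Splits" can be illustrated with a minimal sketch. Everything below is hypothetical: `run_polis`, the placeholder grid, and the returned scores stand in for whatever the released implementation provides; they are not part of the paper or its code.

```python
# Hypothetical sketch of the two selection/evaluation schemes described for
# the trading datasets; run_polis and the grid values are placeholders only.
from itertools import product

DATASETS = ["2009-2012", "2013-2016", "2017-2020"]
GRID = {"alpha": [500, 1000], "step_size": [1e-3, 1e-4]}  # illustrative values


def run_polis(dataset, **hparams):
    """Stand-in for training/evaluating POLIS on one dataset; returns a score."""
    return 0.0  # placeholder score


def configs():
    return [dict(zip(GRID, values)) for values in product(*GRID.values())]


def scheme_1():
    # Select the best configuration on 2009-2012, evaluate it on the other two.
    best = max(configs(), key=lambda c: run_polis(DATASETS[0], **c))
    return {d: run_polis(d, **best) for d in DATASETS[1:]}


def scheme_2():
    # Select and evaluate directly on the last two datasets.
    return {d: max(run_polis(d, **c) for c in configs()) for d in DATASETS[1:]}
```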
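
Similarly, the training schedule quoted under "Experiment Setup" (a hyper-policy update every 50 environment steps, 100 gradient steps per update, with the most recent α steps feeding the estimator) can be sketched as follows, assuming a PyTorch-style hyper-policy. The function and argument names (`collect_step`, `polis_loss`, etc.) are hypothetical placeholders, not the authors' API.

```python
# Minimal sketch of the reported lifelong-training schedule; the hyper-policy,
# data-collection routine, and loss are placeholders, not the released code.
import torch

UPDATE_EVERY = 50      # hyper-policy is trained every 50 steps (all experiments)
GRADIENT_STEPS = 100   # gradient steps per update (all experiments)
ALPHA = 500            # past-data window; the trading task uses 1000
TARGET_PERIOD = 500    # target period, in steps


def lifelong_training(hyper_policy, env, total_steps, collect_step, polis_loss):
    optimizer = torch.optim.Adam(hyper_policy.parameters())
    history = []  # past interactions kept for the multiple importance sampling estimator
    for t in range(total_steps):
        history.append(collect_step(env, hyper_policy, t))  # one environment step
        if (t + 1) % UPDATE_EVERY == 0:
            window = history[-ALPHA:]  # keep only the most recent ALPHA steps
            for _ in range(GRADIENT_STEPS):
                loss = polis_loss(hyper_policy, window, TARGET_PERIOD)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
```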