Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning

Authors: Alberto Maria Metelli, Flavio Mazzolini, Lorenzo Bisi, Luca Sabbioni, Marcello Restelli

ICML 2020

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We present an experimental campaign on benchmark domains to show the advantages of action persistence and prove the effectiveness of our persistence selection method." |
| Researcher Affiliation | Academia | ¹Politecnico di Milano, Milan, Italy. ²Institute for Scientific Interchange Foundation, Turin, Italy. |
| Pseudocode | Yes | Algorithm 1: Persistent Fitted Q-Iteration PFQI(k). |
| Open Source Code | Yes | The code is available at github.com/albertometelli/pfqi. |
| Open Datasets | Yes | "We train PFQI on several continuous control tasks, including Cartpole (Barto et al., 1983), Mountain Car (Moore, 1991), Lunar Lander, Pendulum, Acrobot (Brockman et al., 2016), Swimmer (Coulom, 2002), Hopper and Walker 2D (Erickson et al., 2019) from OpenAI Gym (Brockman et al., 2016)." |
| Dataset Splits | No | The paper trains on a batch of samples and evaluates the resulting policies, but it does not mention a distinct validation split, or its size, for hyperparameter tuning or model selection. |
| Hardware Specification | No | The paper does not report the hardware used for the experiments (CPU model, GPU type, or memory). |
| Software Dependencies | No | The paper mentions Python 3 and scikit-learn (Pedregosa et al., 2011) for Extra-Trees, but does not give version numbers for these or any other libraries. |
| Experiment Setup | Yes | "The learning algorithm is run for J = 1000 iterations for Cartpole and J = 2000 for the other environments. The discount factor is set to γ = 0.99 for all the domains. The batch of samples D is obtained by collecting 100 trajectories using a uniform random policy... We use 20 independent runs, each with a different random seed." |
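The core idea behind Algorithm 1 (PFQI(k)) is that repeating each action for k consecutive steps induces a k-persistent MDP with effective discount γ^k, on which ordinary Fitted Q-Iteration can be run. Below is a minimal, hedged sketch of that batch loop, not the authors' implementation: the `pfqi` function name, the tuple layout of `dataset`, and the hyperparameter values are illustrative assumptions; it presumes the k-step transitions (with the k-step discounted reward already aggregated) have been collected beforehand, and it uses scikit-learn's Extra-Trees regressor as the paper does.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def pfqi(dataset, n_actions, k, gamma=0.99, n_iters=50, seed=0):
    """Sketch of Persistent Fitted Q-Iteration, PFQI(k).

    `dataset` is a list of tuples (s, a, r_k, s_next, done), where each
    transition spans k consecutive environment steps taken with the SAME
    action a, and r_k = sum_{i=0}^{k-1} gamma^i * r_i is the k-step
    discounted reward. States here are scalar features for simplicity.
    """
    S = np.array([t[0] for t in dataset])
    A = np.array([t[1] for t in dataset])
    R = np.array([t[2] for t in dataset])
    S2 = np.array([t[3] for t in dataset])
    D = np.array([t[4] for t in dataset], dtype=float)

    X = np.column_stack([S, A])          # regress Q over (state, action)
    gk = gamma ** k                      # discount of the k-persistent MDP
    model = None
    for _ in range(n_iters):
        if model is None:
            y = R                        # first iteration: Q_1 = reward
        else:
            # Bellman optimality target: max over actions at next states
            q_next = np.column_stack([
                model.predict(np.column_stack([S2, np.full(len(S2), a)]))
                for a in range(n_actions)
            ])
            y = R + gk * (1.0 - D) * q_next.max(axis=1)
        model = ExtraTreesRegressor(n_estimators=50, random_state=seed)
        model.fit(X, y)
    return model
```

The key difference from plain FQI is confined to two places: the targets use the aggregated k-step reward `r_k`, and bootstrapping uses the discount `gamma ** k` rather than `gamma`.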