Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning

Authors: Alberto Maria Metelli, Flavio Mazzolini, Lorenzo Bisi, Luca Sabbioni, Marcello Restelli

ICML 2020

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We present an experimental campaign on benchmark domains to show the advantages of action persistence and prove the effectiveness of our persistence selection method." |
| Researcher Affiliation | Academia | ¹Politecnico di Milano, Milan, Italy. ²Institute for Scientific Interchange Foundation, Turin, Italy. |
| Pseudocode | Yes | Algorithm 1: Persistent Fitted Q-Iteration PFQI(k). |
| Open Source Code | Yes | The code is available at github.com/albertometelli/pfqi. |
| Open Datasets | Yes | "We train PFQI on several continuous control tasks, including Cartpole (Barto et al., 1983), Mountain Car (Moore, 1991), Lunar Lander, Pendulum, Acrobot (Brockman et al., 2016), Swimmer (Coulom, 2002), Hopper and Walker 2D (Erickson et al., 2019) from OpenAI Gym (Brockman et al., 2016)." |
| Dataset Splits | No | The paper trains on a batch of samples and evaluates the resulting policies, but it does not mention a distinct validation split, or its size, for hyperparameter tuning or model selection. |
| Hardware Specification | No | The paper does not report the hardware used for the experiments (CPU model, GPU type, or memory). |
| Software Dependencies | No | The paper mentions Python 3 and scikit-learn (Pedregosa et al., 2011) for Extra-Trees, but does not give version numbers for these or any other libraries. |
| Experiment Setup | Yes | "The learning algorithm is run for J = 1000 iterations for Cartpole and J = 2000 for the other environments. The discount factor is set to γ = 0.99 for all the domains. The batch of samples D is obtained by collecting 100 trajectories using a uniform random policy... We use 20 independent runs, each with a different random seed." |
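The core idea behind Algorithm 1 (PFQI(k)) is that repeating each action for k consecutive steps induces a k-persistent MDP with effective discount γ^k, on which ordinary Fitted Q-Iteration can be run. Below is a minimal, hedged sketch of that batch loop, not the authors' implementation: the `pfqi` function name, the tuple layout of `dataset`, and the hyperparameter values are illustrative assumptions; it presumes the k-step transitions (with the k-step discounted reward already aggregated) have been collected beforehand, and it uses scikit-learn's Extra-Trees regressor as the paper does.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def pfqi(dataset, n_actions, k, gamma=0.99, n_iters=50, seed=0):
    """Sketch of Persistent Fitted Q-Iteration, PFQI(k).

    `dataset` is a list of tuples (s, a, r_k, s_next, done), where each
    transition spans k consecutive environment steps taken with the SAME
    action a, and r_k = sum_{i=0}^{k-1} gamma^i * r_i is the k-step
    discounted reward. States here are scalar features for simplicity.
    """
    S = np.array([t[0] for t in dataset])
    A = np.array([t[1] for t in dataset])
    R = np.array([t[2] for t in dataset])
    S2 = np.array([t[3] for t in dataset])
    D = np.array([t[4] for t in dataset], dtype=float)

    X = np.column_stack([S, A])          # regress Q over (state, action)
    gk = gamma ** k                      # discount of the k-persistent MDP
    model = None
    for _ in range(n_iters):
        if model is None:
            y = R                        # first iteration: Q_1 = reward
        else:
            # Bellman optimality target: max over actions at next states
            q_next = np.column_stack([
                model.predict(np.column_stack([S2, np.full(len(S2), a)]))
                for a in range(n_actions)
            ])
            y = R + gk * (1.0 - D) * q_next.max(axis=1)
        model = ExtraTreesRegressor(n_estimators=50, random_state=seed)
        model.fit(X, y)
    return model
```

The key difference from plain FQI is confined to two places: the targets use the aggregated k-step reward `r_k`, and bootstrapping uses the discount `gamma ** k` rather than `gamma`.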