Adaptive Batch Size for Safe Policy Gradients

Authors: Matteo Papini, Matteo Pirotta, Marcello Restelli

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Besides providing theoretical guarantees, we show numerical simulations to analyse the behaviour of our methods." and "Finally, in Section 5 we empirically analyse the behaviour of the proposed methods on a simple simulated control task."
Researcher Affiliation | Academia | Matteo Papini, DEIB, Politecnico di Milano, Italy; Matteo Pirotta, SequeL Team, Inria Lille, France; Marcello Restelli, DEIB, Politecnico di Milano, Italy
Pseudocode | No | No structured pseudocode or algorithm blocks were found.
Open Source Code | No | The paper does not provide any concrete access information (e.g., repository link, explicit statement of code release, or mention of code in supplementary materials) for the described methodology.
Open Datasets | Yes | "In this section, we test the proposed methods on the linear-quadratic Gaussian regulation (LQG) problem [23]."
Dataset Splits | No | The paper discusses the number of trajectories used for learning, but does not provide specific training, validation, or test dataset splits (e.g., percentages or counts) or mention cross-validation.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies, such as library names with version numbers, needed to replicate the experiments.
Experiment Setup | Yes | "The LQG problem is defined by transition model s_{t+1} ~ N(s_t + a_t, σ_0^2), Gaussian policy a_t ~ N(θ·s_t, σ^2) and reward r_t = -0.5(s_t^2 + a_t^2). In all our simulations we use σ_0 = 0... Both action and state variables are bounded to the interval [-2, 2] and the initial state is drawn uniformly at random. We use a discount factor γ = 0.9... starting from θ = -0.55 and stopping after a total of one million trajectories. In the following simulations, we use σ = 1 and start from θ = 0, stopping after a total of 30 million trajectories."
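
For readers who want to rebuild this setup, the sketch below simulates the quoted LQG task with the Gaussian policy and a plain REINFORCE gradient estimate over a batch of trajectories. It is a minimal illustration under stated assumptions, not the authors' code: the horizon, learning rate, and fixed batch size are values we picked for the example (the paper's contribution is precisely an adaptive rule for that batch size, which is not reproduced here), and all function names are our own.

```python
import numpy as np

# Constants mirror the quoted setup; HORIZON and the learning rate are
# assumptions of ours, not values stated in the report.
GAMMA = 0.9    # discount factor (quoted)
SIGMA_0 = 0.0  # transition noise std (quoted as "sigma_0 = 0...")
SIGMA = 1.0    # policy std used in the later simulations (quoted)
BOUND = 2.0    # states and actions are bounded to [-2, 2] (quoted)
HORIZON = 20   # episode length: an assumption

def rollout(theta, rng):
    """One LQG trajectory under the Gaussian policy a_t ~ N(theta * s_t, SIGMA^2)."""
    s = rng.uniform(-BOUND, BOUND)  # initial state drawn uniformly at random
    states, actions, rewards = [], [], []
    for _ in range(HORIZON):
        a = np.clip(rng.normal(theta * s, SIGMA), -BOUND, BOUND)
        states.append(s)
        actions.append(a)
        rewards.append(-0.5 * (s ** 2 + a ** 2))  # r_t = -0.5 (s_t^2 + a_t^2)
        s = np.clip(rng.normal(s + a, SIGMA_0), -BOUND, BOUND)  # s' ~ N(s + a, SIGMA_0^2)
    return np.array(states), np.array(actions), np.array(rewards)

def reinforce_gradient(theta, batch_size, rng):
    """Monte Carlo REINFORCE estimate of the policy gradient from batch_size trajectories."""
    estimates = []
    for _ in range(batch_size):
        s, a, r = rollout(theta, rng)
        ret = np.sum(GAMMA ** np.arange(HORIZON) * r)  # discounted return
        # Score of the Gaussian policy w.r.t. theta:
        # d/dtheta log N(a; theta*s, SIGMA^2) = (a - theta*s) * s / SIGMA^2
        score = np.sum((a - theta * s) * s / SIGMA ** 2)
        estimates.append(score * ret)
    return np.mean(estimates)

rng = np.random.default_rng(0)
theta = 0.0  # the quoted starting point for the sigma = 1 simulations
for step in range(200):
    grad = reinforce_gradient(theta, batch_size=100, rng=rng)  # fixed batch size, for illustration only
    theta += 1e-3 * grad  # fixed step size; the paper derives safe step sizes instead
```

In the paper, the batch size would be chosen at each iteration from a high-probability bound on the gradient estimation error so that each update improves performance; the fixed value above only marks where that choice enters the loop.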