Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Adaptive Batch Size for Safe Policy Gradients
Authors: Matteo Papini, Matteo Pirotta, Marcello Restelli
NeurIPS 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Besides providing theoretical guarantees, we show numerical simulations to analyse the behaviour of our methods." and "Finally, in Section 5 we empirically analyse the behaviour of the proposed methods on a simple simulated control task." |
| Researcher Affiliation | Academia | Matteo Papini (DEIB, Politecnico di Milano, Italy); Matteo Pirotta (SequeL Team, Inria Lille, France); Marcello Restelli (DEIB, Politecnico di Milano, Italy) |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., repository link, explicit statement of code release, or mention of code in supplementary materials) for the described methodology. |
| Open Datasets | Yes | In this section, we test the proposed methods on the linear-quadratic Gaussian regulation (LQG) problem [23]. |
| Dataset Splits | No | The paper discusses the number of trajectories used for learning, but does not provide specific training, validation, or test dataset splits (e.g., percentages or counts) or mention cross-validation. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies, such as library names with version numbers, needed to replicate the experiments. |
| Experiment Setup | Yes | The LQG problem is defined by transition model $s_{t+1} \sim \mathcal{N}(s_t + a_t, \sigma_0^2)$, Gaussian policy $a_t \sim \mathcal{N}(\theta s, \sigma^2)$ and reward $r_t = -0.5(s_t^2 + a_t^2)$. In all our simulations we use $\sigma_0 = 0$... Both action and state variables are bounded to the interval $[-2, 2]$ and the initial state is drawn uniformly at random. We use a discount factor $\gamma = 0.9$... starting from $\theta = 0.55$ and stopping after a total of one million trajectories. In the following simulations, we use $\sigma = 1$ and start from $\theta = 0$, stopping after a total of 30 million trajectories. |
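
The experiment setup quoted in the last table row can be illustrated with a minimal simulation sketch. This is not the authors' code (the table records that none was released). It makes several assumptions: the reward is treated as a negative quadratic cost, which is the usual LQG convention; the bounds are enforced by clipping; the initial state is drawn uniformly from the state interval; and the `horizon` parameter and the example value of `theta` are hypothetical, since the quoted text does not give them.

```python
import random


def clip(x, lo=-2.0, hi=2.0):
    """Keep a value inside [lo, hi]; clipping is an assumed way to enforce the bounds."""
    return max(lo, min(hi, x))


def sample_normal(rng, mean, std):
    """Draw from N(mean, std^2); a zero std returns the mean exactly."""
    return mean if std == 0.0 else rng.gauss(mean, std)


def rollout(theta, sigma=1.0, sigma0=0.0, gamma=0.9, horizon=20, rng=None):
    """Simulate one trajectory and return its discounted return.

    `horizon` is a hypothetical value chosen for illustration only.
    """
    if rng is None:
        rng = random.Random()
    s = rng.uniform(-2.0, 2.0)  # initial state, assumed uniform over the state interval
    ret = 0.0
    for t in range(horizon):
        a = clip(sample_normal(rng, theta * s, sigma))  # Gaussian policy with mean theta * s
        ret += (gamma ** t) * (-0.5 * (s * s + a * a))  # assumed negative quadratic reward
        s = clip(sample_normal(rng, s + a, sigma0))     # transition model around s + a
    return ret


if __name__ == "__main__":
    rng = random.Random(0)
    # theta = -0.5 is an illustrative value, not one taken from the paper
    returns = [rollout(-0.5, rng=rng) for _ in range(1000)]
    print(sum(returns) / len(returns))
```

Averaging `rollout` over many trajectories gives a Monte Carlo estimate of the expected discounted return for a given `theta`, which is the kind of quantity a policy gradient method would try to improve in this setting.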