Sample Complexity Bounds for Iterative Stochastic Policy Optimization

Authors: Marin Kobilarov

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The approach is illustrated with a simple robot control scenario and initial steps towards applications to challenging aerial vehicle navigation problems are presented. We next illustrate the application of these bounds using the simple scenario introduced in 3. Application to Aerial Vehicle Navigation
Researcher Affiliation | Academia | Marin Kobilarov, Department of Mechanical Engineering, Johns Hopkins University, Baltimore, MD 21218, marin@jhu.edu
Pseudocode | Yes | Iterative Stochastic Policy Optimization (ISPO): 0. Start with initial hyper-parameters ν_0 (i.e. a prior), set i = 0. 1. Sample M trajectories (ξ_j, τ_j) ∼ p(·|ν_i) for j = 1, ..., M. 2. Compute new policy ν_{i+1} using observed costs J(τ_j). 3. Compute bound on expected cost and stop if below threshold, else set i = i+1 and go to 1.
Open Source Code | No | The paper does not include any explicit statements about releasing the source code for the methodology described, nor does it provide a link to a code repository.
Open Datasets | No | The paper describes experiments in a 'simulated environment' and a 'campus-like environment' using an 'experimentally identified model', but it does not provide concrete access information (link, DOI, repository, or formal citation) for any publicly available or open dataset used for training.
Dataset Splits | No | The paper mentions sample sizes and iteration windows for computing bounds but does not provide specific train/validation/test dataset splits or cross-validation details as would be typical for machine learning experiments.
Hardware Specification | No | The paper mentions an 'AscTec quadrotor' as the subject of the experiment but provides no specific details about the computing hardware (e.g., GPU/CPU models, memory) used to run the simulations or analysis.
Software Dependencies | No | The paper mentions the use of a 'high-fidelity open-source physics engine' but does not specify its name or version, nor does it list any other software dependencies with version numbers.
Experiment Setup | Yes | We used a window of maximum L = 10 previous iterations to compute the bounds, i.e. to compute ν_{i+1} all samples from densities ν_{i-L+1}, ν_{i-L+2}, ..., ν_i were used. Using M = 200 samples (Figure 1) at each iteration. At each iteration M = 200 samples are taken with 1 - δ = 0.95 confidence level. A window of L = 5 past iterations were used for the bounds.
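The ISPO loop quoted in the Pseudocode row can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the paper's implementation: the policy density is assumed to be a diagonal Gaussian, the update in step 2 is a cross-entropy-style refit to the lowest-cost samples in the window (reused without importance weights, a simplification), and the stopping statistic in step 3 is the plain empirical mean cost standing in for the paper's probabilistic bound. The function name `ispo_sketch` and the parameters `elite_frac`, `threshold`, and `max_iters` are hypothetical.

```python
import numpy as np

def ispo_sketch(cost, mean0, std0, M=200, L=10, threshold=0.05,
                elite_frac=0.1, max_iters=100, seed=0):
    """Illustrative ISPO-style loop with placeholder update and stopping rule."""
    rng = np.random.default_rng(seed)
    # Step 0: initial hyper-parameters nu_0 = (mean, std) of a diagonal Gaussian.
    mean = np.asarray(mean0, dtype=float)
    std = np.asarray(std0, dtype=float)
    window = []  # (samples, costs) from at most the last L iterations
    stat = np.inf
    for i in range(max_iters):
        # Step 1: sample M parameter vectors xi_j ~ p(.|nu_i) and observe costs.
        xi = rng.normal(mean, std, size=(M, mean.size))
        J = np.array([cost(x) for x in xi])
        window = (window + [(xi, J)])[-L:]

        # Step 2 (assumption): refit the Gaussian to the lowest-cost samples
        # in the window. The paper's actual update is not reproduced here.
        all_xi = np.concatenate([w[0] for w in window])
        all_J = np.concatenate([w[1] for w in window])
        n_elite = max(2, int(elite_frac * len(all_J)))
        elite = all_xi[np.argsort(all_J)[:n_elite]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6  # small floor keeps sampling well defined

        # Step 3 (assumption): empirical mean cost of the current batch is used
        # as a stand-in for the paper's high-confidence bound.
        stat = J.mean()
        if stat < threshold:
            break
    return mean, std, stat, i


if __name__ == "__main__":
    # Hypothetical toy cost: squared distance of the parameters from the origin.
    quadratic = lambda x: float(np.sum(x ** 2))
    mean, std, stat, iters = ispo_sketch(quadratic, [2.0, -1.0], [1.0, 1.0])
    print("final mean:", mean, "last batch mean cost:", stat, "iterations:", iters + 1)
```

The defaults M = 200 and L = 10 mirror the values quoted in the Experiment Setup row; everything else is chosen only to make the sketch runnable.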