Sample Complexity Bounds for Iterative Stochastic Policy Optimization
Authors: Marin Kobilarov
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The approach is illustrated with a simple robot control scenario, and initial steps towards applications to challenging aerial vehicle navigation problems are presented. We next illustrate the application of these bounds using the simple scenario introduced in Section 3. (The paper also contains a section titled "Application to Aerial Vehicle Navigation".) |
| Researcher Affiliation | Academia | Marin Kobilarov Department of Mechanical Engineering Johns Hopkins University Baltimore, MD 21218 marin@jhu.edu |
| Pseudocode | Yes | Iterative Stochastic Policy Optimization (ISPO): 0. Start with initial hyper-parameters ν_0 (i.e., a prior); set i = 0. 1. Sample M trajectories (ξ_j, τ_j) ∼ p(·\|ν_i) for j = 1, …, M. 2. Compute the new policy ν_{i+1} using the observed costs J(τ_j). 3. Compute a bound on the expected cost; stop if it is below a threshold, else set i = i + 1 and go to step 1. |
| Open Source Code | No | The paper does not include any explicit statements about releasing the source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | No | The paper describes experiments in a 'simulated environment' and a 'campus-like environment' using an 'experimentally identified model', but it does not provide concrete access information (link, DOI, repository, or formal citation) for any publicly available or open dataset used for training. |
| Dataset Splits | No | The paper mentions sample sizes and iteration windows for computing bounds but does not provide specific train/validation/test dataset splits or cross-validation details as would be typical for machine learning experiments. |
| Hardware Specification | No | The paper mentions an 'AscTec quadrotor' as the subject of the experiment but provides no specific details about the computing hardware (e.g., GPU/CPU models, memory) used to run the simulations or analysis. |
| Software Dependencies | No | The paper mentions the use of a 'high-fidelity open-source physics engine' but does not specify its name or version, nor does it list any other software dependencies with version numbers. |
| Experiment Setup | Yes | We used a window of at most L = 10 previous iterations to compute the bounds, i.e., to compute ν_{i+1}, all samples from the densities ν_{i−L+1}, ν_{i−L+2}, …, ν_i were used, with M = 200 samples per iteration (Figure 1). At each iteration, M = 200 samples are taken with a 1 − δ = 0.95 confidence level. A window of L = 5 past iterations was used for the bounds. |
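The ISPO loop in the Pseudocode row can be sketched in Python as below. This is a minimal, hypothetical illustration rather than the paper's method, resting on several assumptions: the hyper-parameters ν are the mean and per-coordinate standard deviation of a Gaussian over ξ; the "trajectory" τ is taken to be ξ itself, with a toy cost in [0, 1]; step 2 uses a cross-entropy-style update (a swapped-in rule); and step 3 uses a plain Hoeffding bound computed from the current policy's own samples instead of the paper's bound. The names `cost`, `hoeffding_upper_bound` and `ispo` are hypothetical.

```python
import numpy as np

def cost(xi):
    # Hypothetical toy cost J(tau) in [0, 1]; tau is taken to be xi itself,
    # and the cost is lowest near the assumed target point [1, -2].
    target = np.array([1.0, -2.0])
    return 1.0 - np.exp(-np.sum((xi - target) ** 2))

def hoeffding_upper_bound(costs, delta):
    # Upper bound on E[J] that holds with probability >= 1 - delta,
    # valid for i.i.d. costs in [0, 1].
    m = len(costs)
    return np.mean(costs) + np.sqrt(np.log(1.0 / delta) / (2.0 * m))

def ispo(mean0, std0, M=200, delta=0.05, threshold=0.2,
         max_iters=50, elite_frac=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # Step 0: prior hyper-parameters nu_0 = (mean, std).
    mean, std = np.asarray(mean0, float), np.asarray(std0, float)
    bound = np.inf
    for i in range(max_iters):
        # Step 1: sample M parameters xi_j ~ p(.|nu_i) and evaluate their costs.
        xis = rng.normal(mean, std, size=(M, mean.size))
        costs = np.array([cost(x) for x in xis])
        # Step 3 (checked before updating): bound on the expected cost of nu_i.
        bound = hoeffding_upper_bound(costs, delta)
        if bound < threshold:
            return mean, std, bound, i
        # Step 2: cross-entropy-style update of nu_{i+1} from the lowest-cost samples;
        # the small constant keeps the standard deviation from collapsing to zero.
        n_elite = max(2, int(elite_frac * M))
        elite = xis[np.argsort(costs)[:n_elite]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mean, std, bound, max_iters
```

Calling `ispo([0.0, 0.0], [2.0, 2.0])` should stop after a few iterations with the mean near the assumed target. Because the bound is tested at every iteration, a union bound over iterations (for example, splitting δ across them) would be needed for the stopping guarantee to hold jointly; the sketch ignores this.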
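The Experiment Setup row describes bounding the cost of ν_{i+1} by reusing the samples drawn from the last L densities ν_{i−L+1}, …, ν_i. One way to do this is sketched below under assumptions that are not taken from the paper: 1-D Gaussian densities given as (mu, sigma), costs in [0, 1], a multiple-importance-sampling estimator whose proposal is the equal-weight mixture of the window (the balance heuristic), and a Hoeffding bound made valid by an analytic cap on the importance weights. The cap requires the new density to be narrower than every density in the window. The paper derives its own bound, which this sketch does not reproduce. The defaults M = 200 and δ = 0.05 (i.e., 1 − δ = 0.95) mirror the values quoted above, and `gauss_pdf` and `windowed_cost_bound` are hypothetical names.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def windowed_cost_bound(cost, window, target, M=200, delta=0.05, seed=0):
    """Estimate and (1 - delta) upper bound on E[J] under `target`.

    window: list of (mu, sigma) for nu_{i-L+1}, ..., nu_i (1-D Gaussians, assumed).
    target: (mu, sigma) for nu_{i+1}; its sigma must be smaller than every
            window sigma so that the importance weights are bounded.
    cost:   vectorized function with values in [0, 1] (assumed).
    """
    mu_t, s_t = target
    if any(s <= s_t for _, s in window):
        raise ValueError("target sigma must be smaller than every window sigma")
    rng = np.random.default_rng(seed)
    L = len(window)
    # M independent samples from each density in the window.
    xs = np.concatenate([rng.normal(mu, s, size=M) for mu, s in window])
    # Balance heuristic: weight by the target over the equal-weight mixture.
    mix = np.mean([gauss_pdf(xs, mu, s) for mu, s in window], axis=0)
    vals = gauss_pdf(xs, mu_t, s_t) / mix * cost(xs)
    # Weight cap: w <= L * p_t / p_l for every l, and for Gaussians with
    # s_t < s_l, sup p_t / p_l = (s_l / s_t) * exp((mu_t - mu_l)^2 / (2 (s_l^2 - s_t^2))).
    ratios = [(s / s_t) * np.exp((mu_t - mu) ** 2 / (2.0 * (s ** 2 - s_t ** 2)))
              for mu, s in window]
    w_max = L * min(ratios)
    # Hoeffding for independent terms in [0, w_max]; the mixture estimator is unbiased.
    n = len(xs)
    estimate = vals.mean()
    return estimate, estimate + w_max * np.sqrt(np.log(1.0 / delta) / (2.0 * n))
```

The weight cap L · min_l sup(p_t / p_l) is loose, so the bound widens quickly as the new density drifts away from the window; that looseness is the price of relying on a simple bounded-range inequality instead of a divergence-aware one.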