Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Approximate Newton Methods for Policy Search in Markov Decision Processes
Authors: Thomas Furmston, Guy Lever, David Barber
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 6 we present experiments demonstrating state-of-the-art performance on challenging domains including Tetris and robotic arm applications. ... In this section we provide an empirical evaluation of the Gauss-Newton methods on a variety of domains. |
| Researcher Affiliation | Academia | Thomas Furmston (EMAIL), Department of Computer Science, University College London, London, WC1E 6BT; Guy Lever (EMAIL), Department of Computer Science, University College London, London, WC1E 6BT; David Barber (EMAIL), Department of Computer Science, University College London, London, WC1E 6BT |
| Pseudocode | Yes | Algorithm 1: Generic gradient-based policy search algorithm. Algorithm 2: Recurrent state sampling algorithm to estimate the search direction of the second Gauss-Newton method. The algorithm is applicable to Markov decision processes with an infinite planning horizon and average rewards. |
| Open Source Code | No | The paper does not contain an explicit statement or link providing access to source code for the methodology described. |
| Open Datasets | No | The paper describes experiments in various domains (Non-Linear Navigation, N-link Rigid Manipulator, Tetris, Robot Arm) which are either synthetic, simulated, or game environments. It does not provide concrete access information (link, DOI, repository, or formal citation of a specific dataset) for publicly available datasets used in the experiments. For Tetris, it mentions using 'the same set of features as used in the works of Bertsekas and Ioffe (1996) & Kakade (2002)' but this refers to features, not a dataset. |
| Dataset Splits | No | The paper describes how samples are generated and used during training (e.g., '50 trajectories were sampled during each training iteration' in C.1; 'sampling 1000 games' in C.3; '15 actions from the policy... sampled' in C.4). However, it does not specify explicit training/test/validation dataset splits from a static, pre-collected dataset, as the environments are typically interactive Markov Decision Processes where data is generated dynamically. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., CPU, GPU models, memory) used to run the experiments. It refers to the 'Simulation Lab (Schaal, 2006) environment' for the robot arm experiments but gives no hardware details. |
| Software Dependencies | No | The paper mentions 'Simulation Lab (Schaal, 2006) environment' but does not provide a specific version number. No other key software components are listed with version numbers. |
| Experiment Setup | Yes | Non-Linear Navigation Experiment (C.1): '50 trajectories were sampled during each training iteration', 'finite planning horizon, H = 80', 'initial control parameters were sampled from the region w0 ∈ [0, 60] × [−8, 0]'. N-link Rigid Manipulator Experiment (C.2): 'step size sequences... tuned for performance', 'fixed step sizes: 0.001, 0.01, 0.1, 1, 10, 20, 30, 100 and 250', 'the fixed step size of 30 gave consistently good results'. Tetris Experiment (C.3): 'sampling 1000 games', 'simple line search... step sizes 0.1, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0', '100 repetitions of the experiment, each consisting of 100 training iterations'. Robot Arm Experiment (C.4): 'episode 20 seconds in length', '10 shape parameters for each of the individual motor primitives', '140 policy parameters', 'diagonal elements of the precision matrix are initialized to 0.01', '15 actions from the policy and used the episodes generated from these samples to estimate the search direction', 'samples from the last 10 training iterations', '100 updates of the policy parameters', 'fixed step size of 1.0 was used in the second Gauss-Newton method'. |
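For context on the "Generic gradient-based policy search algorithm" (Algorithm 1) named in the pseudocode row above, the following is a minimal sketch of that general loop: sample trajectories under the current policy, estimate a search direction, and take a step. This is *not* the paper's Gauss-Newton update; it uses a plain score-function (REINFORCE-style) gradient on a hypothetical one-step toy problem, and all function names, parameters, and reward values are assumptions made for illustration.

```python
import math
import random

def softmax(prefs):
    """Softmax policy over action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    s = sum(exps)
    return [e / s for e in exps]

def sample_action(probs, rng):
    """Sample an action index from a categorical distribution."""
    r, c = rng.random(), 0.0
    for a, p in enumerate(probs):
        c += p
        if r < c:
            return a
    return len(probs) - 1

def reward(action, rng):
    """Hypothetical stochastic one-step reward: action 1 is better on average."""
    means = [0.2, 0.8]
    return means[action] + 0.1 * (rng.random() - 0.5)

def policy_search(iterations=200, batch=50, step=0.5, seed=0):
    """Generic gradient-based policy search loop:
    repeatedly sample under the current policy, estimate a
    search direction, and update the policy parameters."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]  # policy parameters (action preferences)
    for _ in range(iterations):
        probs = softmax(theta)
        grad = [0.0, 0.0]
        for _ in range(batch):
            a = sample_action(probs, rng)
            r = reward(a, rng)
            # Score-function estimator: grad log softmax = indicator - prob
            for b in range(len(theta)):
                grad[b] += ((1.0 if b == a else 0.0) - probs[b]) * r / batch
        theta = [t + step * g for t, g in zip(theta, grad)]
    return softmax(theta)

probs = policy_search()
```

After training, the policy concentrates probability mass on the higher-reward action. In the actual paper, the search direction in this loop is replaced by the Gauss-Newton directions described in Algorithms 1 and 2.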