Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Approximate Modified Policy Iteration and its Application to the Game of Tetris

Authors: Bruno Scherrer, Mohammad Ghavamzadeh, Victor Gabillon, Boris Lesner, Matthieu Geist

JMLR 2015 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. "We illustrate and evaluate the behavior of these new algorithms in the Mountain Car and Tetris problems. Remarkably, in Tetris, CBMPI outperforms the existing DP approaches by a large margin, and competes with the current state-of-the-art methods while using fewer samples." (Section 6: Experimental Results)
Researcher Affiliation: Collaboration. Bruno Scherrer (EMAIL), INRIA Nancy Grand Est, Team Maia, 615 rue du Jardin Botanique, 54600 Vandœuvre-lès-Nancy, France; Mohammad Ghavamzadeh (EMAIL), Adobe Research & INRIA Lille, 321 Park Avenue, San Jose, CA 95110, USA; Victor Gabillon (EMAIL), INRIA Lille Nord Europe, Team SequeL, 40 avenue Halley, 59650 Villeneuve d'Ascq, France; Boris Lesner (EMAIL), INRIA Nancy Grand Est, Team Maia, 615 rue du Jardin Botanique, 54600 Vandœuvre-lès-Nancy, France; Matthieu Geist (EMAIL), CentraleSupélec, IMS-MaLIS Research Group & UMI 2958 (Georgia Tech-CNRS), 2 rue Edouard Belin, 57070 Metz, France.
Pseudocode: Yes. Figure 1 gives the pseudo-code of the AMPI-V algorithm, Figure 2 that of the AMPI-Q algorithm, and Figure 3 that of the CBMPI algorithm.
Open Source Code: No. The paper does not provide concrete access to source code for the methodology described. While it references the "standard code by Thiery and Scherrer (2010b)" for Tetris features, this refers to external tools rather than the authors' own implementation of the AMPI algorithms.
Open Datasets: No. The paper uses the Mountain Car and Tetris problems, which are simulation environments where data is generated through rollouts, rather than pre-existing open datasets. For instance, Section 6.2 discusses "sampling states from the trajectories generated by a good policy for Tetris" and "generating a rollout of length m + 1".
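The rollout-based data generation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `env_step` and `policy` are hypothetical callables standing in for the simulator's transition function and the sampling policy.

```python
def rollout(env_step, policy, state, m, gamma=0.99):
    """Generate a rollout of length m + 1 from `state` and return its
    discounted return.

    Hypothetical interfaces (not from the paper):
      env_step(state, action) -> (next_state, reward)
      policy(state) -> action
    """
    ret, discount = 0.0, 1.0
    for _ in range(m + 1):
        action = policy(state)
        state, reward = env_step(state, action)
        ret += discount * reward
        discount *= gamma
    return ret
```

In CBMPI-style training, many such returns are collected from sampled states and used as regression/classification targets, which is why no static dataset exists.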
Dataset Splits: No. The paper describes experiments in simulation environments (Mountain Car and Tetris) where data is generated dynamically through rollouts rather than drawn from a static dataset with predefined splits. For example, for Mountain Car: "The policies learned by the algorithms are evaluated by the number of steps-to-go (average number of steps to reach the goal with a maximum of 300) averaged over 4,000 independent trials. More precisely, we define the possible starting configurations (positions and velocities) of the car by placing a 20x20 uniform grid over the state space, and run the policy 6 times from each possible initial configuration." The concept of dataset splits does not apply, since data is generated on the fly for evaluation.
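The grid-based evaluation protocol quoted above can be sketched as below. The position and velocity ranges are the standard Mountain Car state bounds, assumed here for illustration; `run_episode` is a hypothetical callable that runs the learned policy from a start state and returns the number of steps to the goal (capped at 300 in the paper).

```python
def evaluate_policy(run_episode, n_pos=20, n_vel=20, runs_per_start=6,
                    pos_range=(-1.2, 0.6), vel_range=(-0.07, 0.07)):
    """Average steps-to-go over a uniform n_pos x n_vel grid of start
    states, running the policy `runs_per_start` times from each one.

    Hypothetical interface (not from the paper):
      run_episode(position, velocity) -> steps_to_goal
    """
    total, count = 0.0, 0
    for i in range(n_pos):
        for j in range(n_vel):
            pos = pos_range[0] + i * (pos_range[1] - pos_range[0]) / (n_pos - 1)
            vel = vel_range[0] + j * (vel_range[1] - vel_range[0]) / (n_vel - 1)
            for _ in range(runs_per_start):
                total += run_episode(pos, vel)
                count += 1
    return total / count
```

With the defaults this evaluates 20 x 20 x 6 = 2,400 episodes, each from a deterministic grid point rather than a sampled dataset split.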
Hardware Specification: No. The experiments were conducted on Grid5000 (https://www.grid5000.fr). This names a computing platform but provides no specific hardware details such as CPU/GPU models, processor types, or memory specifications.
Software Dependencies: No. The paper mentions using the "LIBSVM implementation by Chang and Lin (2011)" for a support vector classifier and the "covariance matrix adaptation evolution strategy (CMA-ES) algorithm (Hansen and Ostermeier, 2001)". However, it does not provide version numbers for LIBSVM, CMA-ES, or any other software component, which a reproducible description would require.
Experiment Setup: Yes. The discount factor is set to γ = 0.99. Each state s consists of the pair (x_s, ẋ_s), where x_s is the position of the car and ẋ_s is its velocity. The paper uses the formulation described in Dimitrakakis and Lagoudakis (2008) with uniform noise in [−0.2, 0.2] added to the actions. ... The value function is approximated using a linear space spanned by a set of radial basis functions (RBFs) evenly distributed over the state space. ... The RBF kernel exp(−|u − v|²) is used with cost parameter C = 1000. ... The total budget B is set to 4,000 per iteration. ... For the CE method, ζ = 0.1 and η = 4, the best parameters reported in Thiery and Scherrer (2009b). Also, n = 1,000 and G = 10 in the small board (10 × 10), and n = 100 and G = 1 in the large board (10 × 20).
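The linear value-function approximation described above can be sketched as an RBF feature map. This is a hedged illustration only: the RBF width `sigma` and the inclusion of a bias term are assumptions, since the report states only that the RBFs are evenly distributed over the state space.

```python
import math

def rbf_features(state, centers, sigma=0.1):
    """Map a state (e.g. (position, velocity)) to RBF features for a
    linear value function V(s) = w . phi(s).

    Assumptions (not specified in the report): a constant bias feature
    and Gaussian RBFs of width `sigma` at the given `centers`.
    """
    feats = [1.0]  # bias term (assumed)
    for center in centers:
        sq_dist = sum((s - c) ** 2 for s, c in zip(state, center))
        feats.append(math.exp(-sq_dist / (2 * sigma ** 2)))
    return feats
```

Evenly distributing the centers over the 2-D state space (e.g. on a small grid of positions and velocities) then yields the "linear space spanned by a set of RBFs" that the value function is fit in.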