Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Momentum-Based Policy Gradient with Second-Order Information
Authors: Saber Salehkaleybar, Mohammadsadegh Khorasani, Negar Kiyavash, Niao He, Patrick Thiran
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experimental evaluations show the effectiveness of the proposed algorithm on various control tasks and its advantage over the state of the art in practice. ... In this section, we evaluate the performance of the proposed algorithm and compare it with previous work for control tasks in MuJoCo simulator (Todorov et al., 2012) ... Figure 1: Comparison of SHARP with other variance reduction methods on four control tasks. Table 2: Comparison of SHARP with other variance-reduced methods in terms of PR. |
| Researcher Affiliation | Academia | Saber Salehkaleybar (Leiden Institute of Advanced Computer Science, Leiden University); Sadegh Khorasani (School of Computer and Communication Sciences, EPFL); Negar Kiyavash (College of Management of Technology, EPFL); Niao He (Department of Computer Science, ETH Zurich); Patrick Thiran (School of Computer and Communication Sciences, EPFL) |
| Pseudocode | Yes | Algorithm 1 Common framework in variance reduction methods... Algorithm 2 The SHARP algorithm |
| Open Source Code | Yes | We implemented SHARP in the Garage library (garage contributors, 2019) as it allows for maintaining and integrating it in future versions of Garage library for easier dissemination. We utilized a Linux server with Intel Xeon CPU E5-2680 v3 (24 cores) operating at 2.50GHz with 377 GB DDR4 of memory, and Nvidia Titan X Pascal GPU. The implementation of SHARP is available as supplementary material. |
| Open Datasets | No | The paper uses control tasks (Reacher, Walker, Humanoid, and Swimmer) in the MuJoCo simulator. These are simulated environments rather than datasets, and the paper provides no concrete access information (links, DOIs, or data-availability citations) for any publicly available dataset. |
| Dataset Splits | No | The paper describes generating trajectories according to the current policy during the experimental process: "at each iteration t, we generated trajectories according to the current policy until we collected 10k system probes." This refers to online data collection rather than predefined dataset splits for training, validation, or testing. |
| Hardware Specification | Yes | We utilized a Linux server with Intel Xeon CPU E5-2680 v3 (24 cores) operating at 2.50GHz with 377 GB DDR4 of memory, and Nvidia Titan X Pascal GPU. |
| Software Dependencies | No | We implemented SHARP in the Garage library (garage contributors, 2019)... for control tasks in MuJoCo simulator (Todorov et al., 2012). While the paper mentions the Garage library and the MuJoCo simulator, it does not provide specific version numbers for these or any other software components. |
| Experiment Setup | Yes | For each algorithm, we used the same set of Gaussian policies parameterized with neural networks having two layers of 64 neurons each. Baselines and environment settings (such as maximum trajectory horizon, and reward intervals) were considered the same for all algorithms. We chose a maximum horizon of 500 for Walker, Swimmer, and Humanoid and 50 for Reacher. ... The discount factor is also set to 0.99 for all the runs. ... In the following table, we provide the fine-tuned parameters for each algorithm. Table 4: Selected hyper-parameters for different methods. |
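The Experiment Setup row describes Gaussian policies parameterized by neural networks with two hidden layers of 64 neurons each. A minimal NumPy sketch of such a diagonal-Gaussian policy is shown below; the tanh activation, weight initialization, and function names are illustrative assumptions, since the paper excerpt does not specify them.

```python
import numpy as np

def init_policy(obs_dim, act_dim, hidden=64, seed=0):
    """Initialize a two-hidden-layer (64x64) Gaussian policy.

    Layer widths follow the setup quoted in the table; the Gaussian
    init scale (0.1) is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    dims = [obs_dim, hidden, hidden, act_dim]
    return {
        "weights": [rng.normal(0, 0.1, (i, o)) for i, o in zip(dims, dims[1:])],
        "biases": [np.zeros(o) for o in dims[1:]],
        "log_std": np.zeros(act_dim),  # state-independent log-std of the Gaussian
    }

def policy_mean(params, obs):
    """Forward pass: two tanh hidden layers, linear output head."""
    h = obs
    for W, b in zip(params["weights"][:-1], params["biases"][:-1]):
        h = np.tanh(h @ W + b)
    return h @ params["weights"][-1] + params["biases"][-1]

def sample_action(params, obs, rng):
    """Sample an action from the diagonal Gaussian N(mu(obs), diag(std^2))."""
    mu = policy_mean(params, obs)
    return rng.normal(mu, np.exp(params["log_std"]))
```

For example, with an 11-dimensional observation and 3-dimensional action space, `sample_action(init_policy(11, 3), obs, rng)` returns a 3-vector; returns along sampled trajectories would then be discounted with the reported factor of 0.99.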