Sequential Classification-Based Optimization for Direct Policy Search
Authors: Yi-Qi Hu, Hong Qian, Yang Yu
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on a helicopter hovering task and controlling tasks in OpenAI Gym show that the new algorithm significantly improves performance over several state-of-the-art derivative-free optimization approaches. |
| Researcher Affiliation | Academia | Yi-Qi Hu, Hong Qian, Yang Yu; National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, 210023, China; yuy@nju.edu.cn |
| Pseudocode | Yes | Algorithm 1 RACOS (batch-mode); Algorithm 2 Sequential RACOS (SRACOS) |
| Open Source Code | No | The paper provides links to the code of other algorithms (RACOS, CMA-ES, DE, IMGPO) used for comparison, but not for the proposed SRACOS algorithm. |
| Open Datasets | Yes | controlling tasks in OpenAI Gym, an open source environment for reinforcement learning (http://gym.openai.com); helicopter hovering control task (Kim et al. 2003) |
| Dataset Splits | No | The paper does not explicitly provide details about training, validation, and test dataset splits, as is common in reinforcement learning environments where interaction serves as training and evaluation as testing, rather than fixed data splits. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like 'SciPy' and implies Python for implementation, but does not provide specific version numbers for these or other libraries/dependencies required for reproducibility. |
| Experiment Setup | Yes | The number of evaluations is set as 10^5 for all algorithms. Each algorithm is repeated 15 times independently. All algorithms are used with their default parameters. The task information and neural network structures are shown in Table 3. For example, on Acrobot: |S| = 6, |A| = 1, the neural network has two hidden layers with 5 and 3 neurons, respectively, |w| = 48 and the maximum number of steps is 2,000. |
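
The Acrobot figures in the Experiment Setup row can be checked with a short calculation. The sketch below counts the connection weights of a fully connected feed-forward network with layer sizes 6, 5, 3, 1 (state dimension, two hidden layers, action dimension). The helper name `count_weights` and the `use_bias` flag are hypothetical, introduced only for illustration. The reported |w| = 48 matches the count without bias terms; that the policy network omits biases is an inference from this arithmetic, not something stated in the summary above.

```python
def count_weights(layer_sizes, use_bias=False):
    """Count the parameters of a fully connected feed-forward network.

    layer_sizes lists the width of every layer, input first, output last.
    use_bias is an illustrative option; the summary does not say whether
    the paper's policy networks include bias terms.
    """
    total = 0
    # Each pair of consecutive layers contributes n_in * n_out weights.
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out
        if use_bias:
            total += n_out  # one bias per neuron in the receiving layer
    return total


# Acrobot as described: |S| = 6, hidden layers of 5 and 3 neurons, |A| = 1.
acrobot_layers = [6, 5, 3, 1]

print(count_weights(acrobot_layers))                 # 6*5 + 5*3 + 3*1 = 48
print(count_weights(acrobot_layers, use_bias=True))  # 48 + 5 + 3 + 1 = 57
```

In direct policy search, this count is the dimension of the search space handed to the derivative-free optimizer: each candidate solution is one flat vector of |w| real numbers.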