Sequential Classification-Based Optimization for Direct Policy Search

Authors: Yi-Qi Hu, Hong Qian, Yang Yu

AAAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on a helicopter hovering task and controlling tasks in OpenAI Gym show that the new algorithm significantly improves performance over several state-of-the-art derivative-free optimization approaches.
Researcher Affiliation | Academia | Yi-Qi Hu, Hong Qian, Yang Yu; National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, 210023, China; yuy@nju.edu.cn
Pseudocode | Yes | Algorithm 1: RACOS (batch-mode); Algorithm 2: Sequential RACOS (SRACOS)
Open Source Code | No | The paper provides links to the code of the other algorithms used for comparison (RACOS, CMA-ES, DE, IMGPO), but not to code for the proposed SRACOS algorithm.
Open Datasets | Yes | Controlling tasks in OpenAI Gym, an open source environment for reinforcement learning (http://gym.openai.com); helicopter hovering control task (Kim et al. 2003)
Dataset Splits | No | The paper does not explicitly describe training, validation, and test splits. This is common in reinforcement learning, where interaction with the environment serves as training and evaluation serves as testing, rather than fixed data splits.
Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU or GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions software such as SciPy and implies a Python implementation, but does not provide version numbers for these or any other libraries or dependencies required for reproducibility.
Experiment Setup | Yes | The number of evaluations is set as 105 for all algorithms. Each algorithm is repeated 15 times independently. All algorithms are used with their default parameters. The task information and neural network structures are shown in Table 3. For example, on Acrobot: |S| = 6, |A| = 1, the neural network has two hidden layers with 5 and 3 neurons, |w| = 48, and the maximum number of steps is 2,000.
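
The Acrobot figures in the Experiment Setup row are consistent with a fully connected feed-forward policy network that has no bias terms: 6×5 + 5×3 + 3×1 = 48. The sketch below checks that arithmetic. The function name `count_weights` and the no-bias reading are assumptions made for illustration; they are inferred from the numbers above, not stated in the summary.

```python
def count_weights(layer_sizes, use_bias=False):
    """Count the parameters of a fully connected feed-forward network.

    layer_sizes lists the width of every layer, input first and output last.
    use_bias is a hypothetical switch: the quoted |w| = 48 matches the
    no-bias count, so False is the default here.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per input-output connection
        if use_bias:
            total += n_out  # one bias per neuron in the receiving layer
    return total


# Acrobot as quoted above: |S| = 6 inputs, hidden layers of 5 and 3, |A| = 1 output.
acrobot_layers = [6, 5, 3, 1]
print(count_weights(acrobot_layers))                 # 48, matching |w| = 48
print(count_weights(acrobot_layers, use_bias=True))  # 57, which would not match
```

Under this reading, the dimensionality of the search space that a derivative-free optimizer explores on Acrobot is 48, one coordinate per connection weight.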