Noisy Derivative-Free Optimization With Value Suppression
Authors: Hong Wang, Hong Qian, Yang Yu
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on synthetic problems and on reinforcement learning tasks verify that value suppression can be significantly more effective than previous noise-handling methods. |
| Researcher Affiliation | Academia | Hong Wang, Hong Qian, Yang Yu; National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; waghon@outlook.com, {qianh,yuy}@lamda.nju.edu.cn |
| Pseudocode | Yes | Algorithm 1: Value Suppression Framework for Derivative-Free Optimization; Algorithm 2: SRACOS; Algorithm 3: Suppressed SRACOS (SSRACOS). |
| Open Source Code | No | The paper does not contain an explicit statement about the release of its source code or a direct link to a code repository. |
| Open Datasets | Yes | We conduct experiments on two synthetic functions and on control tasks of reinforcement learning in OpenAI Gym. OpenAI Gym provides a toolkit for reinforcement learning research (https://gym.openai.com). From its many control tasks we choose Acrobot, Mountain Car, Half Cheetah, Humanoid, Swimmer, Ant, Hopper, and Lunar Lander to compare the ability of each noise-handling mechanism to reduce the effects of noise (a rollout sketch appears after the table). |
| Dataset Splits | No | The paper does not specify traditional training/validation/test splits. For the reinforcement learning tasks, policies are evaluated by simulation in the environments rather than on static dataset splits; for the synthetic functions, only the total budget of function evaluations is specified. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions OpenAI Gym but does not specify its version or any other software libraries and their version numbers used in the experiments. |
| Experiment Setup | Yes | The parameters of the noise-handling mechanisms are set as follows. For sampling, the sample size is set to 10. For threshold selection, the threshold value is τ = σ. For value suppression, the maximum allowed number of non-update iterations is u = 500, the re-sample size is v = 100, and the balance parameter is α = 0.5 (a hedged sketch of this mechanism follows the table). The settings of the neural networks and OpenAI Gym tasks are listed in Table 2, where #State, #Actions, NN nodes, #Weights, and Horizon denote the observation dimension, the action dimension, the hidden-layer sizes of the neural network, the total number of parameters in the network, and the maximum number of steps, respectively. The mechanisms are compared under the same SRACOS parameter setting, listed in Table 3, where #B and #B+ denote the sizes of the negative and positive solution sets, respectively, and U-bits denotes the number of bits that may be changed when generating a new solution from a positive solution. |