Rewiring Neurons in Non-Stationary Environments

Authors: Zhicheng Sun, Yadong Mu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our proposed method is comprehensively evaluated on 18 continual reinforcement learning scenarios ranging from locomotion to manipulation, demonstrating its advantages over state-of-the-art competitors in performance-efficiency tradeoffs. Code is available at https://github.com/feifeiobama/RewireNeuron.
Researcher Affiliation | Academia | Zhicheng Sun, Yadong Mu; Peking University, Beijing, China; {sunzc,myd}@pku.edu.cn
Pseudocode | No | The paper describes its methods verbally and with figures, but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/feifeiobama/RewireNeuron.
Open Datasets | Yes | Environments. We use 18 continual reinforcement learning scenarios from Brax and Continual World: (1) Brax [18, 20] contains 9 locomotion scenarios over 3 domains: HalfCheetah, Ant and Humanoid. (2) Continual World [69] is a manipulation benchmark built on Meta-World [73] and MuJoCo [65], featuring 8 scenarios with 3 tasks (CW3) and one scenario with 10 tasks (CW10), both with a varying reward function and a budget of 1M interactions per task. More details are provided in Appendix A.1. (A task-sequence sketch is given after the table.)
Dataset Splits | No | The paper does not provide explicit numerical or proportional splits (e.g., train/validation/test percentages or counts) for datasets used in the experiments. It mentions only per-task interaction budgets (1M interactions per task), which are not dataset splits.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or cloud computing instance specifications used for experiments.
Software Dependencies | No | We build on the SaLinA library [12] and adopt Soft Actor-Critic (SAC) [25] with autotuned temperature [26] as the underlying algorithm. Both the actor and the critic are 4-layer perceptrons with 256 hidden neurons per layer, while the actor also includes task-specific heads [69]. Their training configurations follow [20].
Experiment Setup | Yes | Implementation details. We build on the SaLinA library [12] and adopt Soft Actor-Critic (SAC) [25] with autotuned temperature [26] as the underlying algorithm. Both the actor and the critic are 4-layer perceptrons with 256 hidden neurons per layer, while the actor also includes task-specific heads [69]. Their training configurations follow [20]. For our method, we choose the new hyperparameters K, α, and β via grid search for each scenario, and provide a sensitivity analysis in Section 4.3. The score vectors in Eq. (4) are initialized with an arithmetic sequence rescaled to [0, 1], and the temperature is τ = 1 by default. (An actor sketch is given after the table.)
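
For concreteness, here is a minimal Python sketch of the sequential training loop implied by the Open Datasets row: a sequence of tasks, each trained under a fixed interaction budget (the 1M-per-task figure quoted above applies to Continual World). The brax.envs.create constructor is Brax's public API; the task list, the train_sac stub, and the agent placeholder are illustrative assumptions, not the paper's actual scenario definitions (those are given in its Appendix A.1).

    # Continual RL over a task sequence with a fixed per-task budget.
    # The task list and train_sac are hypothetical placeholders.
    from brax import envs

    TASK_SEQUENCE = ["halfcheetah", "halfcheetah", "halfcheetah"]  # real scenarios vary the task per step
    BUDGET_PER_TASK = 1_000_000  # 1M environment interactions per task

    def train_sac(env, agent, num_interactions):
        """Stub: would run SAC on env for num_interactions steps."""
        pass

    agent = None  # the (rewired) SAC agent; construction omitted
    for task_id, env_name in enumerate(TASK_SEQUENCE):
        env = envs.create(env_name=env_name)  # standard Brax constructor
        train_sac(env, agent, BUDGET_PER_TASK)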
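
And a minimal PyTorch sketch of the actor from the Experiment Setup row. The 4-layer architecture with 256 hidden neurons, the per-task heads, the arithmetic-sequence score initialization rescaled to [0, 1], and τ = 1 are taken from the quotes above; the split into a shared trunk plus a task-specific head, the tensor dimensions, and how the scores enter the forward pass (Eq. (4) is not reproduced here) are assumptions, not the authors' implementation (see the linked repository for that).

    # Sketch only: the rewiring mechanism of Eq. (4) is deliberately omitted.
    import torch
    import torch.nn as nn

    class MultiHeadActor(nn.Module):
        def __init__(self, obs_dim, act_dim, num_tasks, hidden=256):
            super().__init__()
            # Three shared layers; the task-specific head below is the 4th
            # layer (one plausible reading of "4-layer perceptron").
            self.trunk = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            # One task-specific output head per task [69].
            self.heads = nn.ModuleList(
                [nn.Linear(hidden, act_dim) for _ in range(num_tasks)]
            )
            # Score vector: an arithmetic sequence rescaled to [0, 1].
            # Its use for rewiring neurons (Eq. (4)) is not shown here.
            self.scores = nn.Parameter(torch.linspace(0.0, 1.0, hidden))
            self.tau = 1.0  # temperature, tau = 1 by default

        def forward(self, obs, task_id):
            # In the actual method, a rewiring of trunk neurons driven by
            # self.scores would be applied here.
            return self.heads[task_id](self.trunk(obs))

    # Usage with HalfCheetah-like dimensions (assumed, for illustration):
    actor = MultiHeadActor(obs_dim=17, act_dim=6, num_tasks=3)
    out = actor(torch.randn(1, 17), task_id=0)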