Reinforcement Learning Framework for Deep Brain Stimulation Study
Authors: Dmitrii Krylov, Rémi Tachet des Combes, Romain Laroche, Michael Rosenblum, Dmitry V. Dylov
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We successfully suppress synchrony via RL for three pathological signaling regimes, characterize the framework's stability to noise, and further remove the unwanted oscillations by engaging multiple PPO agents. |
| Researcher Affiliation | Collaboration | (1) Skolkovo Institute of Science and Technology, Bolshoy blvd. 30/1, Moscow, 121205, Russia; (2) Microsoft Research Lab, 550-2000 McGill College Ave, Montréal H3A 3H3, Canada; (3) University of Potsdam, Karl-Liebknecht-Str. 24/25, 14476 Potsdam-Golm, Germany |
| Pseudocode | No | The paper includes a conceptual diagram (Figure 1) but no structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/cviaai/RL-DBS/ |
| Open Datasets | No | The paper uses data generated by numerical solutions of ordinary differential equations (Eqs. 1 and 2) simulating neuronal models, rather than a publicly available dataset (a generic illustration of this on-the-fly simulation is sketched after the table). |
| Dataset Splits | No | The paper does not mention explicit training, validation, or test dataset splits, as it uses simulated, dynamically generated data rather than fixed datasets. |
| Hardware Specification | No | The paper vaguely states that 'training was performed on CPU' but provides no specific hardware details like CPU model, memory, or GPU specifications. |
| Software Dependencies | No | The paper mentions using the 'Stable Baselines library' but does not specify its version number or other software dependencies with versions. |
| Experiment Setup | Yes | In our experiments, we used two-hidden-layer MLPs with 64 neurons, trained using the Stable Baselines library [Hill et al., 2018] with the default parameters for PPO. ... γ is a discount factor that controls the tradeoff between long-term and immediate rewards (set to 0.99 in our experiments). ... For a given action A and a given observation X_state at time t, we propose the following class of reward functions for synchrony suppression tasks: R_t = −(X(t) − X_state(t))² − β\|A(t)\| ... for β = 2 leads to convergence to the natural equilibrium point. |
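
Because the governing equations (Eqs. 1 and 2) are not reproduced in this summary, the sketch below only illustrates the general recipe the paper relies on: training signals are produced on the fly by numerically integrating a globally coupled oscillator ensemble and recording its mean field. The FitzHugh–Nagumo-style model, parameter spread, coupling strength, and integration settings here are illustrative assumptions, not the paper's equations.

```python
import numpy as np


def simulate_mean_field(n_osc=1000, steps=20000, dt=0.01, coupling=0.03, seed=0):
    """Euler-integrate a globally coupled FitzHugh-Nagumo-style ensemble and
    return its mean field X(t). Illustrative stand-in only: the paper's actual
    neuronal models (Eqs. 1 and 2) are not reproduced in this summary."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, n_osc)        # fast variables
    y = rng.uniform(-0.5, 0.5, n_osc)        # slow recovery variables
    a = rng.normal(0.3, 0.02, n_osc)         # small parameter spread (assumption)
    mean_field = np.empty(steps)
    for t in range(steps):
        X = x.mean()                          # observed mean field
        mean_field[t] = X
        dx = x - x ** 3 / 3.0 - y + coupling * X
        dy = 0.08 * (x + a)
        x = x + dt * dx
        y = y + dt * dy
    return mean_field


if __name__ == "__main__":
    X = simulate_mean_field()
    print("mean-field std:", X.std())         # a large value indicates synchrony
```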
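
The following is a minimal sketch of the reported training setup, assuming the original Stable Baselines library [Hill et al., 2018]: its default MlpPolicy is a two-hidden-layer, 64-unit MLP, and γ = 0.99 matches the quoted setup, while other PPO parameters are left at their defaults. The toy environment, its dynamics, the episode length, the β value, and the training budget are illustrative assumptions (the released environment lives at https://github.com/cviaai/RL-DBS/), and the reward uses the reconstructed form quoted above rather than a verified transcription of the paper's equation.

```python
import gym
import numpy as np
from gym import spaces
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy


class ToyOscillatorEnv(gym.Env):
    """Illustrative stand-in for the paper's oscillator environment: one Euler
    step of a toy van der Pol-like 'mean field' per agent step, with the
    stimulation entering as the action. Not the released environment."""

    def __init__(self, dt=0.05, beta=1.0, horizon=1000):
        self.dt, self.beta, self.horizon = dt, beta, horizon
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(2,), dtype=np.float32)
        self.reset()

    def reset(self):
        self.t = 0
        self.state = np.array([1.0, 0.0], dtype=np.float32)   # (x, y) oscillator state
        self.x_prev = self.state[0]                            # last registered mean field
        return self.state.copy()

    def step(self, action):
        a = float(np.clip(action[0], -1.0, 1.0))
        x, y = self.state
        # Toy van der Pol-like dynamics; the stimulation a is added to the first equation.
        dx = y + a
        dy = 0.5 * (1.0 - x ** 2) * y - x
        self.state = np.array([x + self.dt * dx, y + self.dt * dy], dtype=np.float32)
        # Reward class quoted in the paper (first term reconstructed from a garbled
        # excerpt): penalize the mean-field term and the stimulation magnitude.
        reward = -(self.state[0] - self.x_prev) ** 2 - self.beta * abs(a)
        self.x_prev = self.state[0]
        self.t += 1
        return self.state.copy(), float(reward), self.t >= self.horizon, {}


env = ToyOscillatorEnv(beta=1.0)  # beta = 2 is reported to drive the system to its equilibrium
model = PPO2(MlpPolicy, env, gamma=0.99, verbose=1)  # default MlpPolicy: two 64-unit hidden layers
model.learn(total_timesteps=200_000)                 # training budget is an assumption
model.save("ppo_dbs_toy")
```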