Emergence of Adaptive Circadian Rhythms in Deep Reinforcement Learning
Authors: Aqeel Labash, Florian Stelzer, Daniel Majoral, Raul Vicente Zafra
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we study the emergence of circadian-like rhythms in deep reinforcement learning agents. In particular, we deployed agents in an environment with a reliable periodic variation while solving a foraging task. We systematically characterize the agent's behavior during learning and demonstrate the emergence of a rhythm that is endogenous and entrainable. |
| Researcher Affiliation | Academia | 1Institute of Computer Science, University of Tartu, Tartu, Estonia. |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode blocks or algorithms. |
| Open Source Code | Yes | Code is available at https://github.com/aqeel13932/MN_project |
| Open Datasets | No | The paper describes a custom "foraging task" simulated using the "Artificial Primate Environment Simulator (APES)", a 2D grid world simulator. It does not use a pre-existing, publicly available dataset in the conventional sense (e.g., MNIST, ImageNet) that would have a specific link, DOI, or formal citation for data access. |
| Dataset Splits | No | The paper states: "We train the network for 37500 training episodes, each consisting of 160 time steps (four full days)." and refers to "1000 test runs." It does not explicitly mention or detail a separate validation set or provide specific training/validation/test splits (e.g., percentages, sample counts) for a dataset. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, cluster specifications) used to run the experiments. It only mentions general aspects of training and network implementation. |
| Software Dependencies | Yes | The network was implemented using the Keras 2.1.5 library (Chollet et al., 2015). |
| Experiment Setup | Yes | We train our model for 37500 training episodes, each of which consists of 160 time steps. That is, we train for a total of 6 million time steps. The daylight signal has a period of 40 time steps. For the first 20 steps within this period, the daylight signal has the value 1 (daytime); for the remaining 20 steps it has the value 0 (night). ... The network is trained using the learning rate η = 0.001. The exploration parameter ε is linearly annealed from 1 to 0.1 over the first 75% of the training and held constant at 0.1 for the remaining training steps. The hyperparameters are summarized in Table A1. |
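The schedule quoted in the experiment-setup row can be sketched as follows. This is a minimal illustration with assumed function and constant names, not the authors' implementation (which lives in the linked repository); only the numbers (37500 episodes, 160 steps, a 40-step day/night period, ε annealed from 1 to 0.1 over the first 75% of training) come from the paper.

```python
# Sketch of the training schedule described in the paper's setup.
# Names (daylight, epsilon, DAY_PERIOD, etc.) are illustrative assumptions.

EPISODES = 37500          # training episodes
STEPS_PER_EPISODE = 160   # time steps per episode (four 40-step days)
TOTAL_STEPS = EPISODES * STEPS_PER_EPISODE  # 6,000,000 time steps in total
DAY_PERIOD = 40           # period of the daylight signal


def daylight(t: int) -> int:
    """Daylight signal: 1 during the first 20 steps of each 40-step
    period (daytime), 0 during the remaining 20 steps (night)."""
    return 1 if (t % DAY_PERIOD) < DAY_PERIOD // 2 else 0


def epsilon(step: int) -> float:
    """Exploration rate: linearly annealed from 1.0 to 0.1 over the
    first 75% of training, then held constant at 0.1."""
    anneal_steps = int(0.75 * TOTAL_STEPS)
    if step >= anneal_steps:
        return 0.1
    return 1.0 - 0.9 * step / anneal_steps
```

For example, `daylight(0)` through `daylight(19)` return 1 (daytime) and `daylight(20)` through `daylight(39)` return 0 (night), while `epsilon(0)` is 1.0 and `epsilon` at or beyond 4.5 million steps stays at 0.1.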