Emergence of Adaptive Circadian Rhythms in Deep Reinforcement Learning
Authors: Aqeel Labash, Florian Stelzer, Daniel Majoral, Raul Vicente Zafra
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we study the emergence of circadian-like rhythms in deep reinforcement learning agents. In particular, we deployed agents in an environment with a reliable periodic variation while solving a foraging task. We systematically characterize the agent's behavior during learning and demonstrate the emergence of a rhythm that is endogenous and entrainable. |
| Researcher Affiliation | Academia | 1Institute of Computer Science, University of Tartu, Tartu, Estonia. |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode blocks or algorithms. |
| Open Source Code | Yes | Code is available at https://github.com/aqeel13932/MN_project |
| Open Datasets | No | The paper describes a custom "foraging task" simulated using the "Artificial Primate Environment Simulator (APES)", a 2D grid world simulator. It does not use a pre-existing, publicly available dataset in the conventional sense (e.g., MNIST, ImageNet) that would have a specific link, DOI, or formal citation for data access. |
| Dataset Splits | No | The paper states: "We train the network for 37500 training episodes, each consisting of 160 time steps (four full days)." and refers to "1000 test runs." It does not explicitly mention or detail a separate validation set or provide specific training/validation/test splits (e.g., percentages, sample counts) for a dataset. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, cluster specifications) used to run the experiments. It only mentions general aspects of training and network implementation. |
| Software Dependencies | Yes | The network was implemented using the Keras 2.1.5 library (Chollet et al., 2015). |
| Experiment Setup | Yes | We train our model for 37500 training episodes, each of which consists of 160 time steps. That is, we train for a total of 6 million time steps. The daylight signal has a period of 40 time steps. For the first 20 steps within this period, the daylight signal has the value 1 (daytime); for the remaining 20 steps it has the value 0 (night). ... The network is trained using the learning rate η = 0.001. The exploration parameter ε is linearly annealed from 1 to 0.1 over the first 75% of the training and held constant at 0.1 for the remaining training steps. The hyperparameters are summarized in Table A1. |
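The schedule quoted in the experiment-setup row can be sketched as follows. This is a minimal illustration with assumed function and constant names, not the authors' implementation (which lives in the linked repository); only the numbers (37500 episodes, 160 steps, a 40-step day/night period, ε annealed from 1 to 0.1 over the first 75% of training) come from the paper.

```python
# Sketch of the training schedule described in the paper's setup.
# Names (daylight, epsilon, DAY_PERIOD, etc.) are illustrative assumptions.

EPISODES = 37500          # training episodes
STEPS_PER_EPISODE = 160   # time steps per episode (four 40-step days)
TOTAL_STEPS = EPISODES * STEPS_PER_EPISODE  # 6,000,000 time steps in total
DAY_PERIOD = 40           # period of the daylight signal


def daylight(t: int) -> int:
    """Daylight signal: 1 during the first 20 steps of each 40-step
    period (daytime), 0 during the remaining 20 steps (night)."""
    return 1 if (t % DAY_PERIOD) < DAY_PERIOD // 2 else 0


def epsilon(step: int) -> float:
    """Exploration rate: linearly annealed from 1.0 to 0.1 over the
    first 75% of training, then held constant at 0.1."""
    anneal_steps = int(0.75 * TOTAL_STEPS)
    if step >= anneal_steps:
        return 0.1
    return 1.0 - 0.9 * step / anneal_steps
```

For example, `daylight(0)` through `daylight(19)` return 1 (daytime) and `daylight(20)` through `daylight(39)` return 0 (night), while `epsilon(0)` is 1.0 and `epsilon` at or beyond 4.5 million steps stays at 0.1.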