Deep active inference agents using Monte-Carlo methods

Authors: Zafeirios Fountas, Noor Sajid, Pedro Mediano, Karl Friston

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate this in a new toy environment, based on the dSprites dataset, and demonstrate that active inference agents automatically create disentangled representations that are apt for modeling state transitions. In a more complex Animal-AI environment, our agents (using the same neural architecture) are able to simulate future state transitions and actions (i.e., plan), to evince reward-directed navigation despite temporary suspension of visual input. These results show that deep active inference equipped with MC methods provides a flexible framework to develop biologically-inspired intelligent agents, with applications in both machine learning and cognitive science. We initially show through a simple visual demonstration (Fig. 2B) that agents learn the environment dynamics with or without consistent visual input for both dynamic dSprites and Animal-AI. This is further investigated, for the dynamic dSprites, by evaluating task performance (Fig. 3A-C), as well as reconstruction loss for both predicted visual input and reward (Fig. 3D-E) during training.
Researcher Affiliation | Collaboration | Zafeirios Fountas (Emotech Labs & WCHN, University College London, f@emotech.co); Noor Sajid (WCHN, University College London, noor.sajid.18@ucl.ac.uk); Pedro A.M. Mediano (University of Cambridge, pam83@cam.ac.uk); Karl Friston (WCHN, University College London, k.friston@ucl.ac.uk)
Pseudocode | No | The paper describes its methods through text and diagrams (Figure 1C shows an MCTS scheme) but does not include structured pseudocode or algorithm blocks.
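Since the paper itself provides no algorithm block, the following is a minimal Python sketch of the kind of MCTS planner Figure 1C depicts: a tree policy that trades off (negated) expected free energy against a habit-weighted exploration bonus. All names here (`expected_free_energy`, `habit_prior`, `sample_transition`) are hypothetical placeholders for the paper's learned networks, and the single-depth backup is a deliberate simplification, not the authors' implementation.

```python
import math
import random

def expected_free_energy(state, action):
    """Placeholder for the learned EFE estimate G(s, a)."""
    return random.random()

def habit_prior(state, action):
    """Placeholder for the habit network's action prior P(a | s)."""
    return 0.25  # uniform over four actions in this toy sketch

def sample_transition(state, action):
    """Placeholder for a Monte-Carlo sample from the transition model."""
    return state  # a full planner would descend into the sampled state

def mcts_plan(root_state, actions, n_simulations=100, c_explore=1.0):
    visits, efe_sum = {}, {}
    for _ in range(n_simulations):
        state = root_state
        total = sum(visits.get((state, a), 0) for a in actions) + 1

        def score(a):
            n = visits.get((state, a), 0)
            mean_g = efe_sum.get((state, a), 0.0) / max(n, 1)
            bonus = c_explore * habit_prior(state, a) * math.sqrt(total) / (1 + n)
            return -mean_g + bonus  # low EFE preferred, plus exploration bonus

        action = max(actions, key=score)
        _next_state = sample_transition(state, action)
        g = expected_free_energy(state, action)
        visits[(state, action)] = visits.get((state, action), 0) + 1
        efe_sum[(state, action)] = efe_sum.get((state, action), 0.0) + g

    # Act according to visit counts at the root.
    return max(actions, key=lambda a: visits.get((root_state, a), 0))

print(mcts_plan("s0", actions=[0, 1, 2, 3]))
```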
Open Source Code | Yes | The complete source code, data, and pre-trained agents are available on GitHub (https://github.com/zfountas/deep-active-inference-mc).
Open Datasets | Yes | Dynamic dSprites: "We defined a simple 2D environment based on the dSprites dataset [32, 31]." Animal-AI: "We used a variation of the preferences task from the Animal-AI environment [33]."
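For context, the underlying dSprites dataset is distributed by DeepMind as a single NumPy archive; a minimal loading sketch follows, using the file name and array keys of that public release. The authors' dynamic wrapper environment is not reproduced here.

```python
import numpy as np

# Archive from https://github.com/deepmind/dsprites-dataset
data = np.load("dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz",
               allow_pickle=True, encoding="latin1")
imgs = data["imgs"]               # (737280, 64, 64) binary images
latents = data["latents_values"]  # ground-truth generative factors
print(imgs.shape, latents.shape)
```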
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. It mentions a batch size for training but no explicit train/validation/test splits.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions ADAM for optimization but does not provide specific ancillary software details, such as library names with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
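As an illustration only: instantiating ADAM in TensorFlow 2, which the linked repository appears to use. The learning rate shown is a placeholder, since neither a value nor a library version is given in the main text.

```python
import tensorflow as tf

# Assumes TensorFlow 2.x; the learning rate is illustrative, not reported.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)
```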
Experiment Setup | No | The paper mentions training strategies (on-policy/off-policy), a batch size of 100, and the ADAM optimizer with a regularization term, but it defers the explicit training procedure to the supplementary material and does not provide specific hyperparameter values, such as the learning rate or number of epochs, in the main text.
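A hedged sketch of one training iteration consistent with the few details that are reported (batch size 100, ADAM with an added regularization term), using the optimizer shown above. `model`, `elbo_loss`, `regularizer`, and the regularization weight are hypothetical placeholders; the authors' explicit procedure is in their supplementary material.

```python
import tensorflow as tf

BATCH_SIZE = 100  # reported in the paper; everything else below is illustrative
optimizer = tf.keras.optimizers.Adam()

def train_step(model, elbo_loss, regularizer, batch, reg_weight=1e-2):
    # Loss = variational objective plus a regularization term, per the row above.
    with tf.GradientTape() as tape:
        loss = elbo_loss(model, batch) + reg_weight * regularizer(model)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```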