Contrastive Active Inference

Authors: Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we compare the contrastive AIF method to likelihood-based AIF and MBRL in high-dimensional image-based settings. Our experimentation aims to answer the following questions: (i) is it possible to achieve high-dimensional goals with AIF-based methods? (ii) what is the difference in performance between RL-based and AIF-based methods? (iii) does contrastive AIF perform better than likelihood-based AIF? (iv) in what contexts contrastive methods are more desirable than likelihood-based methods? (v) are AIF-based methods resilient to variations in the environment background?
Researcher Affiliation | Academia | Pietro Mazzaglia, IDLab, Ghent University (pietro.mazzaglia@ugent.be); Tim Verbelen, IDLab, Ghent University (tim.verbelen@ugent.be); Bart Dhoedt, IDLab, Ghent University (bart.dhoedt@ugent.be)
Pseudocode | Yes | The training routine, which alternates updates to the models with data collection, is shown in Algorithm 1.
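For orientation, the alternating scheme described in Algorithm 1 (update the world and behavior models, then collect a new trajectory) can be sketched as below. This is a hypothetical reconstruction, not the authors' code: the object names (world_model, behavior_model, replay_buffer) and the collect_episode helper are placeholders, while the R, U, B, L, H arguments mirror the symbols quoted under Experiment Setup.

```python
def collect_episode(env, policy=None):
    """Roll out one episode; acts randomly when no policy is given (classic gym API)."""
    obs, done, episode = env.reset(), False, []
    while not done:
        action = env.action_space.sample() if policy is None else policy.act(obs)
        next_obs, reward, done, _ = env.step(action)
        episode.append((obs, action, reward, next_obs))
        obs = next_obs
    return episode

def train(env, world_model, behavior_model, replay_buffer,
          R=50, U=100, B=50, L=7, H=6, episodes=500):
    # Populate the replay buffer with R random episodes.
    for _ in range(R):
        replay_buffer.add(collect_episode(env))
    for _ in range(episodes):
        # U gradient updates on the models ...
        for _ in range(U):
            batch = replay_buffer.sample(B, L)        # B trajectories of length L
            world_model.update(batch)                 # representation / dynamics losses
            imagined = world_model.imagine(batch, H)  # H-step rollouts in latent space
            behavior_model.update(imagined)           # policy / value updates on imagined data
        # ... then collect one new trajectory with the current policy.
        replay_buffer.add(collect_episode(env, policy=behavior_model))
```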
Open Source Code | No | The paper mentions external resources like gym-minigrid and the DeepMind Control Suite, but does not provide a link or an explicit statement for its own source code.
Open Datasets | Yes | We performed experiments on the Empty 6x6 and the Empty 8x8 environments from the MiniGrid suite [8]... We performed continuous-control experiments on the Reacher Easy and Hard tasks from the DeepMind Control (DMC) Suite [48] and on Reacher Easy from the Distracting Control Suite [47].
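The named benchmarks are publicly available and can be loaded with the gym-minigrid and dm_control packages. The snippet below is a minimal sketch assuming those packages are installed; the Distracting Control Suite variant is omitted because its loader differs.

```python
import gym
import gym_minigrid  # noqa: F401  (importing registers the MiniGrid environments)
from dm_control import suite

# MiniGrid Empty tasks (discrete control).
minigrid_6x6 = gym.make("MiniGrid-Empty-6x6-v0")
minigrid_8x8 = gym.make("MiniGrid-Empty-8x8-v0")

# DeepMind Control Suite Reacher tasks (continuous control).
reacher_easy = suite.load(domain_name="reacher", task_name="easy")
reacher_hard = suite.load(domain_name="reacher", task_name="hard")
```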
Dataset Splits | No | The paper describes how data is collected during training episodes and how performance is evaluated on trajectories, but does not specify fixed train/validation/test dataset splits in terms of percentages or counts for reproducibility.
Hardware Specification | No | Relevant parameterization for the experiments can be found in the next section, while hyperparameters and a detailed description of each network are left to the Appendix.
Software Dependencies | No | Relevant parameterization for the experiments can be found in the next section, while hyperparameters and a detailed description of each network are left to the Appendix.
Experiment Setup | Yes | For the 6x6 task, the world model is trained by sampling B = 50 trajectories of length L = 7, while the behavior model is trained by imagining H = 6 steps long trajectories. For the 8x8 task, we increased the length L to 11 and the imagination horizon H to 10. For both tasks, we first collected R = 50 random episodes, to populate the replay buffer, and train for U = 100 steps after collecting a new trajectory. ... For both tasks, the world model is trained by sampling B = 30 trajectories of length L = 30, while the behavior model is trained by imagining H = 10 steps long trajectories. We first collect R = 50 random episodes, to populate the replay buffer, and train for U = 100 steps after every new trajectory.
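For convenience, the quoted hyperparameters can be collected as plain Python dictionaries. The field names below are descriptive labels chosen here (mapped to the paper's symbols B, L, H, R, U in the comments), not identifiers from the authors' code.

```python
# MiniGrid tasks.
MINIGRID_6X6 = dict(batch_trajectories=50, seq_len=7,  imagination_horizon=6,
                    random_episodes=50, updates_per_episode=100)   # B, L, H, R, U
MINIGRID_8X8 = dict(batch_trajectories=50, seq_len=11, imagination_horizon=10,
                    random_episodes=50, updates_per_episode=100)

# DMC Reacher tasks (Easy and Hard share the same settings).
DMC_REACHER = dict(batch_trajectories=30, seq_len=30, imagination_horizon=10,
                   random_episodes=50, updates_per_episode=100)
```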