Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Contrastive Active Inference
Authors: Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we compare the contrastive AIF method to likelihood-based AIF and MBRL in high-dimensional image-based settings. Our experimentation aims to answer the following questions: (i) is it possible to achieve high-dimensional goals with AIF-based methods? (ii) what is the difference in performance between RL-based and AIF-based methods? (iii) does contrastive AIF perform better than likelihood-based AIF? (iv) in what contexts contrastive methods are more desirable than likelihood-based methods? (v) are AIF-based methods resilient to variations in the environment background? |
| Researcher Affiliation | Academia | Pietro Mazzaglia IDLab Ghent University EMAIL Tim Verbelen IDLab Ghent University EMAIL Bart Dhoedt IDLab Ghent University EMAIL |
| Pseudocode | Yes | The training routine, which alternates updates to the models with data collection, is shown in Algorithm 1. |
| Open Source Code | No | The paper mentions external resources like gym-minigrid and DeepMind Control Suite, but does not provide a link or explicit statement for its own source code. |
| Open Datasets | Yes | We performed experiments on the Empty 6×6 and the Empty 8×8 environments from the MiniGrid suite [8]... We performed continuous-control experiments on the Reacher Easy and Hard tasks from the DeepMind Control (DMC) Suite [48] and on Reacher Easy from the Distracting Control Suite [47]. |
| Dataset Splits | No | The paper describes how data is collected during training episodes and how performance is evaluated on trajectories, but does not specify fixed train/validation/test dataset splits in terms of percentages or counts for reproducibility. |
| Hardware Specification | No | Relevant parameterization for the experiments can be found in the next section, while hyperparameters and a detailed description of each network are left to the Appendix. |
| Software Dependencies | No | Relevant parameterization for the experiments can be found in the next section, while hyperparameters and a detailed description of each network are left to the Appendix. |
| Experiment Setup | Yes | For the 6×6 task, the world model is trained by sampling B = 50 trajectories of length L = 7, while the behavior model is trained by imagining H = 6 steps long trajectories. For the 8×8 task, we increased the length L to 11 and the imagination horizon H to 10. For both tasks, we first collected R = 50 random episodes, to populate the replay buffer, and train for U = 100 steps after collecting a new trajectory. ... For both tasks, the world model is trained by sampling B = 30 trajectories of length L = 30, while the behavior model is trained by imagining H = 10 steps long trajectories. We first collect R = 50 random episodes, to populate the replay buffer, and train for U = 100 steps after every new trajectory. |
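
The hyperparameters quoted in the Experiment Setup row can be gathered into a small configuration sketch, which may help when attempting a reproduction. This is an illustrative layout, not the authors' code: the class name `TrainingSetup`, its field names, the dictionary keys, and the helper `observations_per_batch` are all hypothetical. Two assumptions are made: B stays at 50 for the 8×8 task (the excerpt only mentions changes to L and H), and the excerpt after the ellipsis is stored under the neutral key `second_excerpt` because the quoted text does not name the tasks it covers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Hyperparameters named in the Experiment Setup row (field names are hypothetical)."""
    batch_trajectories: int      # B: trajectories sampled per world-model update
    trajectory_length: int       # L: length of each sampled trajectory
    imagination_horizon: int     # H: steps imagined when training the behavior model
    random_episodes: int         # R: random episodes collected to seed the replay buffer
    updates_per_trajectory: int  # U: training steps after each newly collected trajectory

    def observations_per_batch(self) -> int:
        # Simple arithmetic on the quoted values: B trajectories of L steps each.
        return self.batch_trajectories * self.trajectory_length


SETUPS = {
    "minigrid_empty_6x6": TrainingSetup(50, 7, 6, 50, 100),
    # Assumption: B is unchanged for 8x8; the excerpt only says L and H increased.
    "minigrid_empty_8x8": TrainingSetup(50, 11, 10, 50, 100),
    # Assumption: neutral key, since the excerpt does not name these tasks.
    "second_excerpt": TrainingSetup(30, 30, 10, 50, 100),
}

if __name__ == "__main__":
    for name, setup in SETUPS.items():
        print(name, setup.observations_per_batch())
```

Keeping the values in a frozen dataclass makes accidental mutation during a training run an error, which is one reasonable design choice among several for recording a fixed experimental setup.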