Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Expected flow networks in stochastic environments and two-player zero-sum games

Authors: Marco Jiralerspong, Bilun Sun, Danilo Vucetic, Tianyu Zhang, Yoshua Bengio, Gauthier Gidel, Nikolay Malkin

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We conduct experiments to investigate whether EFlowNets can effectively learn in stochastic environments compared to related methods (§4.1) and whether AFlowNets are effective learners of adversarial gameplay, as measured by their performance against contemporary approaches (§4.2). |
| Researcher Affiliation | Academia | Mila Québec AI Institute, Université de Montréal; {marco.jiralerspong,bilun.sun,danilo.vucetic,tianyu.zhang,yoshua.bengio,gidelgau,nikolay.malkin}@mila.quebec |
| Pseudocode | Yes | Algorithm 1: Branch-adjusted AFlowNet Training |
| Open Source Code | Yes | Code: https://github.com/GFNOrg/AdversarialFlowNetworks |
| Open Datasets | Yes | We evaluate EFlowNets in a protein design task from Jain et al. (2022). |
| Dataset Splits | No | The paper describes sampling methods and training policies, but does not provide specific details on training, validation, and test dataset splits (e.g., percentages or counts) or refer to standard splits for the datasets used. |
| Hardware Specification | Yes | GPU: 1x RTX3090Ti (Tic-tac-toe), 1x RTX8000 (Connect-4) |
| Software Dependencies | No | The paper mentions implementing models (e.g., AlphaZero implementation, SAC reimplementation) but does not provide specific version numbers for software dependencies such as Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | num trajectories epoch: 10240, batch size: 512 (Tic-tac-toe) / 1024 (Connect-4), num steps: 500 (Tic-tac-toe) / 250 (Connect-4), replay buffer capacity: 10240 (Tic-tac-toe) / 250000 (Connect-4), learning rate: 1e-3, learning rate Z: 5e-2, num residual blocks: 10 (Tic-tac-toe) / 15 (Connect-4) |
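
For readers who want the reported setup in machine-readable form, the following is a minimal sketch that collects the values from the Experiment Setup and Hardware Specification entries into one record per game. The class name `AFlowNetSetup` and all field names are hypothetical and are not taken from the authors' code. The sketch also assumes that values listed without a per-game qualifier (num trajectories epoch, learning rate, learning rate Z) apply to both games.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AFlowNetSetup:
    """Reported hyperparameters for one game (field names are hypothetical)."""
    num_trajectories_epoch: int
    batch_size: int
    num_steps: int
    replay_buffer_capacity: int
    learning_rate: float
    learning_rate_z: float
    num_residual_blocks: int
    gpu: str


# Values copied from the summary above; unqualified values are
# assumed to be shared by both games.
TIC_TAC_TOE = AFlowNetSetup(
    num_trajectories_epoch=10240,
    batch_size=512,
    num_steps=500,
    replay_buffer_capacity=10240,
    learning_rate=1e-3,
    learning_rate_z=5e-2,
    num_residual_blocks=10,
    gpu="1x RTX3090Ti",
)

CONNECT_4 = AFlowNetSetup(
    num_trajectories_epoch=10240,
    batch_size=1024,
    num_steps=250,
    replay_buffer_capacity=250000,
    learning_rate=1e-3,
    learning_rate_z=5e-2,
    num_residual_blocks=15,
    gpu="1x RTX8000",
)
```

Keeping the two games as separate frozen records makes the per-game differences (batch size, number of steps, replay buffer capacity, residual blocks, GPU) easy to compare at a glance, and prevents accidental mutation when the records are passed around.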