Learning to Guide and to be Guided in the Architect-Builder Problem
Authors: Paul Barde, Tristan Karch, Derek Nowrouzezahrai, Clément Moulin-Frier, Christopher Pal, Pierre-Yves Oudeyer
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analyze the key learning mechanisms of ABIG and test it in a 2-dimensional instantiation of the ABP where tasks involve grasping cubes, placing them at a given location, or building various shapes. |
| Researcher Affiliation | Collaboration | Paul Barde Québec AI institute (Mila) McGill University... Tristan Karch Inria Flowers team Université de Bordeaux... Christopher Pal Québec AI institute (Mila) Polytechnique Montréal ServiceNow Element AI... Pierre-Yves Oudeyer Inria Flowers team Univ. Bordeaux Microsoft Research Montreal |
| Pseudocode | Yes | The algorithm is illustrated in Figure 3 and the pseudo-code is reported in Algorithm 1 in Suppl. Section A.3. |
| Open Source Code | Yes | We ensure the reproducibility of the experiments presented in this work by providing our code (footnote 3). Footnote 3: https://github.com/flowersteam/architect-builder-abig.git |
| Open Datasets | No | The paper describes 'Build World' as a custom 2D construction gridworld environment where experiments are conducted and data is generated through agent interaction. It does not provide a link, DOI, or specific citation for this 'dataset' as a pre-collected, publicly available resource. |
| Dataset Splits | Yes | The data-set is split into training (70%) and validation (30%) sets. |
| Hardware Specification | Yes | A complete ABIG training can take up to 48 hours on a single modern CPU (Intel E5-2683 v4 Broadwell @ 2.1GHz). |
| Software Dependencies | No | The paper mentions 'ReLu networks' and 'Adam optimizer' but does not provide specific version numbers for software dependencies or programming languages used in the implementation. |
| Experiment Setup | Yes | All models are parametrized by two-hidden layer 126-units feedforward ReLu networks. BC minimizes the cross-entropy loss with Adam optimizer (Kingma & Ba, 2015). Tables 1-5 provide detailed hyper-parameters for toy experiments, MCTS, Build World, and BC for both architect and builder. |
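
The Dataset Splits and Experiment Setup rows mention a 70%/30% training/validation split and a cross-entropy objective for behavioral cloning (BC). As a rough illustration only, the sketch below shows one way such a split and loss could be computed in plain Python. The helper names `split_train_val` and `cross_entropy`, the fixed seed, and the placeholder `transitions` data are assumptions made for this example. They are not taken from the paper or its repository, which may shuffle, split, and compute the loss differently.

```python
import math
import random


def split_train_val(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of `samples` and split it into (train, validation) lists.

    Hypothetical helper: the 70/30 ratio comes from the table above,
    the seeding and shuffling strategy are assumptions.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def cross_entropy(logits, target_index):
    """Cross-entropy of one sample, given unnormalized scores over discrete actions.

    Uses the log-sum-exp trick (subtracting the max logit) for numerical stability.
    """
    max_logit = max(logits)
    log_norm = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_norm - logits[target_index]


# Placeholder (observation, action) pairs standing in for collected interaction data.
transitions = [(f"obs_{i}", i % 4) for i in range(100)]
train_set, val_set = split_train_val(transitions)

# Uniform scores over 4 actions give a loss of log(4), whatever the target is.
uniform_loss = cross_entropy([0.0, 0.0, 0.0, 0.0], target_index=2)
```

In a BC setup of the kind described, the validation portion would typically be used to monitor the loss on held-out transitions, for instance to decide when to stop training. Whether the authors use it this way is not stated in the table.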