Jelly Bean World: A Testbed for Never-Ending Learning
Authors: Emmanouil Antonios Platanios, Abulhair Saparov, Tom Mitchell
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The goal of this section is to show how the non-episodic, non-stationary, multi-modal, and multi-task aspects of the JBW make it a challenging environment for existing machine learning algorithms, through a few example case studies. For all experiments we use the simulator configuration and item types shown in Tables 2 and 3. Due to space, the case studies focus on the single-agent setting. We use different agent models depending on which modalities are used in each experiment. If vision is used, then the visual field is passed through a convolution layer with stride 2, 3x3 filters, and 16 channels, and another one with stride 1, 2x2 filters, and 16 channels. |
| Researcher Affiliation | Academia | Emmanouil Antonios Platanios, Abulhair Saparov & Tom Mitchell, Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA. {e.a.platanios,asaparov,tom.mitchell}@cs.cmu.edu |
| Pseudocode | Yes | Algorithm 1: Pseudocode for the greedy vision-based algorithm. |
| Open Source Code | Yes | The JBW is written in C++ and we provide C, Python, and Swift APIs, and is available at https://github.com/eaplatanios/jelly-bean-world. (A minimal usage sketch appears below the table.) |
| Open Datasets | No | The paper describes a procedurally generated environment, the Jelly Bean World, rather than using a pre-existing, static dataset. Data for experiments is generated dynamically within this environment: 'The map is a procedurally-generated two-dimensional grid.' While the environment itself is open-source, it does not refer to a publicly available 'dataset' in the traditional sense of a fixed collection of samples. |
| Dataset Splits | No | The paper explicitly states: 'Thus, never-ending learning explicitly removes the distinction between training and testing that is common to many other classical machine learning paradigms.' It does not describe any specific training, validation, or test data splits. |
| Hardware Specification | Yes | As a rough indication of performance, on a single core of an Intel Core i7 5820K (released in 2014) at 3.5 GHz, the JBW can generate 8.56 patches per second, each of size 64x64 (i.e., roughly 35,062 grid cells per second), using the configuration described in Section 4. |
| Software Dependencies | No | The paper mentions: 'The experiments are implemented using Swift for TensorFlow.' and 'We provide implementations of the JBW environments for OpenAI Gym (Brockman et al., 2016) in Python and for Swift RL (Platanios, 2019) in Swift.' However, it does not provide specific version numbers for Swift, TensorFlow, OpenAI Gym, or Swift RL. |
| Experiment Setup | Yes | For all experiments we use the simulator configuration and item types shown in Tables 2 and 3. ... If vision is used, then the visual field is passed through a convolution layer with stride 2, 3x3 filters, and 16 channels, and another one with stride 1, 2x2 filters, and 16 channels. The resulting tensor is flattened and passed through a dense layer with size 512. If scent is used, then the scent vector is passed through two dense layers: one with size 32, and one with size 512. If both modalities are being used, the two hidden representations are concatenated. Finally, the result is processed by a Long Short-Term Memory (LSTM) network (Hochreiter & Schmidhuber, 1997), which outputs a value for the agent's current state, along with a distribution over actions. Learning is performed using Proximal Policy Optimization (PPO), a popular on-policy reinforcement learning algorithm proposed by Schulman et al. (2017). (An illustrative sketch of this architecture appears below the table.) |
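
The agent model quoted in the Experiment Setup row can be summarized in a short sketch. The code below is a minimal, illustrative PyTorch re-implementation of that description, not the authors' code (their experiments use Swift for TensorFlow); the visual-field size, number of input channels, scent dimensionality, and action count are assumptions chosen for illustration.

```python
# Illustrative sketch of the agent described in the Experiment Setup row.
# Assumptions not stated in the quoted text: a 3-channel visual field of size
# vision_dim x vision_dim, a 3-dimensional scent vector, a 512-unit LSTM, and
# 3 discrete actions. The paper's experiments use Swift for TensorFlow; this
# PyTorch version is only a re-sketch of the described architecture.
import torch
import torch.nn as nn


class JBWAgentSketch(nn.Module):
    def __init__(self, vision_dim=11, scent_dim=3, num_actions=3,
                 use_vision=True, use_scent=True):
        super().__init__()
        self.use_vision = use_vision
        self.use_scent = use_scent
        hidden_inputs = 0

        if use_vision:
            # Two convolutions: stride 2 with 3x3 filters, then stride 1 with
            # 2x2 filters, both with 16 channels, followed by a 512-unit dense layer.
            self.conv = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, stride=2), nn.ReLU(),
                nn.Conv2d(16, 16, kernel_size=2, stride=1), nn.ReLU(),
                nn.Flatten(),
            )
            with torch.no_grad():
                conv_out = self.conv(torch.zeros(1, 3, vision_dim, vision_dim)).shape[1]
            self.vision_dense = nn.Sequential(nn.Linear(conv_out, 512), nn.ReLU())
            hidden_inputs += 512

        if use_scent:
            # Two dense layers of sizes 32 and 512 for the scent vector.
            self.scent_dense = nn.Sequential(
                nn.Linear(scent_dim, 32), nn.ReLU(),
                nn.Linear(32, 512), nn.ReLU(),
            )
            hidden_inputs += 512

        # LSTM over the (possibly concatenated) hidden representation, with a
        # value head and a policy head, as used with PPO in the paper.
        self.lstm = nn.LSTMCell(hidden_inputs, 512)
        self.value_head = nn.Linear(512, 1)
        self.policy_head = nn.Linear(512, num_actions)

    def forward(self, vision, scent, lstm_state=None):
        parts = []
        if self.use_vision:
            parts.append(self.vision_dense(self.conv(vision)))
        if self.use_scent:
            parts.append(self.scent_dense(scent))
        h, c = self.lstm(torch.cat(parts, dim=-1), lstm_state)
        return self.policy_head(h), self.value_head(h), (h, c)
```

Under these assumptions, `JBWAgentSketch(vision_dim=11)` maps an 11x11x3 visual field and a 3-dimensional scent vector to action logits and a state-value estimate, which is the policy/value interface that PPO expects.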
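The Open Source Code row points to the JBW repository, which ships an OpenAI Gym wrapper in Python. The snippet below is a usage sketch under the standard Gym interface; the import name `jbw` and the environment id `"JBW-v0"` are assumptions for illustration and should be checked against the repository README.

```python
# Usage sketch for the JBW OpenAI Gym wrapper. The import name `jbw` and the
# environment id "JBW-v0" are assumptions for illustration; consult
# https://github.com/eaplatanios/jelly-bean-world for the actual names.
import gym
import jbw  # assumed to register the Jelly Bean World environments with Gym

env = gym.make("JBW-v0")  # hypothetical environment id
observation = env.reset()
for _ in range(1000):
    action = env.action_space.sample()  # random policy as a placeholder
    observation, reward, done, info = env.step(action)
    # The JBW is non-episodic, so `done` is not expected to trigger resets.
env.close()
```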