Jelly Bean World: A Testbed for Never-Ending Learning
Authors: Emmanouil Antonios Platanios, Abulhair Saparov, Tom Mitchell
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The goal of this section is to show how the non-episodic, non-stationary, multi-modal, and multi-task aspects of the JBW make it a challenging environment for existing machine learning algorithms, through a few example case studies. For all experiments we use the simulator configuration and item types shown in Tables 2 and 3. Due to space, the case studies focus on the single-agent setting. We use different agent models depending on which modalities are used in each experiment. If vision is used, then the visual field is passed through a convolution layer with stride 2, 3x3 filters, and 16 channels, and another one with stride 1, 2x2 filters, and 16 channels. |
| Researcher Affiliation | Academia | Emmanouil Antonios Platanios, Abulhair Saparov & Tom Mitchell, Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA. {e.a.platanios,asaparov,tom.mitchell}@cs.cmu.edu |
| Pseudocode | Yes | Algorithm 1: Pseudocode for the greedy vision-based algorithm. |
| Open Source Code | Yes | The JBW is written in C++ and we provide C, Python, and Swift APIs, and is available at https://github.com/eaplatanios/jelly-bean-world. (A minimal usage sketch appears below the table.) |
| Open Datasets | No | The paper describes a procedurally generated environment, the Jelly Bean World, rather than using a pre-existing, static dataset. Data for experiments is generated dynamically within this environment: 'The map is a procedurally-generated two-dimensional grid.' While the environment itself is open-source, it does not refer to a publicly available 'dataset' in the traditional sense of a fixed collection of samples. |
| Dataset Splits | No | The paper explicitly states: 'Thus, never-ending learning explicitly removes the distinction between training and testing that is common to many other classical machine learning paradigms.' It does not describe any specific training, validation, or test data splits. |
| Hardware Specification | Yes | As a rough indication of performance, on a single core of an Intel Core i7 5820K (released in 2014) at 3.5 GHz, the JBW can generate 8.56 patches per second, each of size 64x64 (i.e., roughly 35,062 grid cells per second), using the configuration described in Section 4. |
| Software Dependencies | No | The paper mentions: 'The experiments are implemented using Swift for TensorFlow.' and 'We provide implementations of the JBW environments for OpenAI Gym (Brockman et al., 2016) in Python and for Swift RL (Platanios, 2019) in Swift.' However, it does not provide specific version numbers for Swift, TensorFlow, OpenAI Gym, or Swift RL. |
| Experiment Setup | Yes | For all experiments we use the simulator configuration and item types shown in Tables 2 and 3. ... If vision is used, then the visual field is passed through a convolution layer with stride 2, 3x3 filters, and 16 channels, and another one with stride 1, 2x2 filters, and 16 channels. The resulting tensor is flattened and passed through a dense layer with size 512. If scent is used, then the scent vector is passed through two dense layers: one with size 32, and one with size 512. If both modalities are being used, the two hidden representations are concatenated. Finally, the result is processed by a Long Short-Term Memory (LSTM) network (Hochreiter & Schmidhuber, 1997), which outputs a value for the agent's current state, along with a distribution over actions. Learning is performed using Proximal Policy Optimization (PPO), a popular on-policy reinforcement learning algorithm proposed by Schulman et al. (2017). (An illustrative sketch of this architecture appears below the table.) |
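
The agent model quoted in the Experiment Setup row can be summarized in a short sketch. The code below is a minimal, illustrative PyTorch re-implementation of that description, not the authors' code (their experiments use Swift for TensorFlow); the visual-field size, number of input channels, scent dimensionality, and action count are assumptions chosen for illustration.

```python
# Illustrative sketch of the agent described in the Experiment Setup row.
# Assumptions not stated in the quoted text: a 3-channel visual field of size
# vision_dim x vision_dim, a 3-dimensional scent vector, a 512-unit LSTM, and
# 3 discrete actions. The paper's experiments use Swift for TensorFlow; this
# PyTorch version is only a re-sketch of the described architecture.
import torch
import torch.nn as nn


class JBWAgentSketch(nn.Module):
    def __init__(self, vision_dim=11, scent_dim=3, num_actions=3,
                 use_vision=True, use_scent=True):
        super().__init__()
        self.use_vision = use_vision
        self.use_scent = use_scent
        hidden_inputs = 0

        if use_vision:
            # Two convolutions: stride 2 with 3x3 filters, then stride 1 with
            # 2x2 filters, both with 16 channels, followed by a 512-unit dense layer.
            self.conv = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, stride=2), nn.ReLU(),
                nn.Conv2d(16, 16, kernel_size=2, stride=1), nn.ReLU(),
                nn.Flatten(),
            )
            with torch.no_grad():
                conv_out = self.conv(torch.zeros(1, 3, vision_dim, vision_dim)).shape[1]
            self.vision_dense = nn.Sequential(nn.Linear(conv_out, 512), nn.ReLU())
            hidden_inputs += 512

        if use_scent:
            # Two dense layers of sizes 32 and 512 for the scent vector.
            self.scent_dense = nn.Sequential(
                nn.Linear(scent_dim, 32), nn.ReLU(),
                nn.Linear(32, 512), nn.ReLU(),
            )
            hidden_inputs += 512

        # LSTM over the (possibly concatenated) hidden representation, with a
        # value head and a policy head, as used with PPO in the paper.
        self.lstm = nn.LSTMCell(hidden_inputs, 512)
        self.value_head = nn.Linear(512, 1)
        self.policy_head = nn.Linear(512, num_actions)

    def forward(self, vision, scent, lstm_state=None):
        parts = []
        if self.use_vision:
            parts.append(self.vision_dense(self.conv(vision)))
        if self.use_scent:
            parts.append(self.scent_dense(scent))
        h, c = self.lstm(torch.cat(parts, dim=-1), lstm_state)
        return self.policy_head(h), self.value_head(h), (h, c)
```

Under these assumptions, `JBWAgentSketch(vision_dim=11)` maps an 11x11x3 visual field and a 3-dimensional scent vector to action logits and a state-value estimate, which is the policy/value interface that PPO expects.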
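The Open Source Code row points to the JBW repository, which ships an OpenAI Gym wrapper in Python. The snippet below is a usage sketch under the standard Gym interface; the import name `jbw` and the environment id `"JBW-v0"` are assumptions for illustration and should be checked against the repository README.

```python
# Usage sketch for the JBW OpenAI Gym wrapper. The import name `jbw` and the
# environment id "JBW-v0" are assumptions for illustration; consult
# https://github.com/eaplatanios/jelly-bean-world for the actual names.
import gym
import jbw  # assumed to register the Jelly Bean World environments with Gym

env = gym.make("JBW-v0")  # hypothetical environment id
observation = env.reset()
for _ in range(1000):
    action = env.action_space.sample()  # random policy as a placeholder
    observation, reward, done, info = env.step(action)
    # The JBW is non-episodic, so `done` is not expected to trigger resets.
env.close()
```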