Learning Parameterized Task Structure for Generalization to Unseen Entities

Authors: Anthony Liu, Sungryull Sohn, Mahdi Qazwini, Honglak Lee

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate PSGI in novel symbolic environments: AI2Thor, Cooking, and Mining. AI2Thor is a symbolic environment based on (Kolve et al. 2017), a simulated realistic indoor environment. In our AI2Thor environment, the agent is given a set of pre-trained options and must cook various food objects in different kitchen layouts, each containing possibly unseen objects. Cooking is a simplified cooking environment with similar but simpler dynamics than AI2Thor. The Mining domain is modelled after the open-world video game Minecraft and the domain introduced by Sohn, Oh, and Lee (2018).
Tasks. In AI2Thor, there are 30 different tasks based on the 30 kitchen floorplans in (Kolve et al. 2017). In each task, 14 entities from the floorplan are sampled at random. Then the subtasks and options are populated by replacing the parameters in parameterized subtasks and options with the sampled entities; e.g., we replace X and Y in the parameterized subtask (pickup, X, Y) with {apple, cabbage, table} to populate nine subtasks. This results in 1764 options and 526 subtasks. The ground-truth attributes are taken from (Kolve et al. 2017) but are not available to the agent. Cooking is defined similarly; it has a pool of 22 entities, and 10 entities are chosen at random for each task, resulting in 324 options and 108 subtasks. Similarly, for Mining we randomly sample 12 entities from a pool of 18 entities and populate 180 subtasks and 180 options for each task. In each environment, the reward is assigned at random to one of the subtasks with the largest critical path length, where the critical path length is the minimum number of options that must be executed to complete a subtask. See the appendix for more details on the tasks.
Observations. At each time step, the agent observes the completion and eligibility vectors (see Section 2 for definitions) and the corresponding embeddings. The subtask and option embeddings are the concatenation of the embeddings of their entities; e.g., for (pickup, apple) the embedding is [f(pickup), f(apple)], where f(·) can be an image or language embedding. In our experiments, we used 50-dimensional GloVe word embeddings (Pennington, Socher, and Manning 2014) as the embedding function f(·).
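The entity-substitution scheme and the concatenated embeddings described above can be sketched as follows. This is a minimal illustration, not the paper's code: the template format, `populate_subtasks`, and the random stand-in for the GloVe lookup `f` are all assumptions for the sketch.

```python
from itertools import product
import numpy as np

def populate_subtasks(templates, entities):
    """Ground each parameterized subtask by substituting every
    ordered combination of entities for its parameters."""
    grounded = []
    for name, n_params in templates:
        for combo in product(entities, repeat=n_params):
            grounded.append((name,) + combo)
    return grounded

# The paper's example: (pickup, X, Y) with {apple, cabbage, table}
# populates nine grounded subtasks (3 choices for X times 3 for Y).
entities = ["apple", "cabbage", "table"]
subtasks = populate_subtasks([("pickup", 2)], entities)
print(len(subtasks))  # 9

def embed_subtask(subtask, f):
    """Concatenate the embedding of each token,
    e.g. (pickup, apple) -> [f(pickup), f(apple)]."""
    return np.concatenate([f(tok) for tok in subtask])

# Stand-in for a 50-dimensional GloVe lookup (random vectors here).
rng = np.random.default_rng(0)
_vocab = {}
def f(tok):
    if tok not in _vocab:
        _vocab[tok] = rng.standard_normal(50)
    return _vocab[tok]

emb = embed_subtask(("pickup", "apple"), f)
print(emb.shape)  # (100,): two 50-d embeddings concatenated
```

The same substitution applies to parameterized options; the counts reported in the row above (e.g., 1764 options and 526 subtasks in AI2Thor) come from applying it over all templates and the 14 sampled entities.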
Researcher Affiliation | Collaboration | 1. University of Michigan; 2. LG AI Research
Pseudocode | No | The paper describes the methods in prose and mathematical formulations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | An implementation of PSGI and the experiments is available at https://github.com/anthliu/PSGI
Open Datasets | Yes | AI2Thor is a symbolic environment based on (Kolve et al. 2017)... The ground-truth attributes are taken from (Kolve et al. 2017) but are not available to the agent.
Dataset Splits | No | The paper describes meta-training on 'training tasks' and meta-evaluation on 'unseen test tasks' in a transfer-RL setup, implying dynamic task generation rather than fixed train/validation/test splits. It does not provide specific percentages or counts for data splits.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., GPU models, CPU types, memory specifications).
Software Dependencies | No | The paper mentions using '50-dimensional GloVe word embeddings' and refers to GRProp, but it does not specify version numbers for any software components, libraries, or programming languages used for implementation.
Experiment Setup | No | The paper describes the setup of the environments (AI2Thor, Cooking, Mining) and how tasks are populated, and mentions using a 'recurrent neural network with self-attention mechanism'. However, it does not provide specific experimental setup details such as hyperparameter values (e.g., learning rate, batch size, number of epochs) or specific optimizer settings in the main text.