Learning to Interactively Learn and Assist

Authors: Mark Woodward, Chelsea Finn, Karol Hausman (pp. 2535-2543)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through a series of experiments, we demonstrate the emergence of a variety of interactive learning behaviors, including information-sharing, information-seeking, and question-answering."
Researcher Affiliation | Industry | "Mark Woodward, Chelsea Finn, Karol Hausman; Google Brain, Mountain View; {markwoodward, chelseaf, karolhausman}@google.com"
Pseudocode | No | The paper describes the training procedure and models in prose and mathematical equations but does not include formal pseudocode blocks or algorithm listings.
Open Source Code | No | "An interactive game and videos for all experiments are available at: https://interactive-learning.github.io." This link leads to a project website stating that the code will be open-sourced later; it does not provide direct access to the code for the described method.
Open Datasets | No | The paper uses custom simulated grid-world environments and object-gathering task domains, but does not mention any publicly available datasets or provide access information for the generated data.
Dataset Splits | No | The paper describes using 100 test tasks "not seen during training" but does not explicitly mention a separate validation set or detail how the data was split into train/validation/test sets.
Hardware Specification | No | The paper does not specify the hardware used to run the experiments, such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python 3.x, TensorFlow x.x, PyTorch x.x).
Experiment Setup | Yes | The training batch size was 100 episodes, and the models were trained for 150,000 gradient steps (Experiments 1-3) or 40,000 gradient steps (Experiment 4). Table 1 gives the setup for each experiment, including GRID SHAPE, NUM. OBJECTS, OBSERVATIONS, and OBSERVATION WINDOW. Actions were chosen ε-greedy, and the hidden states of the recurrent LSTM cells were reset to 0 at the start of each episode.
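For readers unfamiliar with the ε-greedy action selection mentioned in the experiment setup, a minimal sketch follows. This is an illustrative implementation of the standard technique, not code from the paper; the function and variable names are hypothetical.

```python
import random

def epsilon_greedy(q_values, epsilon):
    """Standard epsilon-greedy selection over a list of Q-values.

    With probability epsilon, explore by picking a uniformly random
    action index; otherwise exploit by picking the argmax action.
    """
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With epsilon set to 0 this always returns the greedy action; with epsilon set to 1 it samples actions uniformly, which is the usual way such schedules are annealed during training.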