Showing versus doing: Teaching by demonstration

Authors: Mark K. Ho, Michael Littman, James MacGlashan, Fiery Cushman, Joseph L. Austerweil

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In two experiments, we show that human participants modify their teaching behavior consistent with the predictions of our model. Further, we show that even standard IRL algorithms benefit when learning from showing versus doing.
Researcher Affiliation | Academia | Mark K. Ho, Department of Cognitive, Linguistic, and Psychological Sciences, Brown University, Providence, RI 02912, mark_ho@brown.edu; Michael L. Littman, Department of Computer Science, Brown University, Providence, RI 02912, mlittman@cs.brown.edu; James MacGlashan, Department of Computer Science, Brown University, Providence, RI 02912, james_macglashan@brown.edu; Fiery Cushman, Department of Psychology, Harvard University, Cambridge, MA 02138, cushman@fas.harvard.edu; Joseph L. Austerweil, Department of Psychology, University of Wisconsin-Madison, Madison, WI 53706, austerweil@wisc.edu
Pseudocode | Yes | Algorithm 1: Pedagogical Trajectory Algorithm (see the first sketch after this table).
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the methodology described within it.
Open Datasets | No | The paper describes experiments with human participants whose trajectories form the data. It does not mention using or providing access to a publicly available dataset for training models or analysis.
Dataset Splits | No | The paper does not provide specific information about training, validation, or test dataset splits for reproducibility.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments.
Software Dependencies | No | The paper describes algorithms and models (e.g., Maximum Likelihood IRL) but does not list specific software dependencies with version numbers, such as Python versions or library versions.
Experiment Setup | Yes | Model trajectories are the two with the highest probability (λ = 2, α = 1.0, p_min = 10^-6, l_max = 4). Sixty Amazon Mechanical Turk participants performed the task in Figure 1. Safe tiles were worth 0 points, dangerous tiles were worth -2 points, and the terminal goal tile was worth 5 points. They also won an additional 5 points for each round completed, for a total of 10 points. Each point was worth 2 cents of bonus. We constrained non-goal feature weights to be non-positive (see the second sketch after this table).
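
The Pseudocode row refers to the paper's Algorithm 1 (Pedagogical Trajectory Algorithm), and the Experiment Setup row quotes the parameters λ = 2, α = 1.0, p_min = 10^-6, and l_max = 4. The pseudocode itself is not reproduced on this page, so the Python sketch below is only a rough reading of the idea, not the authors' algorithm: a "showing" teacher takes a set of candidate trajectories (for example, all paths of at most l_max steps), prunes those that are essentially impossible for a Boltzmann-rational "doing" demonstrator, and prefers the rest in proportion to how strongly an observer's posterior would favor the true reward, raised to the power λ. The function and argument names (showing_distribution, doing_prob, and so on) are hypothetical.

```python
import numpy as np

def boltzmann(values, alpha=1.0):
    # Softmax over action values with inverse-temperature-like parameter alpha.
    # A doing_prob implementation would typically use a per-step rule like this.
    z = alpha * np.asarray(values, dtype=float)
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

def showing_distribution(trajectories, candidate_rewards, true_idx,
                         doing_prob, lam=2.0, p_min=1e-6):
    # Hypothetical "showing" (pedagogical) distribution over candidate trajectories.
    #   trajectories      : candidate trajectories, e.g. all paths of <= l_max steps
    #   candidate_rewards : reward hypotheses the observer entertains
    #   true_idx          : index of the teacher's true reward in candidate_rewards
    #   doing_prob(t, r)  : probability of trajectory t under a Boltzmann-rational
    #                       demonstrator who privately optimizes reward r
    #   lam               : how strongly the teacher favors informative trajectories
    #   p_min             : prune trajectories that are (near) impossible under "doing"
    weights = []
    for t in trajectories:
        likelihoods = np.array([doing_prob(t, r) for r in candidate_rewards])
        if likelihoods[true_idx] < p_min or likelihoods.sum() == 0.0:
            weights.append(0.0)
            continue
        posterior = likelihoods / likelihoods.sum()   # observer's posterior, uniform prior
        weights.append(posterior[true_idx] ** lam)    # upweight trajectories that teach
    weights = np.array(weights)
    return weights / weights.sum()
```

Under this reading, the "two model trajectories with the highest probability" quoted in the Experiment Setup row would simply be the two largest entries of the returned distribution, e.g. np.argsort(weights)[-2:].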
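
The abstract quoted in the Research Type row and the Software Dependencies row both mention standard IRL learners (for example, Maximum Likelihood IRL), and the Experiment Setup row notes that non-goal feature weights were constrained to be non-positive. The snippet below is an illustrative stand-in for that kind of learner, not the authors' implementation: it scores candidate linear reward weights by the log-likelihood of observed (state, action) demonstrations under a softmax policy and keeps the maximizer, with a coarse grid search in place of the gradient ascent a real MLIRL implementation would use. The planner is abstracted into a caller-supplied q_function, and all names are hypothetical.

```python
import numpy as np
from itertools import product

def demo_log_likelihood(weights, demos, q_function, alpha=1.0):
    # Log-likelihood of (state, action) demonstrations under a softmax policy
    # whose action values come from q_function(state, weights).
    ll = 0.0
    for state, action in demos:
        q = alpha * np.asarray(q_function(state, weights), dtype=float)
        q = q - q.max()
        ll += q[action] - np.log(np.exp(q).sum())
    return ll

def max_likelihood_weights(demos, q_function,
                           nongoal_grid=(-2.0, -1.0, 0.0), goal_grid=(1.0, 5.0)):
    # Coarse grid search over [safe, dangerous, goal] feature weights.
    # Non-goal weights are drawn from a non-positive grid, mirroring the
    # constraint quoted in the Experiment Setup row.
    best_w, best_ll = None, -np.inf
    for safe, dangerous in product(nongoal_grid, repeat=2):
        for goal in goal_grid:
            w = np.array([safe, dangerous, goal])
            ll = demo_log_likelihood(w, demos, q_function)
            if ll > best_ll:
                best_w, best_ll = w, ll
    return best_w, best_ll
```

Comparing the weights such a learner recovers from demonstrations drawn from the "showing" distribution above versus from a plain Boltzmann "doing" demonstrator is the kind of comparison the quoted abstract describes.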