How to talk so AI will learn: Instructions, descriptions, and autonomy
Authors: Theodore Sumers, Robert Hawkins, Mark K. Ho, Tom Griffiths, Dylan Hadfield-Menell
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our models with a behavioral experiment, demonstrating that (1) our speaker model predicts human behavior, and (2) our pragmatic listener successfully recovers humans' reward functions. |
| Researcher Affiliation | Academia | Theodore R. Sumers (Computer Science, Princeton University; sumers@princeton.edu); Robert D. Hawkins (Princeton Neuroscience Institute, Princeton University; rdhawkins@princeton.edu); Mark K. Ho (Computer Science, Princeton University; mho@princeton.edu); Thomas L. Griffiths (Computer Science and Psychology, Princeton University; tomg@princeton.edu); Dylan Hadfield-Menell (EECS, CSAIL, MIT; dhm@csail.mit.edu) |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Code and data are available at https://github.com/tsumers/how-to-talk. |
| Open Datasets | Yes | Code and data are available at https://github.com/tsumers/how-to-talk. |
| Dataset Splits | No | The paper describes calibrating model parameters (e.g., "To calibrate our pragmatic listeners, we tested β_S1 ∈ [1, 10] and found that β_S1 = 3 optimized Known H and Latent H listeners"), but it does not specify training/validation/test splits for the human behavioral data collected in the experiment. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running experiments or simulations. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | "To calibrate our pragmatic listeners, we tested β_S1 ∈ [1, 10] and found that β_S1 = 3 optimized Known H and Latent H listeners (see Appendix B.3 for details)" and "we fix β_L0 = 3 throughout this work". |
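To make the role of a rationality parameter like β_S1 more concrete, here is a minimal sketch of a generic softmax (Boltzmann-rational) speaker, a Bayesian pragmatic listener that inverts it, and a grid search over β of the kind the calibration row describes. This is not the authors' implementation (that lives in the linked repository). It assumes the speaker is a softmax chooser, as the β notation suggests, and it omits the literal listener L0 and its β_L0. The utility matrix, uniform prior, toy observations, the scoring rule (total log posterior on the true reward function), and all function names are hypothetical choices for illustration.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def softmax(scores: Sequence[float], beta: float) -> List[float]:
    """Boltzmann-rational choice: P(i) is proportional to exp(beta * score_i)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(beta * (s - m)) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def speaker(utility: Matrix, beta_s1: float) -> Matrix:
    """Hypothetical S1(u | theta): row t is a softmax over utterances of utility[t][u]."""
    return [softmax(row, beta_s1) for row in utility]


def pragmatic_listener(utility: Matrix, prior: Sequence[float], beta_s1: float) -> Matrix:
    """Hypothetical L1(theta | u), proportional to S1(u | theta) * P(theta).

    Returns posterior[u][t], normalized over reward functions t for each utterance u.
    """
    s1 = speaker(utility, beta_s1)
    n_theta, n_utt = len(utility), len(utility[0])
    posterior = []
    for u in range(n_utt):
        joint = [s1[t][u] * prior[t] for t in range(n_theta)]
        z = sum(joint)
        posterior.append([j / z for j in joint])
    return posterior


def calibrate_beta(utility: Matrix, prior: Sequence[float],
                   observations: Sequence[Tuple[int, int]],
                   betas: Sequence[float]) -> float:
    """Grid search: return the beta whose listener assigns the highest total
    log posterior to the true reward function t in each (t, u) observation."""
    def score(beta: float) -> float:
        post = pragmatic_listener(utility, prior, beta)
        return sum(math.log(post[u][t]) for t, u in observations)
    return max(betas, key=score)


# Hypothetical toy setup: 2 reward functions x 2 utterances, uniform prior.
utility = [[1.0, 0.0], [0.0, 1.0]]
prior = [0.5, 0.5]
observations = [(0, 0), (1, 1)]  # (true reward function, observed utterance)
best = calibrate_beta(utility, prior, observations, betas=range(1, 11))
print(best)
```

In this toy data each reward function always produced its highest-utility utterance, so the listener's score keeps rising with β and the search returns the top of the grid; noisier observations pull the optimum down. The paper's reported β_S1 = 3 comes from its own human data and listener evaluation, which this sketch does not reproduce.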