Interpretable Actions: Controlling Experts with Understandable Commands
Authors: Shumeet Baluja, David Marwood, Michele Covell
AAAI 2021, pp. 4912-4922
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through three tasks, we concretely demonstrate how our system yields readily understandable commands. In one, we introduce a new form of artistic style transfer, learning to draw and color with crayons, in which the transformation of a photograph or painting occurs not as a single monolithic computation, but by the composition of thousands of individual, visualizable strokes. The other two tasks, single-pass function approximation with arbitrary bases and shape-based synthesis, show how our approach produces understandable and extractable actions in two disparate domains. We test the system on 1,000 randomly generated functions that were not used in training; in general, all are poorly behaved, e.g., non-monotonic, non-periodic, and discontinuous. They include trigonometric functions other than the 5 bases. See Table 1, Line 1. (Illustrative sketches of the random test functions and the stroke-based rendering follow this table.) |
| Researcher Affiliation | Industry | Shumeet Baluja, David Marwood, Michele Covell Google Research, Google, Inc. {shumeet,marwood,covell}@google.com |
| Pseudocode | No | The paper includes an "Overview of 5 Step Procedure" (Figure 2) and detailed steps in the text, but these are descriptive and not formatted as a formal pseudocode or algorithm block. |
| Open Source Code | No | The paper cites "Clark and Contributors 2021. Pillow 8.12, Python Imaging Library (Fork). https://pypi.org/project/Pillow/". This is a citation to a third-party library used, not a release of the authors' own source code for their methodology. |
| Open Datasets | Yes | The training examples for the controller are randomly generated functions employing arbitrary combinations of the 5 external bases, as well as a number of other randomly chosen trigonometric functions. The controller training used 64×64 ImageNet (Deng et al. 2009) photos as inputs and targets. |
| Dataset Splits | No | The paper mentions training and test data ("We test the system on 1,000 randomly generated functions that were not used in training"; "Testing was conducted on a held-out set of 1000 ImageNet photographs"), but it does not describe a separate validation set or give explicit percentages/counts for a train/validation/test split. |
| Hardware Specification | Yes | The generator networks (Figure 13) were trained in parallel on a 56-processor Intel Xeon E5-2690 V4 (non-GPU) machine. The controller network was trained on a P100 GPU for 3+ days, depending on the number of generators used. |
| Software Dependencies | No | The training is done in Tensorflow; a full description of the training regime and architectures is given in the Appendix. The paper mentions TensorFlow but does not specify a version number. It also cites "Pillow 8.12" as its external rendering engine, but does not provide a dependency list for running the authors' own experimental code. |
| Experiment Setup | Yes | Learning rate = 10^-4. We trained it for 24 hours to improve results. Acceptable performance was achieved in 2 days, but the network was allowed to train for 1 week. (A minimal training-configuration sketch follows this table.) |
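
For concreteness, here is a minimal sketch of how a held-out set of "poorly behaved" test functions like those described above could be generated: random sums of scaled, shifted trigonometric terms plus a jump discontinuity. The basis set, the parameter ranges, and the `random_test_function` helper are illustrative assumptions, not the authors' actual generator.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_test_function(n_terms=4):
    """Return f(x): a random sum of scaled/shifted trig terms plus a jump.

    Sums of non-commensurate frequencies are generally non-periodic and
    non-monotonic; the added step makes the function discontinuous.
    """
    bases = [np.sin, np.cos, np.tan, np.arctan]
    terms = []
    for _ in range(n_terms):
        base = bases[rng.integers(len(bases))]
        amp = rng.uniform(-2.0, 2.0)
        freq = rng.uniform(0.5, 5.0)
        phase = rng.uniform(0.0, 2.0 * np.pi)
        terms.append((base, amp, freq, phase))
    jump_at = rng.uniform(-1.0, 1.0)  # location of the discontinuity
    jump = rng.uniform(-1.0, 1.0)     # size of the jump

    def f(x):
        y = sum(amp * base(freq * x + phase)
                for base, amp, freq, phase in terms)
        return y + np.where(x > jump_at, jump, 0.0)

    return f

# A held-out evaluation set of 1,000 random functions, as in the paper.
test_functions = [random_test_function() for _ in range(1000)]
xs = np.linspace(-1.0, 1.0, 256)
ys = test_functions[0](xs)  # sample the first function on a dense grid
```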
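The crayon task composes an image from thousands of individual, visualizable strokes rendered by Pillow, the external rendering engine the paper cites. The stroke parameterization below (position, angle, length, width, color) and the `render_stroke` helper are assumptions for illustration, not the paper's actual command format.

```python
import math
import random

from PIL import Image, ImageDraw

random.seed(0)

def render_stroke(draw, x, y, angle, length, width, color):
    """Draw one visualizable stroke as a thick line segment."""
    dx = math.cos(angle) * length / 2.0
    dy = math.sin(angle) * length / 2.0
    draw.line([(x - dx, y - dy), (x + dx, y + dy)], fill=color, width=width)

canvas = Image.new("RGB", (64, 64), "white")
draw = ImageDraw.Draw(canvas)

# The final image is the composition of many individual, inspectable strokes.
for _ in range(2000):
    render_stroke(
        draw,
        x=random.uniform(0, 64),
        y=random.uniform(0, 64),
        angle=random.uniform(0.0, math.pi),
        length=random.uniform(2.0, 10.0),
        width=random.randint(1, 3),
        color=tuple(random.randint(0, 255) for _ in range(3)),
    )

canvas.save("strokes.png")
```

Because every stroke is an explicit, low-dimensional command, each one can be inspected or replayed individually, which is the sense in which the paper's actions are "interpretable."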
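Finally, a minimal sketch of the reported training configuration. Only the learning rate (10^-4) and the use of TensorFlow come from the paper; the Adam optimizer, the toy model, and the synthetic data are placeholder assumptions.

```python
import numpy as np
import tensorflow as tf

# Placeholder data standing in for the paper's training examples.
x = np.random.randn(1024, 64).astype("float32")
y = np.sin(x).sum(axis=1, keepdims=True).astype("float32")

# Toy stand-in model; the paper's architectures are described in its Appendix.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(1),
])

# Learning rate 10^-4 as reported; the optimizer choice is an assumption.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="mse")

# The paper trains for days of wall-clock time on a P100 GPU; a short
# fit call stands in for that loop here.
model.fit(x, y, epochs=2, batch_size=64, verbose=0)
```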