Teaching Big Data Analytics Skills with Intelligent Workflow Systems

Authors: Yolanda Gil

AAAI 2016

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | The course was pre-tested in the Summer of 2015 with four students, three are non-CS undergraduates and one is a high-school student. The students were able to follow the materials, and used basic programming skills from intro courses they had taken (one student learned R on her own) and developed new workflows for basic statistical analysis of data, for image processing (using the Open CV open source package), and for basic social network analysis. ... We illustrate some of the materials created by the students doing tasks equivalent to homework assignments. Figure 4 shows an example of a logic constraint created by one of the students. ... Figure 5 shows an example of a workflow created by a student using Open CV components. |
| Researcher Affiliation | Academia | Yolanda Gil, Information Sciences Institute and Department of Computer Science, University of Southern California, 4676 Admiralty Way, Marina del Rey, CA 90292, gil@isi.edu |
| Pseudocode | No | The paper describes a workflow system and shows diagrams of workflows (Figure 1, Figure 2, Figure 3) but does not provide any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper references the WINGS semantic workflow system at http://www.wings-workflows.org and mentions using the Open CV open source package. However, it does not state that the code for the specific teaching methodology or the workflows developed for the course, which are the subject of *this paper's* work, is open source or available via a link. |
| Open Datasets | No | The paper mentions 'real-world and science-grade datasets' and 'Web documents [Hauder et al 2011b]', but it does not specify concrete, publicly available datasets with a link, DOI, repository name, or formal citation for replication. It states 'Students will work with multiple domains and use workflows that capture end-to-end expert-level analytic methods.' without providing details on the datasets themselves. |
| Dataset Splits | No | The paper describes a pre-test conducted with four students but does not provide any specific information about training, validation, or test dataset splits for any data used within the workflows or for evaluating the teaching methodology. |
| Hardware Specification | No | The paper mentions 'high-end computing' and a 'Hadoop (Map Reduce) infrastructure' for workflow execution, but it does not provide any specific hardware details such as CPU/GPU models, memory configurations, or cloud instance types used for running experiments or the WINGS system. |
| Software Dependencies | No | The paper mentions software packages like 'MATLAB and R', 'Weka and Cluto', 'Mallet', and 'Open CV', and programming languages like 'Java, C++, and Python'. However, it does not specify any version numbers for these software components. |
| Experiment Setup | No | The paper describes the setup of a course and the use of a workflow system, providing high-level conceptual details and examples of student work (like a semantic constraint in Figure 4). However, it does not include specific experimental setup details such as hyperparameters, learning rates, batch sizes, or other system-level training configurations typically found in papers describing machine learning model training or other computational experiments. |