Capturing Difficulty Expressions in Student Online Q&A Discussions
Authors: Jaebong Yoo, Jihie Kim
AAAI 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce a new application of online dialogue analysis: supporting pedagogical assessment of online Q&A discussions. Extending the existing speech act framework, we capture common emotional expressions that often appear in student discussions, such as frustration and degree of certainty, and present a viable approach for the classification. We demonstrate how such dialogue information can be used in analyzing student discussions and identifying difficulties. In particular, the difficulty expressions are aligned to discussion patterns and student performance. We found that frustration occurs more frequently in longer discussions. The students who frequently express frustration tend to get lower grades than others. On the other hand, frequency of high certainty expressions is positively correlated with the performance. We expect such dialogue analyses can become a powerful assessment tool for instructors and education researchers. |
| Researcher Affiliation | Collaboration | Jaebong Yoo, Samsung Electronics, Suwon, South Korea (jaebong.yoo@samsung.com); Jihie Kim, KT and USC / Information Sciences Institute, 4676 Admiralty Way, Marina del Rey, CA, USA (jihie@isi.edu) |
| Pseudocode | No | The paper describes data processing steps and the use of classifiers (J48, Naïve Bayes, SVM), but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | No | The data collected from eight semesters of this course include 5,056 messages and 1,532 threads from 370 users (180 groups). This data was collected from an undergraduate course discussion board at the University of Southern California and is not stated to be publicly available. |
| Dataset Splits | No | For building the classifiers, we randomly divided 418 threads (1,841 posts) into two datasets: 318 threads (1,404 posts) for training and 100 threads (437 posts) for testing. While the training phase used 10-fold cross-validation, no separate validation split with explicit percentages or sample counts is defined for the overall dataset. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, processor types, or memory used for running its experiments. |
| Software Dependencies | No | The paper states: "We then used J48, Naïve Bayes, and SVM with RBF in the WEKA package." However, it does not specify version numbers for WEKA or for the classifiers used within it. |
| Experiment Setup | Yes | We selected top 2,000 out of 19,465 features generated from the training corpus. ... Thus, we performed resampling of the training data toward more balanced distribution: duplicating positive examples and random sampling negative ones. ... We then used J48, Naïve Bayes, and SVM with RBF in the WEKA package. The training phase was carried out based on 10-fold cross validation. |
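The paper's thread-level split (418 threads into 318 for training and 100 for testing) can be sketched as follows. This is a minimal illustrative reconstruction, not the authors' code; the function name, seed, and dummy thread identifiers are my own assumptions.

```python
import random

def split_threads(threads, n_train=318, seed=0):
    """Randomly divide discussion threads into training and test sets,
    mirroring the paper's split of 418 threads into 318 train / 100 test.
    Splitting at the thread level keeps every post of a thread on the
    same side of the split, avoiding leakage between related posts.
    """
    rng = random.Random(seed)
    shuffled = threads[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Toy run with 418 dummy thread ids.
train, test = split_threads([f"thread-{i}" for i in range(418)])
```

Splitting by thread rather than by post is a natural reading of the paper's setup, since the reported post counts (1,404 and 437) follow from the thread assignment.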
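The resampling step the paper describes (duplicating positive examples and randomly sampling negatives) can be sketched like this. The target class ratio is not stated in the paper, so the 50/50 target below, the function name, and the seed are assumptions for illustration only.

```python
import random

def rebalance(positives, negatives, seed=0):
    """Rebalance a skewed binary training set toward an even class
    distribution: oversample positives with replacement (duplication)
    and randomly undersample the majority negatives."""
    rng = random.Random(seed)
    n_target = (len(positives) + len(negatives)) // 2
    # Duplicate (sample with replacement) the minority positive class.
    pos = [rng.choice(positives) for _ in range(n_target)]
    # Randomly sample (without replacement) from the majority negatives.
    neg = rng.sample(negatives, min(n_target, len(negatives)))
    return pos, neg

# Toy illustration with a skewed 20:180 class distribution.
pos, neg = rebalance(list(range(20)), list(range(180)))
```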
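The feature-selection step (keeping the top 2,000 of 19,465 features) could look like the sketch below. The paper does not specify its scoring criterion, so raw token frequency is used here purely as a stand-in; the function and the toy corpus are hypothetical.

```python
from collections import Counter

def select_top_features(documents, k=2000):
    """Keep the k highest-scoring token features from a training corpus.
    Frequency ranking is an assumed criterion; the paper only reports
    selecting the top 2,000 of 19,465 generated features."""
    counts = Counter(tok for doc in documents for tok in doc.split())
    return [tok for tok, _ in counts.most_common(k)]

# Toy corpus of three short posts.
features = select_top_features(
    ["i am stuck on this", "this is really confusing", "i am not sure"],
    k=4,
)
```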