Thorgeirsson, Sverrir; Vahrenhold, Jan
Research article in edited proceedings (conference) | Peer reviewed

Background and Context. Code complexity measures have been used to guide the design of various activities within computing education, such as instructional sequencing and assessment. However, empirical evidence for the link between these measures and actual cognitive difficulties remains mixed, with studies suffering from small sample sizes and uncontrolled experimental designs.

Objectives. We sought to investigate how well code complexity measures predict the cognitive load of university students when tracing code and whether their predictive power is moderated by computer science achievement. We also compared these measures against the comparative judgments of a 15-member expert panel.

Methods. We conducted a preregistered laboratory study to investigate the strength of code complexity measures identified in a recent neuroimaging study as predictors of cognitive load. In this controlled study, N = 551 university students traced a random selection of 24 expert-curated code snippets in their preferred language (Java, Python, or C++) and then reported their cognitive load on two validated instruments. We assessed preregistered hierarchical regression models with respect to the predictive strength of the code complexity measures and possible moderation.

Findings. Contrary to the findings of the recent neuroimaging study, we could not confirm data-flow complexity to be the strongest predictor of measured cognitive load; instead, the simple source lines of code measure dominated the other code complexity measures. In contrast, expert ratings strongly predicted cognitive load.

Implications. In the educational context studied, measuring source lines of code is a simple and effective heuristic for ordering tracing tasks by difficulty and outperforms more sophisticated efforts involving data and control flow. The unexpected finding from our exploratory analyses that easy-to-obtain rankings based on pairwise-comparison sessions involving experts have much stronger predictive power than static and dynamic measures opens up avenues for follow-up research.