In 1988, Sandra G. Hart of NASA’s Human Performance Group and Lowell E. Staveland of San Jose State University introduced the Task Load Index (NASA-TLX). With more than 8,000 citations since 1988, it has spread far beyond its original application in aviation, its original focus, and the English language (Hart, 2006).
The NASA-TLX estimates one or more users’ perceived cognitive demand, which can help gauge a system’s usability, effectiveness, or comfort. The “immediate, often unverbalized impressions that occur spontaneously” (Hart & Staveland, 1988) are of particular interest because they are difficult or impossible to capture through objective observation.
The NASA-TLX is a multi-dimensional scale: six sub-scales combine into one overall NASA-TLX score (Hart & Staveland, 1988):
- Mental: How much mental and perceptual activity was required? Was the task easy or demanding, simple or complex, exacting or forgiving?
- Physical: How much physical activity was required? Was the task easy or demanding? Slow or brisk? Slack or strenuous? Restful or laborious?
- Temporal: How much time pressure did you feel due to the rate or pace at which the tasks or task elements occurred? Was the pace slow and leisurely or rapid and frantic?
- Frustration: How insecure, discouraged, irritated, stressed, and annoyed versus secure, gratified, content, relaxed, and complacent did you feel during the task?
- Effort: How hard did you have to work (mentally and physically) to accomplish your level of performance?
- Performance: How successful do you think you were in accomplishing the goals of the task set by the experimenter (or yourself)? What is your level of satisfaction?
Notably, there are multiple perspectives from which a user may interpret the word “workload”. Originally, Hart assumed that some combination of these perspectives would represent “most people” and “most scenarios” (Hart, 2006). To preserve that intended flexibility, the instrument includes a weighting scheme that emphasizes the dimensions most critical to each individual user.
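In the original procedure, each user rates the six sub-scales (conventionally on a 0–100 scale) and then completes 15 pairwise comparisons, choosing in each pair the dimension that contributed more to workload; a dimension’s weight is the number of pairs it “wins”. A minimal sketch of that scoring arithmetic, with illustrative variable names and example numbers:

```python
# Sketch of the weighted NASA-TLX scoring step. Ratings are assumed
# to be on a 0-100 scale; weights are how many of the 15 pairwise
# comparisons each dimension won. All names and values here are
# illustrative, not part of any official software.

SUBSCALES = ["mental", "physical", "temporal",
             "frustration", "effort", "performance"]

def weighted_tlx(ratings, weights):
    """Overall workload = sum(weight * rating) / 15.

    ratings: dict of sub-scale -> 0-100 rating
    weights: dict of sub-scale -> pairwise-comparison wins (sum to 15)
    """
    assert sum(weights.values()) == 15, "expected 15 pairwise comparisons"
    return sum(weights[s] * ratings[s] for s in SUBSCALES) / 15

# Hypothetical single-user data:
ratings = {"mental": 70, "physical": 20, "temporal": 55,
           "frustration": 40, "effort": 60, "performance": 30}
weights = {"mental": 5, "physical": 1, "temporal": 3,
           "frustration": 2, "effort": 3, "performance": 1}
print(weighted_tlx(ratings, weights))  # 55.0
```

Because each weight can range from 0 to 5, a dimension the user considered irrelevant can be zeroed out entirely, which is how the scheme adapts the composite score to each person’s reading of “workload”.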
Modifications to the NASA-TLX
There have been various attempts to modify the NASA-TLX. Some studies add sub-scales while others delete them. This is commendable, but it does “require establishing validity, sensitivity, and reliability of the new instrument before using it” (Hart, 2006).
The most common adaptation is known as the Raw TLX (RTLX), in which the weighting process is omitted entirely. The RTLX is attractive for obvious reasons: the overall workload estimate is obtained by simply summing (or averaging) the six sub-scale scores, with no further calculation (Hart, 2006).
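The simplification is easy to see next to the weighted version: the RTLX drops the 15 pairwise comparisons and just aggregates the raw ratings. A minimal sketch, again assuming 0–100 ratings and illustrative names:

```python
# Sketch of Raw TLX (RTLX) scoring: no weights, just the mean of the
# six sub-scale ratings (summing instead of averaging only changes
# the scale, not the ordering of scores). Values are illustrative.

def raw_tlx(ratings):
    """Unweighted mean of the six sub-scale ratings (0-100)."""
    return sum(ratings.values()) / len(ratings)

ratings = {"mental": 70, "physical": 20, "temporal": 55,
           "frustration": 40, "effort": 60, "performance": 30}
print(round(raw_tlx(ratings), 1))  # 45.8
```

Note that for the same example ratings, the weighted and raw scores generally differ, since the weighted version amplifies whichever dimensions the user ranked as most important.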
The RTLX was criticized until various studies compared it to the original TLX. Summarizing 29 such studies, Hart wryly notes that the RTLX was found to be “either more sensitive (Hendy, Hamilton, & Landry, 1993), less sensitive (Liu & Wickens, 1994), or equally sensitive (Byers, Bittner, & Hill, 1989), so it seems you can take your pick” (Hart, 2006).
The NASA-TLX and its sub-scales sufficiently represent sources of cognitive workload across different tasks. Not only did Hart and Staveland validate their measure in their 1988 paper, but independent studies have also found the TLX to be a valid measure of subjective workload (Hart & Staveland, 1988; Rubio et al., 2004; Xiao et al., 2005).
Applicable to Multiple Domains:
The NASA-TLX is applicable to a number of domains. Originally, it was intended for use in aviation, but quickly spread to air traffic control, civilian and military cockpits, robotics, and unmanned vehicles. In later years, studies in the automotive, healthcare, and technology domains used the TLX (Hart, 2006).
An unanticipated strength of the TLX is that it has some diagnostic ability. While the original scale produces an aggregate score, the sub-scales help identify specifically where the workload is coming from (Hart, 2006). This feature can be remarkably helpful for developers hoping to improve their design.
In addition to having been translated into at least 12 languages, the NASA-TLX can be administered through various media. Paper and pencil is still popular, but the TLX has also been integrated into computer software packages, an iOS app, and Android apps. Each of these options is completely free, which is significant.
Asking a user to complete a scale during a task can be rather intrusive. Unfortunately, waiting until the task is complete leads to its own set of problems: users are prone to forget details of the task. Because memory for those details fades, any delay between the task in question and the TLX itself is not ideal.
Task Performance Can Bias Ratings:
Users’ perceptions of their own task performance can weigh heavily on their ratings. If they believe they completed the task successfully, workload ratings tend to be inflated. Conversely, if they perceive a task failure, workload ratings tend to be lower. Ideally, users would rate each sub-scale the same regardless of task performance.
The NASA-TLX is a subjective measure of a user’s perceived cognitive workload, nothing more. Importantly, it is NOT a measure of the cognitive workload objectively required to use a system, however useful that would be. Practitioners employing the TLX must remember exactly what they are measuring and should not treat perceived workload ratings as anything other than what they are.
Hart, S. G. (2006, October). NASA-task load index (NASA-TLX); 20 years later. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (Vol. 50, No. 9, pp. 904-908). Los Angeles, CA: Sage Publications.
Hart, S. G., & Staveland, L. E. (1988). Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In Advances in psychology (Vol. 52, pp. 139-183). North-Holland.
Hendy, K. C., Hamilton, K. M., & Landry, L. N. (1993). Measuring subjective workload: when is one scale better than many?. Human Factors, 35(4), 579-601.
Hill, S. G., Iavecchia, H. P., Byers, J. C., Bittner Jr., A. C., Zaklade, A. L., & Christ, R. E. (1992). Comparison of four subjective workload rating scales. Human Factors, 34(4), 429-439.
Liu, Y., & Wickens, C. D. (1994). Mental workload and cognitive task automaticity: an evaluation of subjective and time estimation metrics. Ergonomics, 37(11), 1843-1854.
Rubio, S., Díaz, E., Martín, J., & Puente, J. M. (2004). Evaluation of subjective mental workload: A comparison of SWAT, NASA‐TLX, and workload profile methods. Applied Psychology, 53(1), 61-86.
Xiao, Y. M., Wang, Z. M., Wang, M. Z., & Lan, Y. J. (2005). The appraisal of reliability and validity of subjective workload assessment technique and NASA-task load index. Chinese Journal of Industrial Hygiene and Occupational Diseases, 23(3), 178-181.
About the Author
Anders Orn | Human Factors Scientist | Research Collective
As a Human Factors Scientist, Anders Orn plans and conducts observational research at Research Collective. While he is involved in many aspects of research, Anders especially enjoys usability testing in the healthcare and automotive industries, as it offers a unique opportunity to examine human behavior. You can find Anders on LinkedIn.