In 1988, Sandra G. Hart of NASA’s Human Performance Group and Lowell E. Staveland of San Jose State University introduced the Task Load Index. Though originally developed with aviation in mind, the NASA-TLX is now one of the most widely accepted measures of subjective workload. Cited in more than 8,000 studies since 1988, it has spread far beyond aviation, its original focus, and even beyond the English language (Hart, 2006).
The NASA-TLX is a tool that estimates one or more users’ perceived cognitive demand, which can be useful in understanding a system’s usability, effectiveness, or comfort. Of particular interest are the “immediate, often unverbalized impressions that occur spontaneously” (Hart & Staveland, 1988), those that are difficult or impossible to observe objectively. Researchers simply cannot get that information through observation alone.
The NASA-TLX is a multi-dimensional scale. Six sub-scales combine into one overall NASA-TLX score (Hart & Staveland, 1988):
- Mental Demand: How much mental and perceptual activity was required? Was the task easy or demanding, simple or complex, exacting or forgiving?
- Physical Demand: How much physical activity was required? Was the task easy or demanding, slow or brisk, slack or strenuous, restful or laborious?
- Temporal Demand: How much time pressure did you feel due to the rate or pace at which the tasks or task elements occurred? Was the pace slow and leisurely or rapid and frantic?
- Frustration: How insecure, discouraged, irritated, stressed, and annoyed versus secure, gratified, content, relaxed, and complacent did you feel during the task?
- Effort: How hard did you have to work (mentally and physically) to accomplish your level of performance?
- Performance: How successful do you think you were in accomplishing the goals of the task set by the experimenter (or yourself)? How satisfied were you with your performance in accomplishing these goals?
Notably, there are multiple perspectives from which a user may interpret the word “workload”. The original idea was that some combination of these perspectives would represent “most people” and “most scenarios” in one way or another (Hart, 2006). To preserve that intended flexibility, a weighting scheme emphasizes the dimensions most critical to each user: the user compares the six sub-scales in 15 pairwise comparisons, and the number of times a dimension is chosen as more relevant to the task becomes its weight.
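As a minimal sketch of that arithmetic (the dimension labels, ratings, and pairwise winners below are hypothetical examples, not data from the original paper): each 0-100 sub-scale rating is multiplied by the number of pairwise comparisons that sub-scale won, and the total is divided by 15.

```python
# Sketch of the original weighted NASA-TLX scoring procedure.
# Ratings are on the 0-100 scale; weights come from the 15 pairwise
# comparisons, where each "win" adds 1 to a dimension's weight.

DIMENSIONS = ["mental", "physical", "temporal",
              "performance", "effort", "frustration"]

def weighted_tlx(ratings: dict, pairwise_winners: list) -> float:
    """Overall workload = sum(rating * weight) / 15."""
    if len(pairwise_winners) != 15:
        raise ValueError("Expected one winner for each of the 15 pairs")
    weights = {d: 0 for d in DIMENSIONS}
    for winner in pairwise_winners:
        weights[winner] += 1  # each comparison won raises the weight by 1
    return sum(ratings[d] * weights[d] for d in DIMENSIONS) / 15

# Hypothetical sub-scale ratings from one participant:
ratings = {"mental": 70, "physical": 20, "temporal": 55,
           "performance": 40, "effort": 60, "frustration": 35}
# Hypothetical winners of the 15 pairwise comparisons:
winners = (["mental"] * 5 + ["effort"] * 4 + ["temporal"] * 3
           + ["performance"] * 2 + ["frustration"] * 1)  # physical never chosen
print(weighted_tlx(ratings, winners))  # 58.0
```

Because the weights always sum to 15, the overall score stays on the same 0-100 scale as the individual ratings.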
Modifications to the NASA-TLX
There have been various attempts to modify the NASA-TLX; some studies add sub-scales while others delete them. While such tailoring can be reasonable, it does “require establishing validity, sensitivity, and reliability of the new instrument before using it” (Hart, 2006).
The most common adaptation is known as the Raw TLX (RTLX), in which the weighting process is omitted entirely. The RTLX is attractive for obvious reasons: the overall workload estimate is as simple as combining the scores of each sub-scale. No calculations besides a plain sum are necessary (Hart, 2006).
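A minimal sketch of the RTLX arithmetic (the ratings below are hypothetical examples): with the weighting omitted, the overall estimate is just the unweighted sum, or equivalently the mean, of the six ratings.

```python
# Raw TLX (RTLX): skip the pairwise weighting entirely and combine
# the six sub-scale ratings directly, here as an unweighted mean.

def raw_tlx(ratings: dict) -> float:
    return sum(ratings.values()) / len(ratings)

# Hypothetical 0-100 sub-scale ratings from one participant:
ratings = {"mental": 70, "physical": 20, "temporal": 55,
           "performance": 40, "effort": 60, "frustration": 35}
print(round(raw_tlx(ratings), 1))  # 46.7
```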
Naturally, the popularization and use of the RTLX was met with some criticism until various studies compared it to the original Task Load Index. In an amusing summary of these 29 studies, Hart writes that the RTLX was found to be “either more sensitive (Hendy, Hamilton, & Landry, 1993), less sensitive (Liu & Wickens, 1994), or equally sensitive (Byers, Bittner, & Hill, 1989), so it seems you can take your pick” (Hart, 2006).
Strengths of the NASA-TLX
Valid: The NASA-TLX and its sub-scales sufficiently represent sources of cognitive workload across different tasks. Not only did Hart and Staveland validate their measure in their 1988 paper, but independent studies have also found the TLX to be a valid measure of subjective workload (Hart & Staveland, 1988; Rubio, et al., 2004; Xiao, et al., 2005).
Applicable to Multiple Domains: The NASA-TLX is applicable to a number of domains. It was originally intended for use in aviation, but quickly spread to air traffic control, civilian and military cockpits, robotics, and unmanned vehicles. More recently, the TLX has also been used in automotive, healthcare, and technology settings (Hart, 2006).
Diagnostic-ish: The TLX has a perhaps unexpected strength in that it has some diagnostic ability. While the original scale produces an aggregate score, the presence of sub-scales helps identify where, specifically, the workload is coming from (Hart, 2006). This feature can be remarkably helpful for developers hoping to improve their design.
Highly Accessible: In addition to having been translated into at least 12 languages, the NASA-TLX can be administered in various “mediums”. The traditional paper-and-pencil version is still popular, but the TLX has also been integrated into computer software packages, an iOS app, and Android apps. Each of these options is completely free, which is significant.
Weaknesses of the NASA-TLX
Memory: Asking a user to complete a scale during a task can be rather intrusive, for obvious reasons. Unfortunately, waiting until the task is complete can lead to its own set of problems. Especially if the TLX is not administered until the very end of the testing session, users are prone to forget details of the task. Because human memory fades quickly, any delay between the task in question and the TLX itself is not ideal.
Task Performance Can Bias Ratings: Users’ perceptions of their own task performance can weigh heavily on all sorts of ratings. If users believe they completed the task, workload ratings tend to be inflated; likewise, if they perceive a task failure, workload ratings tend to be lower. Ideally, users would rate each sub-scale the same regardless of task performance.
Subjective: The NASA-TLX is a subjective measure of a user’s perceived cognitive workload, nothing more. It is NOT a measure of the cognitive workload required to use a system, however sensational that would be. It is very important for practitioners employing the TLX to remember exactly what they are measuring and to not treat perceived workload ratings as anything other than what they are.
References
Hart, S. G. (2006, October). NASA-Task Load Index (NASA-TLX); 20 years later. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (Vol. 50, No. 9, pp. 904-908). Los Angeles, CA: Sage Publications.
Hart, S. G., & Staveland, L. E. (1988). Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In Advances in psychology (Vol. 52, pp. 139-183). North-Holland.
Hendy, K. C., Hamilton, K. M., & Landry, L. N. (1993). Measuring subjective workload: when is one scale better than many?. Human Factors, 35(4), 579-601.
Hill, S. G., Iavecchia, H. P., Byers, J. C., Bittner Jr., A. C., Zaklade, A. L., & Christ, R. E. (1992). Comparison of four subjective workload rating scales. Human Factors, 34(4), 429-439.
Liu, Y., & Wickens, C. D. (1994). Mental workload and cognitive task automaticity: an evaluation of subjective and time estimation metrics. Ergonomics, 37(11), 1843-1854.
Rubio, S., Díaz, E., Martín, J., & Puente, J. M. (2004). Evaluation of subjective mental workload: A comparison of SWAT, NASA‐TLX, and workload profile methods. Applied Psychology, 53(1), 61-86.
Xiao, Y. M., Wang, Z. M., Wang, M. Z., & Lan, Y. J. (2005). The appraisal of reliability and validity of subjective workload assessment technique and NASA-Task Load Index. Chinese Journal of Industrial Hygiene and Occupational Diseases, 23(3), 178-181.
About the Author
Anders Orn | Human Factors Scientist | Research Collective
As a Human Factors Scientist, Anders Orn plans and conducts observational research at Research Collective. While he is involved in many aspects of research, Anders especially enjoys usability testing in the healthcare and automotive industries, as it offers a unique opportunity to examine human behavior. You can find Anders on LinkedIn.