Data Types: Introduction
Back in 1946, a psychologist by the name of Stanley Smith Stevens determined that there were four distinct data types: ratio, interval, ordinal, and categorical/nominal. These data types are a way to categorize the different variables you might run into, and they also give you direction as to how you should analyze those variables. Let’s check them out!
We use categorical (nominal) data to label, or categorize, things. For instance, counting people by their political party affiliation (e.g., 42 Democrats, 46 Republicans, 12 Green Party), or by gender (e.g., 3 boys, 3 girls). With nominal data, neither the numerical distance between data points nor their order tells us anything. For example, being identified as the #41 Democrat doesn’t mean anything more or less than being identified as the #42 Democrat; both labels carry the same value, simply because both people are in the same category.
Because of these traits, we can compute only two types of analyses for categorical/nominal data: the mode and frequencies. The mode tells us the category with the highest rate of occurrence in a sample. In the political party affiliation example above, “Republican” is the mode, since it is the most common category in the sample we studied. The frequency tells us the count for each category: for instance, the fact that there are 42 Democrats.
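The two analyses above can be sketched with Python’s standard library. The sample below is hypothetical, chosen to match the counts mentioned in the text:

```python
from collections import Counter

# Hypothetical sample matching the party-affiliation counts above
affiliations = (
    ["Democrat"] * 42 + ["Republican"] * 46 + ["Green"] * 12
)

counts = Counter(affiliations)       # frequency of each category
mode = counts.most_common(1)[0]      # the single most frequent category

print(counts["Democrat"])  # 42
print(mode)                # ('Republican', 46)
```

`Counter` does the frequency count, and `most_common(1)` picks out the mode; no arithmetic on the labels themselves is involved, which is exactly the constraint nominal data imposes.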
We use ordinal data when we are ranking, or ordering, things, such as where competitors place in a race. The numerical distance between the data points (e.g., first and second place) doesn’t mean anything. That is, the difference between first and second place on the podium doesn’t tell us anything about how much faster one person was than the other. It only tells us that one came before (or after) the other.
As with nominal/categorical data, there are limitations to the types of analyses that can be conducted with ordinal data. However, in addition to calculating the frequency of any given value, ordinal data allows us to generate certain measures of central tendency, namely percentiles and the median.
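A minimal sketch of those ordinal analyses, using a hypothetical set of responses on a 1–5 satisfaction scale (the data and scale are assumptions, not from the text):

```python
import statistics

# Hypothetical ordinal responses on a 1-5 satisfaction scale
responses = [1, 2, 2, 3, 3, 3, 4, 4, 5]

median = statistics.median(responses)             # middle value
quartiles = statistics.quantiles(responses, n=4)  # 25th/50th/75th percentiles

print(median)     # 3
print(quartiles)  # [2.0, 3.0, 4.0]
```

Note that the median and percentiles only rely on the order of the values, not on the distances between them, which is why they are legitimate for ordinal data.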
We use interval data when we care about the numerical difference between data points, but the scale has no true zero point. For example, let’s say your two children get sick and you want to compare their temperatures. Cindy has a temperature of 103 degrees Fahrenheit, while Johnny has a temperature of (approximately) 98 degrees Fahrenheit. Interval data allows us to say that Cindy is 5 degrees Fahrenheit hotter than Johnny. However, it doesn’t permit us to say that Cindy is 5.1% warmer than Johnny (even though 103/98 ≈ 1.051).
This is because the Fahrenheit scale doesn’t have a true zero point, meaning a point on the scale below which no value is possible. Remember, Fahrenheit can go into negative values too. If, however, we were to measure Cindy and Johnny’s temperatures in kelvins instead, on a scale that does have a true zero point, we could reference the two temperatures directly against one another (but this would no longer be interval data; it would be ratio data!).
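A quick way to see why the zero point matters is to convert the two temperatures to kelvins and compare. The conversion formula is standard; the conclusion it illustrates is the one from the text:

```python
def f_to_k(fahrenheit):
    """Convert degrees Fahrenheit to kelvins."""
    return (fahrenheit - 32) * 5 / 9 + 273.15

cindy = f_to_k(103)   # ~312.59 K
johnny = f_to_k(98)   # ~309.82 K

# The interval (difference) is meaningful on both scales:
print(round(cindy - johnny, 2))  # 2.78 K, the same gap as 5 degrees F

# But the ratio is only meaningful in kelvins:
print(round(cindy / johnny, 3))  # ~1.009, not the 1.051 Fahrenheit suggests
```

Cindy is actually only about 0.9% warmer than Johnny in absolute (kelvin) terms, which is why taking a ratio of Fahrenheit readings is misleading.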
The big bonus associated with interval data is that we can now calculate all the primary measures of central tendency, which is crucial in conducting usability research. Specifically, interval data allows us to calculate average scores, something of particular relevance in within- and between-subjects designs.
We use ratio data when there is a true zero point. Things like time on task, or the number of errors on a task, are examples of ratio data. Since this type of data starts from an actual zero point, we can compute ratios based on it. For example, we can say Participant 1 made 1.5 times as many errors as Participant 2 (or 50% more).
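The error-count comparison can be shown directly. The counts of 6 and 4 are hypothetical values chosen to produce the 1.5x ratio from the text:

```python
# Hypothetical error counts on a task (ratio data: zero errors is a true zero)
errors_p1 = 6
errors_p2 = 4

ratio = errors_p1 / errors_p2                          # 1.5x as many errors
pct_more = (errors_p1 - errors_p2) / errors_p2 * 100   # 50% more errors

print(ratio)     # 1.5
print(pct_more)  # 50.0
```

Both statements are valid here precisely because zero errors is a genuine zero, unlike zero degrees Fahrenheit.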
What does all of this mean? It means that we have to treat these data types differently during analysis. Here is a handy table to remind you what you can do to analyze the different data types: