“Daddy, what is your job? What do you do there??” asks my 5-year-old in a curious tone as he tries to figure out why I went to work today instead of staying home to play with him.
“Weeeell…” (not a confident start to my response, I know…).
“Iiiiii…help people with data.” 😬
“Data? What’s data??”
[Think, Andrew! Think! … Iiit’s the… recorded facts (numbers and such) that we use to interpret reality… no… scratch that. It’s the observations we made about a process or event that we bothered to write in our notebook or store in our computer… man, this is hard…]
“Hey! Wanna go ride bikes??”
Yeah, I know. Lame.
But it got me thinking. How could I have explained data to my 5-year-old?
Better yet, could that explanation be valuable to someone older than 5?
Data. Where do I start?
Let’s take a walk…
Imagine with me that we’re on a bug hunt (c’mon, you’re 5, remember?). We’ve just unearthed a spectacular rock. What’s underneath? A mother lode of bugs, of course!
“Quick!” I say. “Let’s get out our handy dandy notebook!” (as I channel Blue’s Clues).
In the notebook, we write down a bunch of things:
- The names of each kind of bug we saw
- The colors of each one
- How fast (slow, medium, speedy) each one moved
- How many we counted
We snag a few bugs to take back to “the lab” (the dining room, of course) to inspect more deeply.
On our “lab bench” (the dining room table, of course), we pick a beetle, and try to hold him down so he can’t move. That way we can take a few more measurements for our notebook:
- How heavy it is
- Number of legs
- How long the beetle is
- How wide it is
We bust out the magnifying glass and observe a bit more:
- It has wings (physical trait)
- It has antennas (another physical trait)
“All right, beetle back in the bug box! Next specimen!”
This one’s a rolly polly. Same routine:
- How heavy
- Number of legs (asking a rolly polly to be still so we can count is unrealistic, so we took a picture and counted what we saw in the photo)
- How long
- How wide
Again with the magnifying glass:
- No wings on this one
- Has antennas though!
Here’s the point of this riveting adventure: Everything we wrote down in the notebook? Data.
We looked at something we saw in the world and jotted down what we noticed.
We put into words or measured in numbers what we noticed about…bugs, in this case.
Names (kinds, species), color, slow/medium/fast. Has wings (or not), Has antennas (or not).
How many bugs total, how heavy, how many legs on each, how long, how wide.
Each word and each number that we bothered to write down is data.
Data is everywhere…at least, the potential for there to be data is everywhere.
I suppose it’s not actually data until it’s “captured” and saved so that we can look at it again later.
Anytime something happens in the world and we succeed at saving a description of that event, what we saved is data.
I think we’re making some progress!
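If you’re a little older than 5 and curious how the notebook might look once it moves from paper into a computer, here’s a minimal sketch in Python. Every name and value in it (`notebook`, `weight_g`, `length_mm`, the specific numbers) is made up for illustration; it’s just one possible way to write the bug hunt down.

```python
# A hypothetical digital version of our bug-hunt notebook.
# Each dictionary is one bug; each key is one thing we noticed about it.
# All values are invented for illustration.
notebook = [
    {
        "name": "beetle",
        "color": "black",
        "speed": "slow",
        "weight_g": 1.0,
        "length_mm": 29.2,
        "has_wings": True,
        "has_antennas": True,
    },
    {
        "name": "rolly polly",
        "color": "gray",
        "speed": "medium",
        "weight_g": 0.25,
        "length_mm": 10.0,
        "has_wings": False,
        "has_antennas": True,
    },
]

# Because we saved it, we can look at it again later.
for bug in notebook:
    print(bug["name"], "is", bug["color"], "and", bug["length_mm"], "mm long")

# "How many we counted" is simply the number of entries.
bug_count = len(notebook)
```

Nothing fancy is going on here: each word and each number we bothered to save is sitting in the notebook, ready to be looked at again.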
Two Kinds of Data
I wonder if you noticed the way I grouped a few of the observations we made about the bugs.
Sometimes we used words to describe what we saw: “Black”, “Red”, “Has Antennas”, “Slow”.
Other times we used numbers: “1” (gram), “29.2” (millimeters), “100” (legs)
Let’s use some words to split up the two kinds:
- Quantitative (Numerical)
- Qualitative (Categorical)
“Numerical” and “number” go together pretty well. It’s that “Q” word, “quantitative”, that gives people trouble sometimes.
If you listen for the word “quantity” in the word “quantitative”, that could give you the hint for putting the term together with numbers.
In general, quantitative/numerical data describes “how much-ness” or “how many-ness” of something we notice about the world.
If you can measure it, count it, weigh it… if you can add up the things you wrote down, or do other “math-y” things to the data, then you’ve got quantitative/numerical data.
So what about qualitative/categorical data? One way to look at it is that it’s anything that’s not quantitative/numerical, since there are only two overall kinds of data. That’s not so helpful for understanding the term, though.
Let’s listen to the words again…
Do you hear “quality” in “qualitative” or “category” in “categorical”?
When we describe the qualities of something, what we’re noticing are its features. The bugs can be red or black or gray. They can be fast or slow. They can have wings (or not)… have antennas (or not).
It’s as if you can split them up into groups, or categories, based on how similar they are to one another…. winged bugs in this box… black winged ones in this compartment, and red winged ones in this other. Slow black winged bugs in this sub-compartment; fast black winged bugs in this other. It’s easy to group qualitative/categorical data like this.
That’s one way to think about what makes qualitative/categorical data qualitative (or categorical).
Another way involves math: We all know what to do with 1 + 1, right? We add it up, and get 2 as the answer.
Here’s one for you though: What about “black” + “red”? Now, I mean… if you said “dark red”, I can see what you did there, but we typically don’t think about doing math with words, do we?
“Rolly Polly” + “Beetle” doesn’t make much sense, does it…
“Has antennas” divided by “slow” equals …well… nonsense, right?
Rule of Thumb: Quantitative? or Qualitative? … Numerical? or Categorical?
So here’s the rule of thumb: If you can do math with the data, it’s quantitative/numerical, rather than qualitative/categorical.
Word-y data is the kind of data that ends up being qualitative/categorical in nature.
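For the grown-ups following along, here’s a rough sketch of that rule of thumb in Python. The lists `weights_g` and `colors` are hypothetical bug-hunt values I made up for the example; the point is only that math works on one kind of data and grouping works on the other.

```python
from collections import Counter

# Hypothetical measurements from the bug hunt (values invented for illustration).
weights_g = [1.0, 0.5, 0.25]            # quantitative/numerical
colors = ["black", "gray", "black"]     # qualitative/categorical

# Math-y things make sense for the numbers.
total_weight = sum(weights_g)
average_weight = total_weight / len(weights_g)

# For the words, counting how many fall into each group is what makes sense.
color_counts = Counter(colors)

# Dividing one word by another is nonsense, and Python agrees.
try:
    "has antennas" / "slow"
    math_worked = True
except TypeError:
    math_worked = False

print(total_weight, average_weight, color_counts, math_worked)
```

One wrinkle worth knowing: Python will happily accept `"black" + "red"`, but all it does is glue the two words together into `"blackred"`. That’s joining text, not adding quantities, so the rule of thumb still holds.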
Numbers Can Be Qualitative/Categorical
Notice what I said in that “rule of thumb” statement: If you can do math with the data, it’s quantitative/numerical.
I did not say, “if the data is a number, it’s quantitative/numerical”.
Because there are examples of data that happen to be numbers, but you’d never do math with them. Here are some examples:
- Special labels (like zip codes, which refer to a place on a map and happen to be numbers, but it doesn’t make sense to say, “Hey, can you add zip code 33617 and zip code 73505?”)
- Rankings (numbers used to label something like who came in 1st, 2nd, or 3rd in a race. It doesn’t make sense to say “What’s 1st place divided by 3rd place?”, for example)
- Sequences (numbers used to tell which order things are in, like assigning numbers to steps in a process, or assigning numbers to spots where someone should stand in line.)
- Identifiers (numbers that are used once and only once to label something or someone…maybe something like a driver’s license number.)
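Here’s a small sketch of the zip code example in Python, assuming we choose to store zip codes as text rather than as numbers. The list `zip_codes` and the idea of counting visits per zip code are hypothetical, made up for the illustration.

```python
from collections import Counter

# Zip codes are labels, so this sketch stores them as text (an assumption,
# not the only possible choice). The list itself is invented for illustration.
zip_codes = ["33617", "73505", "33617"]

# Grouping and counting by label makes sense.
visits_per_zip = Counter(zip_codes)

# We *can* force the labels into numbers and add them up...
nonsense = int("33617") + int("73505")

# ...but the result is a six-digit number that isn't a zip code at all.
print(visits_per_zip)
print(nonsense, "has", len(str(nonsense)), "digits")
```

The computer will do the arithmetic if we insist, but the answer doesn’t describe anything in the world. That’s the sign that the data is qualitative/categorical even though it looks like a number.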
To recap:
- We’ve looked at data in simple terms, and defined it as the words or numbers that we notice or measure about the world around us and bother to save, either on paper or digitally in a computer (preferably the latter, because then we can do cool things with it later on much more easily).
- There are two types of data: Quantitative/Numerical and Qualitative/Categorical
- Deciding what kind a particular piece of data is doesn’t have to be hard. The general rule of thumb is: If you can do math with the data, it’s quantitative/numerical. Otherwise it’s qualitative/categorical.