# What is Data? (Like I'm 5)

“Daddy, what is your job? What do you *do* there??”, asks my 5-year-old in a curious tone as he tries to figure out why I went to *work* today instead of staying home to play with *him*.

“Weeeell…” (not a confident start to my response, I know…).

“Iiiiii…help people with *data*.” 😬

“*Data*? What’s *data*??”

[Think, Andrew! Think! … Iiit’s the… recorded facts (numbers and such) that we use to interpret reality… no… scratch that. It’s the observations we made about a process or event that we bothered to write in our notebook or store in our computer… man, this is hard…]

Ugh. 😞

“Hey! Wanna go ride bikes??”

Yeah, I know. Lame.

But it got me thinking. How *could* I have explained data to my 5-year-old?

Better yet, could that explanation be valuable to someone *older* than a 5-year-old?

## Data

Data. Where do I start?

Let’s take a walk…

Imagine with me that we’re on a bug hunt (c’mon, you’re 5, remember?). We’ve just unearthed a spectacular rock. What’s underneath? *A mother lode of bugs*, of course!

“Quick!”, I say, “Let’s get out our handy dandy *notebook*!” (as I channel Blue’s Clues).

In the notebook, we write down a bunch of things:
* The **names** of each kind of bug we saw
* The **colors** of each one
* **How fast** (slow, medium, speedy) each one moved
* **How many** we counted

We snag a few bugs to take back to “the lab” (the dining room, of course) to inspect more deeply.

On our “lab bench” (the dining room table, of course), we pick a beetle, and try to hold him down so he can’t move. That way we can take a few more measurements for our notebook:
* **How heavy** it is
* **Number of** legs
* **How long** the beetle is
* **How wide** it is

We bust out the magnifying glass and observe a bit more:
* It has *wings* (physical trait)
* It has *antennas* (another physical trait)

“All right, beetle back in the bug box! Next specimen!”

This one’s a rolly polly. Same routine:
* **How heavy**
* **Number of** legs (asking a rolly polly to be still so we can count is unrealistic, so we took a picture and counted what we saw in the photo)
* **How long**
* **How wide**

Again with the magnifying glass:
* No wings on this one
* Has antennas though!

Here’s the point of this riveting adventure: Everything we wrote down in the notebook? **Data**.

We looked at something we saw in the world and jotted down what we noticed.

We put into *words* or measured in *numbers* what we noticed about…bugs, in this case.

Names (kinds, species), color, slow/medium/fast. Has wings (or not), Has antennas (or not).

How many bugs total, how heavy, how many legs on each, how long, how wide.

Each word and each number that we bothered to write down is **data**.
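If our notebook lived in a computer instead of on paper, one page of it might look something like the sketch below. This is just one way to write it down (Python here), and the specific bugs and measurements are made up for illustration, not real observations.

```python
# A made-up notebook page: each dictionary is one bug we looked at.
# Every value here is hypothetical, purely for illustration.
notebook = [
    {
        "name": "beetle", "color": "black", "speed": "slow",
        "weight_g": 1.0, "legs": 6, "length_mm": 29.2, "width_mm": 12.0,
        "has_wings": True, "has_antennas": True,
    },
    {
        "name": "rolly polly", "color": "gray", "speed": "medium",
        "weight_g": 0.1, "legs": 14, "length_mm": 10.0, "width_mm": 5.0,
        "has_wings": False, "has_antennas": True,
    },
]

# Flip back through the notebook and read what we wrote down.
for bug in notebook:
    print(bug["name"], "is", bug["color"], "and has", bug["legs"], "legs")
```

Every word and every number inside those curly braces is one little piece of data.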

Data is *everywhere*…at least, the *potential* for there to be data is everywhere.

I suppose it’s not actually data until it’s “captured” and saved so that we can look at it again later.

Anytime something happens in the world and we succeed at saving a *description* of that event, **what we saved is data**.

I think we’re making some progress!

## Two Kinds of Data

I wonder if you noticed the way I grouped a few of the observations we made about the bugs.

Sometimes we used *words* to describe what we saw: “Black”, “Red”, “Has Antennas”, “Slow”.

Other times we used *numbers*: “1” (gram), “29.2” (millimeters), “100” (legs)

Let’s use some words to split up the two kinds:
* Quantitative (Numerical)
* Qualitative (Categorical)

### Quantitative (Numerical)

Quantitative/numerical data describe things we see about the world in terms of **numbers**.

“Numerical” and “number” go together pretty well. It’s that “Q” word, “quantitative”, that gives people trouble sometimes.

If you listen for the word “**quant**ity” in the word “**quant**itative”, that could give you the hint for putting the term together with numbers.

In general, quantitative/numerical data describes “how much-ness” or “how many-ness” of something we notice about the world.

If you can measure it, count it, weigh it… if you can **add up** the things you wrote down, or **do other “math-y” things** to the data, then you’ve got quantitative/numerical data.
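Here’s a small sketch of what “doing math-y things” could look like in Python. The lengths are hypothetical numbers I picked for the example; the point is only that adding, averaging, and comparing all make sense for this kind of data.

```python
from statistics import mean

# Hypothetical lengths (in millimeters) of three bugs from the notebook.
lengths_mm = [29.2, 10.0, 15.5]

total_length = sum(lengths_mm)      # adding them up makes sense
average_length = mean(lengths_mm)   # so does finding the average
longest = max(lengths_mm)           # and so does asking "which is biggest?"

print("Total:", total_length)
print("Average:", average_length)
print("Longest:", longest)
```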

### Qualitative (Categorical)

Qualitative and categorical mean the same thing, but it might not be clear *what* they mean right away.

One way to look at it is that it’s *anything that’s not quantitative/numerical*, since there are only two overall kinds of data. That’s not so helpful for understanding the term though.

Let’s listen to the words again…

Do you hear “**qual**ity” in “**qual**itative” or “**categor**y” in “**categor**ical”?

When we describe the qualities of something, what we’re noticing are its features. The bugs can be red or black or gray. They can be fast or slow. They can have wings (or not)… have antennas (or not).

It’s as if you can split them up into groups, or *categories*, based on how similar they are to one another… winged bugs in this box… black winged ones in this compartment, and red winged ones in this other. Slow black winged bugs in this sub-compartment; fast black winged bugs in this other. It’s easy to group qualitative/categorical data like this.

That’s one way to think about what makes qualitative/categorical data qualitative (or categorical).
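The sorting-into-boxes idea can be sketched in a few lines of Python. The bugs listed here are hypothetical, and using a `defaultdict` is just one convenient way to do the grouping.

```python
from collections import defaultdict

# Hypothetical bugs, written as (name, color, has_wings).
bugs = [
    ("beetle", "black", True),
    ("ladybug", "red", True),
    ("rolly polly", "gray", False),
    ("ant", "black", False),
]

# Each "box" is labeled by a pair of qualities: (has_wings, color).
boxes = defaultdict(list)
for name, color, has_wings in bugs:
    boxes[(has_wings, color)].append(name)

for label, names in boxes.items():
    print(label, "->", names)
```

Notice that we never did any arithmetic with “black” or “red”. We only used them as labels for deciding which box a bug belongs in.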

Another way involves math:
We all know what to do with 1 + 1, right? We **add it up**, and get **2** as the answer.

Here’s one for you though: What about “black” + “red”? Now, I mean… if you said “dark red”, I can see what you did there, but we typically don’t think about doing math with words, do we?

“Rolly Polly” + “Beetle” doesn’t make much sense, does it…

“Has antennas” divided by “slow” equals …well… nonsense, right?
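For the curious, here’s roughly how Python reacts when you try. It will happily glue two words together with `+`, but that’s just sticking letters side by side, not adding. Division it refuses outright. (The variable names are mine, chosen only for this sketch.)

```python
# "+" on two words just glues them together; no arithmetic happens.
glued = "black" + "red"
print(glued)  # blackred

# Dividing one word by another isn't defined at all.
try:
    "has antennas" / "slow"
    division_worked = True
except TypeError:
    division_worked = False

print(division_worked)  # False
```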

### Rule of Thumb: Quantitative? or Qualitative? … Numerical? or Categorical?

So here’s the rule of thumb: If you can do math with the data, it’s **quant**itative/numerical, rather than **qual**itative/categorical.

*Word*-y data is the kind of data that ends up being **quali**tative/categorical in nature.

### Numbers Can Be Qualitative/Categorical

Notice what I said in that “rule of thumb” statement: If you can **do math** with the data, it’s quantitative/numerical.

I did *not* say, “if the data is a number, it’s quantitative/numerical”.

Why?

Because there are examples of data that happen to be numbers, but you’d never do math with them. Here are some examples:

- Special labels (like zip codes, which refer to a place on a map and happen to be numbers, but it doesn’t make sense to say, “Hey, can you add zip code 33617 and zip code 73505?”)
- Rankings (numbers used to label something like who came in 1st, 2nd, or 3rd in a race. It doesn’t make sense to say “What’s 1st place divided by 3rd place?”, for example)
- Sequences (numbers used to tell which order things are in, like assigning numbers to steps in a process, or assigning numbers to spots where someone should stand in line.)
- Identifiers (numbers that are used once and only once to label something or someone…maybe something like a driver’s license number.)
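Here’s a sketch of the zip code case in Python. One common habit (an assumption on my part, not a rule) is to store number-looking labels as text, so nobody is tempted to do math with them. The list of zip codes below is made up from the two mentioned above.

```python
from collections import Counter

# Zip codes stored as text, because they are labels, not amounts.
zip_a = "33617"
zip_b = "73505"

# The computer will add them if we force it to, but the answer
# doesn't point to any place on a map.
meaningless_sum = int(zip_a) + int(zip_b)
print(meaningless_sum)  # 107122

# What does make sense is counting how often each label shows up.
# This list is hypothetical.
zips_seen = ["33617", "73505", "33617"]
counts = Counter(zips_seen)
print(counts["33617"])  # 2
```

Counting how many times a label appears gives us a number we *can* do math with, even though the label itself is categorical.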

## In Review

- We’ve looked at data in simple terms, and defined it as the *words* or *numbers* that we notice or measure about the world around us and just so happen to bother to save, either on paper, or digitally in a computer (preferably the latter, because then we can do cool things later on with it *much* more easily).
- There are two types of data: Quantitative/Numerical and Qualitative/Categorical
- Deciding what kind a particular piece of data is doesn’t have to be hard. The general rule of thumb is: If you can do math with the data, it’s quantitative/numerical. Otherwise it’s qualitative/categorical.