Does the Data Really Know?


When historians look back on our era, I suspect they will be fascinated by the amount of time and effort we spent searching for magically decisive data: data that removes the risk and guesswork from life, data that provides authoritative answers about where to go and what to do next, data that does the thinking for us or at least absolves us of responsibility for it (call it “CYAD”).

We rarely notice the grammatical magic, the linguistic sleight-of-hand (some would just call it an error) that helps us imagine such godlike data might actually exist.

The grammatical magic I have in mind involves the surreptitious swapping of a singular form for a plural one: “Data,” after all, is technically the plural of “datum,” even though we’ve mostly come to accept its usage in the singular (as I used it in all of the sentences above). Seem like a minor grammatical matter? Maybe. But I suspect this little linguistic point covers over a much larger hole in our collective thinking.

After all, when we talk about “unleashing the power of your data” or “moving to data-driven decision making” or “preparing for the era of Big Data,” we don’t seem to be conceiving of “data” simply as the plural of datum. We seem to have something more in mind than a collection of discrete bits of information, each of which is potentially unreliable.

We seem to have in mind a miraculous sort of data that speaks in a clear, authoritative, singular voice even as it perfectly represents a plurality. In and through “data,” we seem to want to insist—grammatically if not quite consciously—that the manifold (must? will? can?) present itself as one.

Not sure what I mean? Try substituting “datums” or “data points” for “data” in any of the phrases above and see how different the plural-as-plural feels. (“Preparing for the era of big collections of data points, many of which may be confusing, contradictory, or wrong” just doesn’t sound nearly as sexy.)

By referring to the plural and multiform data as if they were all one, we hide from ourselves the inherent messiness, unpredictability, and potential for chaos that lurks within a world made up of unique particulars. We convince ourselves—grammatically, if not quite consciously—that underneath our present confusion, the world itself is surely categorical, consistent, predictable, and law-abiding.

We reassure ourselves that we can eventually, ultimately, finally track it all accurately, digitize it without corrupting it, and reduce it to elegant formulas, algorithms that at least feel definitive, even if most of us don’t really understand them. We tell ourselves that we will soon, finally, know. We just have to ask the all-powerful, godlike data.

We ignore the fact that, like most things we encounter in reality, actual data are typically not singular but plural, not clear but confusing, not inherently consistent but frequently contradictory. Sometimes they’re even outright wrong: “garbage in” that will generate “garbage out,” no matter which way you slice it.

We place our faith in the notion that Big Data will protect and provide. The truth is simpler and less dramatic: looking more carefully and creatively at more of the facts can certainly help us find better answers. But the magic isn’t in the data per se. It’s in the care we take to gather them, analyze them, and represent them well.
