Thursday, May 19, 2016

Data Me This



Nate Silver over at fivethirtyeight.com has written a sort of mea culpa about Donald Trump: “How I Acted Like A Pundit And Screwed Up On Donald Trump.”

I’ve always been lukewarm on Nate Silver. On the one hand, I think his polls-driven focus and almost complete co-opting of the concept of “Data Journalism” is ultimately a good thing; however, I also think that he’s historically gotten a lot more credit than he deserves, as if he were the only person on the planet who predicted that Obama would win in 2008. He’s good, but I never thought he was the Golden Child of Polls that he was made out to be. Still, credit where credit is due.

Anyway, I won’t go into the specifics of the article itself; instead I’ll talk more abstractly about the rise of data journalism. I think the concept is a little strange (the fact that the media hasn’t been using data extensively in the past is a little alarming), but I get their point. And numbers are almost always a good thing; as we’ve seen with sabermetrics, numbers in even the most nebulous of activities tend to tell a better story than a sports writer’s gut. (Notably, Silver is heavy into sports and data as well.)

I would posit that things are different in politics, which is why data journalism is always going to be a good but never perfect fit for it. I think data journalism makes things better, but we can’t pretend it’s ever going to tell us the whole story.

There are a few reasons for this, and they more or less show up in Silver’s article. The biggest, in my mind, is that elections are rare, so the sample size is too small to derive much information from them, and since the political landscape changes, even slowly, you're never, ever going to have a big enough sample. There have only been 11 presidential elections since 1972, when the modern primary system was developed, and even if we double that for both parties and ignore non-contested primaries, that’s not a whole lot to go off of. Contrast that with, say, baseball, where hundreds, if not thousands, of games and stats are available for perusal in any given year.

Add to this the more obvious fact that things change over time, and information that was pertinent in 1972 is largely useless in 2016. If we only go off of the previous, say, 12 years (a rather practical time frame to map out the current political landscape) we’re still looking at, at best, 6 events for the sample size. This is a little fuzzier, since a lot of things are “standard” (it wasn’t unreasonable to treat it as a constant that a candidate not supported by the party will lose) and other things are not…but now we’re getting away from data journalism and into just plain old journalism.
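For a rough sense of how little 6 or 22 events can tell us, here is a minimal back-of-the-envelope sketch. It is mine, not anything from Silver's article, and it leans on some loud assumptions: each contest is treated as an independent coin-flip-style trial (real primaries are nothing of the sort), the usual normal approximation is used even though it is shaky at sample sizes this small, the helper name `margin_of_error` is made up for illustration, and the 2,000-game baseball figure is a hypothetical round number rather than a real count.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion estimated from n events.

    Uses the normal approximation, which is itself unreliable for very small n;
    treat the results as rough illustrations only.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Sample sizes: the two counts from the post, plus a hypothetical
# large baseball sample for contrast (illustrative, not a real figure).
samples = {
    "primaries, last 12 years (3 cycles x 2 parties)": 6,
    "primaries since 1972 (11 cycles x 2 parties)": 22,
    "hypothetical 2,000-game baseball sample": 2000,
}

for label, n in samples.items():
    print(f"{label}: n={n}, margin of error roughly +/-{margin_of_error(n):.0%}")
```

Under those assumptions, the uncertainty around anything estimated from a handful of primaries is several times wider than what a large baseball sample would give, which is the whole problem in miniature.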

It’s these things that make me believe there’s always going to be an element of question marks in data journalism as it pertains to politics. This current election is a prime example: Trump has thrown so many monkey wrenches into the narrative you'd almost think he was the Mule from Isaac Asimov's Foundation series.

I still think it’s a useful tool, but there are too many singular variables and too many one-off incidents to make it all that accurate. And, to Silver's credit, there is a lot of new stuff that can be mined from a data-driven perspective. We may get so much better at it as to reach perfection, but that seems highly unlikely.
