Seven Concepts a Dept. of Defense Program Manager Needs to Successfully Develop Social Science Models: Part 1

There you go again. Ronald Reagan [1]

It was with a mix of deja vu, amusement and resignation that I saw the latest Dept. of Defense (DoD) pronouncements—try here and here—about their intentions to take a very important innovation in machine learning, recurrent neural networks [2], and use this as the centerpiece of a major new machine-human interaction initiative.

It’s that word “human” that’s setting me off, as when it comes to technical applications, DoD can’t ever seem to do “human.” Wow, creative new initiative but…been there, done that, and I’ve seen so many similar things come and go over the years—decades in fact—always with the same result [3]: a big heap of money spent that might as well have been stuffed into [Chinese] fireworks and sent skyward on the Fourth of July.

It’s probably getting worse for me now that I’m just a couple hours south of the Beltway and can attend meetings on short notice. There’s a pretty consistent script: You start with something ambitious though awkwardly defined—the sort of thing that in academia I would have sent back to a grad student for a re-write—but generally plausible. You’ve got a bunch of people in a room [4], and some of these are absolutely top in their field and are sincerely trying to be helpful and want to get to a feasible project definition, figure out some appropriate technology, and move everything forward. For an hour or two, things are going pretty well.

But invariably, after a promising start, we head down a very predictable rabbit hole and end up—yet again—at the Mad Hatter’s Tea Party. And typically stay there. Curiously, in my experience, this is endemic just to DoD, making it all the more puzzling.

Or not, since the proximate cause of the descent into madness can always be traced to the inevitable presence of a cluster of pallid, over-weight men (they’re always white men) of late-middle age—the Pillsbury Doughboy look—representing all of the usual suspects of the permanent civilian defense contractor class. Whenever things start looking promising, these dudes start asking the stupidest of questions, exhibiting unbounded cluelessness concerning the topic at hand, and going into long discourses on the impossibility of doing the sorts of research that other parts of the US government have been doing with great success for decades [5], often on precisely the same topic under discussion, and that the likes of Amazon, Google and Facebook have as a gadzillion-dollar business model.

So, methinks, what gives? Are these guys really that stupid and sent by their bosses to get them out of the building? If we were ISIS, of course, these folks would be placed at the top of the roster list for suicide missions, but we’re not ISIS. So why are they here?

With the intensified exposure to this phenomenon in the past couple of years, I’ve finally figured it out: the Doughboys are the equivalent of the Communist-era minders in Soviet puppet states, and, consistent with the tactics of the Old Left, their entire purpose is to make sure that these meetings remain completely pointless [6] and avoid the disastrous possibility that DoD might, say, spend $10-million on some social science [7] research that would prevent a $100-million mistake or even worse, spend $100-million on research that would prevent one or more $1-trillion mistakes, or, worst of all, develop a sophisticated social science research culture within DoD comparable to that found in numerous other parts of the government, to say nothing of academia and the private sector. No, discretionary DoD money needs to be kept where it has been all along, funding mind-boggling levels of contractor fraud, weapons systems that don’t work, and 12-figure cost over-runs.

Well, gotta give the Doughboys credit, they’ve done one heck of a good job for their corporate masters! But that doesn’t mean we have to like it.

So in the spirit of the Yule season and as a public service, MouseCorp—which, full disclosure, is not entirely uninterested in having DoD learn how to do research appropriate to the 21st century—will provide guidelines to a small number of critical concepts which individuals trying to manage these new programs might, just might, be able to use to shut up these parasitic bastards [8]. There are more than seven—though the list is still fairly small—so for convenience this will be done in two segments, the first focusing on fairly specific technical concepts, the second, in a week or so, on some more general literatures.

For the sake of exposition, let’s assume somehow one or more of these new proposed projects makes it through the preliminary efforts to kill it, and you are managing it, and you quickly figure out that you could get a big boost if you’d incorporate some state-of-the-art social science modeling methods into the project. You’ve got the Doughboys with their corporate overlords and Gucci-clad lobbyists hamstringing you at every step, trying to make sure the project fails, but you’ve got some of that anachronistic Greco-Roman Stoic civic virtue thing going, and you’d really like the project to succeed. And since this is DoD, you’ve got a budget at least an order of magnitude greater than what the National Science Foundation or “the part of the national security community that shall not be named” would have available—granted, fully half of the funding will go to program reviews and PowerPoint slides [9]—so resources are not the issue. Deciding on workable approaches is the issue. So here are some key concepts, with more to follow:

1. The Forecaster’s Quartet

Become familiar with the following:

  • Daniel Kahneman. Thinking, Fast and Slow: 30 years of research which won a Nobel Prize and is a great read, long residing on the business best-seller list [10]
  • Nassim Nicholas Taleb. The Black Swan.
  • Philip Tetlock. Expert Political Judgment [11]
  • Nate Silver. The Signal and the Noise: popular-level antidote to the contention that human behavior is not predictable

And finally at the article length and a more challenging technical level, but in terms of political prediction using formal models, easily the most important work in the past quarter century:

Michael D. Ward, Brian D. Greenhill, and Kristin M. Bakke. The perils of policy by p-value: Predicting civil conflicts. Journal of Peace Research 47(4) 363–375 [12]

Like most such paradigm-smashing contributions, they had a very difficult time getting it published.

2. Model specification and the centrality of theory

This comes first in the list of technical terms, since if you don’t get the model specified right, everything else is doomed, and the best weapon in your arsenal here is a thorough review of existing theory. The Doughboys hate that—their motto is “A week in the lab saves an hour in the library” since that attitude keeps the meter ticking and the money flowing. DoD projects [13] tend to approach every problem like they were the very first people to ever think about it, whereas in more cases than you’d expect, someone was thinking about it 2,500 years ago and helpfully wrote those ideas down. If not 2,500 years ago, then almost certainly in the past 50 years. A lot of it is garbage but you will discover that the slop dished out by people utterly unfamiliar with existing theory generally isn’t very helpful either.

A good theory tells you which haystack you need to look in to find the needle. Once you’ve got that, you don’t want to just pile on more hay. Conversely, specify the model incorrectly, and you’re just doing the wrong thing better. [14]

3. Latent dimensions and collinearity

A “latent dimension” is the technical term for what would commonly be called a “generalization,” and in statistical terms involves a set of indicators which co-vary. “Economic development” is the standard—and appropriate—example: we know from experience that advanced industrialized economies differ in a large number of ways from developing economies, and while GDP/capita is the most common way of measuring these, plenty of others would work just as well. Famously, in the conflict forecasting realm, infant mortality rate is one of them. “Democracy,” “quality of governance,” “globalized economy” and “political instability” are other common relevant latent dimensions in the conflict forecasting literature.

The key point about latent dimensions is once you’ve got a couple of measures for a dimension—or even a single really reliable measure—adding more variables gives you very little information. In fact, in the linear models commonly used in statistical studies—discussed in the next entry—this becomes counter-productive because of a problem called collinearity, which plays havoc with the variance of your coefficient estimates.

Latent dimensions are also the reason Achen’s “Rule of Three”—discussed next time—is so successful. The Doughboys hate this: their interest is in piling on redundant indicators to drive up the costs and delay the project, and hairball models are a solution, not a problem. Resist.
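
For anyone who wants to see the collinearity problem rather than take it on faith, here is a minimal sketch in Python, standard library only. It simulates two indicators of a single latent dimension and computes the variance inflation factor, VIF = 1 / (1 - r^2), which is the standard multiplier on the variance of a coefficient estimate in a two-predictor linear model. The variable names, noise levels, seed and sample size are all illustrative assumptions, not anything drawn from a real data set.

```python
import random
import statistics

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def vif_two_predictors(xs, ys):
    """Variance inflation factor when xs and ys are the only two predictors."""
    r = pearson_r(xs, ys)
    return 1.0 / (1.0 - r * r)

random.seed(0)  # fixed seed so the sketch is repeatable
n = 2000
# Hypothetical latent "development" dimension (never observed directly).
latent = [random.gauss(0, 1) for _ in range(n)]
# Two made-up indicators of that same dimension, each with a little noise.
gdp_like = [z + random.gauss(0, 0.2) for z in latent]
imr_like = [-z + random.gauss(0, 0.2) for z in latent]
# A made-up indicator of some entirely different dimension.
unrelated = [random.gauss(0, 1) for _ in range(n)]

vif_redundant = vif_two_predictors(gdp_like, imr_like)
vif_distinct = vif_two_predictors(gdp_like, unrelated)
print(f"VIF, two indicators of one dimension:   {vif_redundant:.1f}")
print(f"VIF, indicators of different dimensions: {vif_distinct:.2f}")
```

A VIF near 1 means the second variable costs you nothing; a VIF well above 1 means the second indicator of the same dimension is mostly inflating the uncertainty of both coefficients.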

4. “Error” is part of the process

For starters, in social science models, it’s not really “error” in the same sense that we think of “error” in tightly controlled physical processes. You are better off thinking about “error” as “things not included in the model” because we’re dealing with open complex processes, not the engine cylinder clearances on a BMW 760. They’ve not been included because the information is not reliably available, or is not cost-effective [15], or the indicators will actually introduce more error than they reduce, or the process is intrinsically random. [16]

So, it’s not “error”, it’s “everything else”. But keeping with convention, we’ll keep calling it “error.”

5. Accuracy, Sensitivity, Precision and ROC curves

“You can’t manage what you can’t measure,” right? In the bad-old-days, social scientists tended to measure errors using a single linear correlation-based measure called R-squared but contemporary models for predicting whether or not something will happen are generally evaluated on the aforementioned series of measures that the machine-learning folks have been using for quite some time. Look ’em up: Wikipedia has vast resources here. The “ROC curve”—and the most common statistic based on it, the AUC [area-under-curve: Google it]—is particularly challenging to grok because it has an invisible dimension—the change in the threshold at which the model predicts the event will happen—but this is now nearly universally used, so put in the effort.
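
Since the ROC curve's “invisible dimension” is the hard part, a small sketch may help. The Python below (standard library only) computes accuracy, sensitivity and precision at one threshold, then sweeps the threshold across every score to trace out the ROC points and takes the area under them with the trapezoid rule. The labels and scores are toy values invented for illustration, and the function names are my own, not any library's API.

```python
def confusion(labels, scores, threshold):
    """Count TP, FP, TN, FN when predicting 1 for score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def roc_points(labels, scores):
    """(false positive rate, true positive rate) at every distinct threshold."""
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted
    for t in sorted(set(scores), reverse=True):
        tp, fp, tn, fn = confusion(labels, scores, t)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Toy data, invented for illustration: 1 = event happened, 0 = it did not.
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.35, 0.3, 0.2, 0.1]

tp, fp, tn, fn = confusion(labels, scores, 0.5)
accuracy = (tp + tn) / len(labels)
sensitivity = tp / (tp + fn)  # also called recall
precision = tp / (tp + fp)
roc_auc = auc(roc_points(labels, scores))
print(accuracy, sensitivity, precision, roc_auc)
```

Each point on the curve is the same model judged at a different threshold, which is why a single accuracy number can hide so much. In real work you would use a tested library routine rather than this hand-rolled version.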

Most prediction problems relevant to national security concern “rare events” and these present a number of measurement challenges, not all of them resolved. Again, get up to speed on this and know the “gotchas.” E.g., it is trivial to generate very high accuracy on any rare events problem without producing anything useful for policy purposes. [17]
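
The accuracy “gotcha” takes about ten lines to demonstrate. In this Python sketch the data are hypothetical (1,000 cases, 10 events), and the “model” is simply one that always predicts nothing will happen.

```python
# Hypothetical rare-events data: 1,000 cases, 10 of which see an event.
labels = [1] * 10 + [0] * 990

# The "model" that never predicts the event.
always_no = [0] * len(labels)

# Accuracy: share of all cases where prediction and outcome agree.
correct = sum(1 for y, p in zip(labels, always_no) if y == p)
accuracy = correct / len(labels)

# Sensitivity: share of actual events the model caught.
caught = sum(1 for y, p in zip(labels, always_no) if y == 1 and p == 1)
sensitivity = caught / sum(labels)

print(f"accuracy = {accuracy:.1%}, sensitivity = {sensitivity:.1%}")
```

That is 99 percent accuracy while catching none of the events, which is why accuracy alone tells you next to nothing on a rare-events problem.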

6. Estimation

All non-trivial models have coefficients [18] and these must be estimated from the data. All estimates have—that word again—errors, or more accurately, variation, and this can also be estimated, though the veracity of that estimate is dependent on the extent to which the characteristics of the data—nowadays the term is usually “data generating process”—correspond to the assumptions used to derive the estimation method, and as anyone with any experience in the field knows, the Data Fairy is frequently not very kind. All estimation methods can exhibit pathological behaviors when confronted with sufficiently weird data but fortunately for the widely used modeling methods—discussed in the next entry—these are extensively studied and understood.[19] New methods? You’ll be the test case, and these may go very badly. [20]

Historically, social science statistical work was done “in-sample”, where the data used to estimate the model and the data used to test it were the same. This led to “over-fitting” or “fitting the error,” and the results often did not generalize. Contemporary work—and virtually all machine learning work—uses any of a number of more robust “split sample” designs where the “training” and “test” data are separate.
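
Here is a rough sketch of why in-sample results mislead, in standard-library Python. The “model” is a deliberately silly one, a nearest-neighbor memorizer that simply looks up the closest training case, and the data are simulated with 30 percent of the labels flipped as noise. Everything here (the data generator, the 0.5 cutoff, the sample sizes, the seed) is an assumption made for the demonstration.

```python
import random

def make_data(n, rng, flip=0.3):
    """Points on [0, 1); the true rule is x > 0.5, with 30% of labels flipped."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > 0.5 else 0
        if rng.random() < flip:
            y = 1 - y  # the "error": things not in the model
        data.append((x, y))
    return data

def accuracy(predict, data):
    """Share of cases where the prediction matches the label."""
    return sum(1 for x, y in data if predict(x) == y) / len(data)

rng = random.Random(42)
train = make_data(300, rng)   # data used to "estimate" the model
test = make_data(1000, rng)   # separate data the model never saw

def memorizer(x):
    """Predict the label of the single closest training point (fits the noise)."""
    return min(train, key=lambda pt: abs(pt[0] - x))[1]

def simple_rule(x):
    """The one-parameter rule; its cutoff is assumed known here, not estimated."""
    return 1 if x > 0.5 else 0

in_sample = accuracy(memorizer, train)
out_of_sample = accuracy(memorizer, test)
simple_out = accuracy(simple_rule, test)
print(f"memorizer, in-sample:       {in_sample:.2f}")
print(f"memorizer, out-of-sample:   {out_of_sample:.2f}")
print(f"simple rule, out-of-sample: {simple_out:.2f}")
```

The memorizer looks perfect on the data it was built from and much worse on fresh data, while the boring rule holds up. Only the split-sample numbers tell you which one to trust.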

7. Significance testing versus Bayesian approaches

Historically, virtually all social science statistics were done using the “null hypothesis significance testing” approach [21], which is both highly counter-intuitive and very often misinterpreted even by people who should know better, but was the only practical method prior to the availability of large amounts of computing power and some innovations in estimation methods that only occurred a couple decades ago. Significance testing is gradually being replaced by Bayesian methods [22], which are more likely to provide the information you are actually looking for, and in principle should integrate well with qualitative approaches: the buzzword here is “informed priors.” The important thing: understand what each approach does and does not do.
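
To show what an “informed prior” looks like in the simplest possible case, here is a sketch of the textbook Beta-binomial update in Python. All the numbers are hypothetical: an analyst's qualitative judgment that roughly one case in five escalates is encoded as a Beta(2, 8) prior, and is then combined with invented data of 3 escalations in 20 new cases. This is one standard way to encode such a judgment, not the only one.

```python
def beta_update(prior_a, prior_b, successes, failures):
    """Conjugate update: Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Hypothetical prior from qualitative expertise: about one in five escalates.
informed_prior = (2, 8)
# The "know nothing" alternative: every rate equally plausible.
flat_prior = (1, 1)

# Invented data: 3 escalations observed in 20 new cases.
escalations, quiet = 3, 17

post_informed = beta_update(*informed_prior, escalations, quiet)
post_flat = beta_update(*flat_prior, escalations, quiet)

print("informed prior, posterior mean:", round(beta_mean(*post_informed), 3))
print("flat prior, posterior mean:    ", round(beta_mean(*post_flat), 3))
```

The output is a full distribution over the escalation rate, which is usually closer to what a decision-maker wants than a statement about rejecting a null hypothesis.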


Enough for now, and I’m a bit over the word limit already. There’s more to come, and I’ve already assigned too much homework. Still, just with this material alone, you can start the push-back and keep your project from going off the rails: just look those Doughboys straight in the eye and growl “So, feeling lucky, punk?” [23]

Beyond the Snark

Not a lot this time: Wikipedia is generally pretty good on the subject of statistics—well, it is amazingly comprehensive and technically accurate; sometimes the exposition leaves a bit to be desired. Quite a number of people who teach methodology, where a huge amount of labor goes into producing a good set of lecture notes, have posted these on the Web. As has MIT. I’ve generally used standard terminology here, though in a few areas things get a little confusing because the statistics and machine-learning communities, having developed more or less independently (this is actually quite remarkable but hey, academic silos are built to withstand a lot of pressure) not infrequently use different terms for the same concepts. But in general with all of these terms “Google it” will get you lots of information.

My original and somewhat technical exposition on what is wrong with most of the existing approaches: By the way, a few people have interpreted this article as my rejecting quantitative approaches. Far from it, starting with the fact that doing quantitative work is how I keep food on the table. I’m merely saying if you are going to use these increasingly effective methods, do it correctly.

GSA excess: And the practical consequences:

Defense contractor-sponsored equivalents to GSA: funny, you don’t see any stories about those events. An absence of curiosity which I’m sure is entirely unconnected with the presumably rather costly defense contractor advertisements that keep popping up on my—perhaps not your—Web versions of the New York Times and Washington Post. “Yes, an F-35…why, just in time for the holidays, a perfect gift for my nephews! I’ll take three, my good man, and can you have them wrapped and delivered? Jolly good of you to remind me with that expensive animated advertisement!”

How Rome became the hegemonic successor to the Hellenistic empires: haven’t read it but Mary Beard’s new popular history, SPQR, has gotten a lot of good reviews. There’s more to the story than gladiators. Really. And with a level of wealth inequality approaching that of imperial Rome, there’s stuff we can learn.


1. With Reagan now subjected to vicious character assassination by the Fox News crowd, I’m going to open the next few entries with Reagan quotes. Judge people by their enemies, eh? Besides, in the current environment, he’d be considered a bit of a lefty, what with all that arms control and raising taxes.

2. I’m not providing many links this time: With every technical phrase in this entry, if I can Google it, you can Google it.

3. And whenever I’m told this, people outside of DoD—never from the inside—suggest I’m cynical because all of the highly successful projects are secret. Well, if they are I’m one important guy, because a huge pile of money has been spent on unclassified foolishness over the years just to distract me from learning about the good stuff. Really, I don’t think I’m that important.

4. In the old days, these came with a nice spread of donuts and sandwiches—typically we’re doing these things for little or no compensation beyond expenses. But with the combination of the Tea Party trash-and-burn budgetary tactics and that wonderful tax-payer-funded Las Vegas bacchanalia of the GSA, those days are gone. Launder that same tax money through a defense contractor, of course, and the bacchanalia are still absolutely fine: been to a few of those as well, and other than the experience of seeing the reckless extravagance making me want to throw up, they are splendid exercises, and most of the attendees, staggering about under the burden of unlimited quantities of cheap booze, consider them a professional entitlement. Though I wouldn’t be surprised if those things have pushed more than a few folks—at least the caterers—into the Tea Party, albeit with no effect.

Get yourself to one of these defense industry affairs and it’s night after night of lobster, champagne and live entertainment in lavishly decorated resort hotel ballrooms, all provided by—smirk, smirk, wink, wink—“company sponsorship.” Going to a government-funded conference on finding a cure for Alzheimer’s, or teaching kids to read, or preventing rusting bridges from collapsing?: it’s gonna be Subway and a diet cola, maybe followed by a pitcher of Miller Lite shared with your buddies at an anonymous sports bar in a strip mall, and remember to bring your wallet, because all this comes out of your own pocket.

Some dumb hick from Abilene, Kansas pointed out the problems with this system back in 1961. Lotta good that did.

5. Seriously, do you think the Federal Reserve Board and the Centers for Disease Control spend their days listening to a gaggle of demented bozos braying about how human behavior is unpredictable?

6. There’s a little ritual in these meetings where the program manager earnestly intones a little mantra—I’m not sure whether the original was in Latin or Sanskrit—about their deep responsibility of not wasting public money. Given they know the Doughboys are in the room precisely to ensure that public money is wasted, it must be hard to do this with a straight face. Though I suppose that’s why program managers get paid the big bucks. Joke.

7. For the Doughboys, “social science” is an oxymoron, an observation they will share endlessly. In fact they probably have “‘social science’ is an oxymoron” tattooed somewhere on their bodies, probably somewhere I’d certainly prefer not to look.

8. Technical term of art: actually, one of my several working titles for this essay was “Seven Under-stated Reflections on When the Hell are You People Going to Learn How to Keep Those Parasitic Bastards from Making Off with My Tax Dollars?” But that doesn’t scan particularly well.

9. When the history of the decline of United States hegemony is written in a century or two, the role of the Doughboys will probably deserve at least a footnote. Though the role of the PowerPoint virus—no, not a virus carried by PowerPoint; PowerPoint is a virus—will get a chapter.

That history may be written in English—in New Delhi—rather than Chinese, just as the successor to the Hellenistic empires was not the obvious Mediterranean candidate, the wealthy and commercially savvy Phoenicians from their base in Carthage—but instead a theretofore marginal group of Italians living along the Tiber. The second mouse, as it were.

10. I will assume you can find these on Amazon or, if you prefer not to support a soul-destroying mega-corporation whose business model involves removing every last vestige of humanity from the workplace, a local bookshop if you have one. Probably run either by some balding old guy who is likely to engage you in an extended conversation when you really just wanted to buy a book, or someone with cats. Though I’m also actually becoming rather fond of the Barnes and Noble chain, particularly when after seeing the grey hair, they let this old guy to the front of the line for a seat at an overflowing author’s event, and at Union Square in New York City no less! I digress…damn old guys who don’t know when to stop…well, “communicating”, if that’s even the relevant concept…

11. We’ll deal with his more recent work on superforecasters in the next entry.

12. General link is Since I still have an adjunct academic appointment with library access, the version I’m seeing isn’t paywalled; your results may differ. If you dig a bit, you can usually locate non-paywalled versions of widely-cited academic papers, and this would qualify. Or you can pay: none of that payment goes to the authors, of course, as academic publishing doesn’t work that way, instead it is a monk-like humble offering to the cause of restricting the flow of knowledge generated through public funding and to further increasing the level of inequality through the support of a tiny oligopoly of rapaciously profitable publishers. Yet again, I digress.

13. And GOP presidential candidates…

14. For some fairly technical reasons, “specification error” in some of the most commonly used models is even worse, since if a variable you’ve incorrectly put in the model is correlated with variables which are actually causal, the variable will appear to have a stronger effect than it actually has. Though if you are only interested in prediction, this isn’t that big a deal. Still, a model that is consistent with a correct theory will almost certainly have better properties than a model that isn’t. Which is also why the “data will replace theory” arguments are overly optimistic: a good theory is vastly more useful than any undifferentiated mess of data.
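
A quick simulation makes the point, at least in its simplest form, where the causal variable is left out and a correlated bystander is put in its place. This Python sketch uses only the standard library; the coefficients (2.0 for the true effect, 0.8 for the overlap between the two variables), the seed and the sample size are arbitrary choices for illustration.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Slope from a one-variable least-squares regression of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(1)
n = 5000
# The variable that actually drives the outcome.
cause = [rng.gauss(0, 1) for _ in range(n)]
# A bystander: correlated with the cause, but with no effect of its own.
bystander = [0.8 * c + 0.6 * rng.gauss(0, 1) for c in cause]
# The outcome depends on the cause only.
outcome = [2.0 * c + rng.gauss(0, 1) for c in cause]

slope_cause = ols_slope(cause, outcome)
slope_bystander = ols_slope(bystander, outcome)
print(f"slope on the causal variable:     {slope_cause:.2f}")
print(f"slope on the non-causal variable: {slope_bystander:.2f}")
```

The bystander has a true effect of zero, yet the mis-specified model hands it a large coefficient. It still predicts reasonably well, which is the sense in which this matters less if prediction is all you want.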

15. Hey, give us poor proles who still actually have to pay taxes a break here, will you?: even in DoD research there should be a concept of “too expensive.”

16. Or appears to be, as in chaotic processes such as weather. I’ll say a bit about chaotic processes in the next entry; for now suffice it to say these are not the semi-mystical phenomenon some would have you believe, just an unexpected but completely deterministic aspect of a very simple dynamic equation you can easily experiment with in one column of a spreadsheet.  Intrinsic randomness could be an essay in its own right…maybe later…
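
If you'd rather not wait for the next entry, here is a sketch of the kind of equation I have in mind, written in Python rather than a spreadsheet column. I am assuming the logistic map, x[t+1] = r * x[t] * (1 - x[t]), with r = 4, which is the usual textbook example; the starting value and the size of the nudge are arbitrary.

```python
def logistic_map(x0, r=4.0, steps=60):
    """Iterate x[t+1] = r * x[t] * (1 - x[t]) and return the whole path."""
    path = [x0]
    for _ in range(steps):
        x = path[-1]
        path.append(r * x * (1.0 - x))
    return path

a = logistic_map(0.2)
b = logistic_map(0.2 + 1e-9)  # same start, nudged in the ninth decimal place

# Distance between the two paths at each step.
gaps = [abs(x - y) for x, y in zip(a, b)]
print(f"gap after  5 steps: {gaps[5]:.2e}")
print(f"gap after 50 steps: {gaps[50]:.2e}")
print(f"largest gap:        {max(gaps):.2f}")
```

Nothing in there is random, yet two starting points that agree to eight decimal places end up on unrelated paths within a few dozen steps. That is all “chaos” means.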

17. But caution, particularly if you’ve only skimmed Taleb: rare events are not the same thing as black swans.

  • Black swan: an event that has a low probability even conditional on other variables
  • Rare event: an event that occurs infrequently, but conditional on an appropriate set of variables, does not have a low probability

Using a medical analogy, certain rare forms of cancer appear to be highly correlated with specific rare genetic mutations. Conditioned on those mutations, they are not black swans.
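
The arithmetic behind that analogy, as a Python sketch. Every count below is invented purely to illustrate the distinction and has nothing to do with any real disease or mutation.

```python
# Invented counts for illustration only: a population of 100,000 people,
# 100 of whom carry a particular rare mutation.
population = 100_000
carriers = 100
cases_among_carriers = 40      # hypothetical
cases_among_non_carriers = 10  # hypothetical

# Unconditional probability: rare.
p_event = (cases_among_carriers + cases_among_non_carriers) / population
# Conditional on the right variable: not rare at all.
p_event_given_carrier = cases_among_carriers / carriers
p_event_given_non_carrier = cases_among_non_carriers / (population - carriers)

print(f"P(event)               = {p_event:.5f}")
print(f"P(event | mutation)    = {p_event_given_carrier:.2f}")
print(f"P(event | no mutation) = {p_event_given_non_carrier:.5f}")
```

Unconditionally the event is a five-in-ten-thousand affair; conditioned on the right variable it is a forty percent proposition. That is a rare event, not a black swan.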

Taleb definitely gets these distinctions, but many of the popularizations of Taleb (who in turn is quite consciously—he profusely acknowledges their work—a popularizer of Kahneman and his collaborators, and Tetlock) miss it.

Also worth noting here—since I ran out of my seven allocated categories—are events which are too predictable: these are called “auto-regressive” or “auto-correlated,” which simply means that the value of a variable at time t is highly correlated with the value at t-1. Most human activities have this characteristic—humans, and particularly human institutions, are fairly boring and predictable, except when they aren’t. The sorts of sequences one tends to look at in political conflict forecasting are highly autocorrelated except for a small number of highly consequential exceptions which are…rare events. From the perspective of a methodologist, it makes the whole problem rather interesting.

And in a final, really techy, aside, there’s a tendency to confuse autocorrelated variables and autocorrelated errors. The presence of the latter considerably complicates estimation, all the more so when you get both at once. Errors are autocorrelated for the same reason variables are autocorrelated: human behaviors tend not to change much over time, and “errors” are just the factors not included in the model, and quite a few of those involve humans.
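
For the autocorrelated-variable half of this, a short Python sketch (standard library only) shows what “highly correlated with the value at t-1” looks like in numbers. It simulates a first-order autoregressive series, x[t] = phi * x[t-1] + noise, and measures the lag-1 autocorrelation. The value phi = 0.9, the series length and the seed are assumptions chosen for the demonstration.

```python
import random
import statistics

def lag1_autocorrelation(series):
    """Correlation between the series at time t and at time t-1."""
    mean = statistics.fmean(series)
    num = sum((series[t] - mean) * (series[t - 1] - mean)
              for t in range(1, len(series)))
    den = sum((x - mean) ** 2 for x in series)
    return num / den

def ar1(phi, n, rng):
    """Simulate x[t] = phi * x[t-1] + noise, starting from zero."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0, 1)
        out.append(x)
    return out

rng = random.Random(7)
persistent = ar1(0.9, 2000, rng)   # today looks a lot like yesterday
white_noise = ar1(0.0, 2000, rng)  # no memory at all

r_persistent = lag1_autocorrelation(persistent)
r_noise = lag1_autocorrelation(white_noise)
print(f"lag-1 autocorrelation, persistent series: {r_persistent:.2f}")
print(f"lag-1 autocorrelation, white noise:       {r_noise:.2f}")
```

With a series like the first one, “tomorrow will look like today” is a hard forecast to beat, which is exactly why the rare exceptions are where the interesting work is.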

18. Google “coefficient” if you aren’t familiar with the term: I can’t begin to count the number of meetings I’ve sat through where we appeared to be operating under an assumption that functioning models would be delivered by, well, maybe Elminster Aumar, High Wizard of the Forgotten Realms?…I dunno, sure the heck wasn’t going to be through any systematic estimation method worthy of discussion.

19. Newer machine learning methods also have a “hyperparameter” issue—the estimators can be configured in a wide variety of different ways, some better than others, and optimizing these using vast amounts of machine cycles is another important new research field. Older methods were derived algebraically and generally had a very small number of free parameters.

20. It is remarkably difficult to find new methods that consistently outperform the “obnoxiously effective” old standbys I will discuss in the next entry—conventional and logistic regression, support vector machines, conventional neural networks, and clustering methods. That’s why they are old standbys. That’s also why a genuinely effective new entrant like recurrent neural networks is such a big deal.

21. Usually referred to as “frequentism”, particularly by its detractors. Its supporters call it “statistics.” In some circumstances, frequentism is completely appropriate. But it isn’t universally appropriate and until about 20 years ago, it was treated as such.

22. Look at the research interests of the faculty in almost any university statistics department and you’ll find that most of the younger people are working on Bayesian methods: this is not an instance of random selection.

23. Like so many memorable quotes, that’s not the actual line, which had a rather rambling preamble before finally getting around to: “You’ve gotta ask yourself one question: ‘Do I feel lucky?’ Well, do ya, punk?” Rather as the oft-quoted “Play it again Sam” condensed seven lines of dialog, none containing the word “again.” For simplicity, and cognizant of the date, 17-Dec-15, stick with “Han shot first.”


Seven Updated Observations on Trump

It is now exactly five months since I posted “Is Trump pulling a Colbert on the Republican Party?” and for some reason, presumably quite unrelated to that timing, that entry has experienced an upsurge in views over the past couple of days. So, perhaps it is time to update.

Like pretty much everyone, I’d expected Trump—irrespective of whether he was pulling a Colbert/Snape—to be political history by this point, and given that he is not, as a good Bayesian we need to recalibrate.  And thus we will:

1. Trump’s supporters are a genuine political movement with a coherent set of grievances against the GOP [1]

Trump’s base, it is now clear, comes from socially conservative less-educated whites who forty years ago heeded the siren call of Richard Nixon to abandon the Democratic Party with its new multi-ethnic agenda and cast their lot with the GOP.

So how has this worked out for them?

  • The GOP has delivered on none of their social agenda: prayer in schools is still outlawed [2]; gun access has probably been expanded a bit, though this can largely be attributed to the NRA; at the Federal level nothing has changed on abortion (it has been somewhat restricted at the state level), and they’ve suffered an epically stunning reversal on gay rights.
  • Economically they have been at either a standstill or, more realistically, gone backwards, with various elements of globalization accounting for much of this, both in closed factories and competition with younger immigrants for low-wage jobs.
  • Their life expectancy is declining, their neighborhoods are wracked by drug abuse, suicide and divorce; their Main Streets are a mix of shuttered storefronts, pay-day loan operations and consignment shops, and the few remaining viable businesses are all controlled by distant corporations.

Not a pretty picture.

The initial response to this situation was the Tea Party movements beginning with the 2010 electoral cycle, which delivered first the House, and four years later the Senate, into Republican hands. And with this newly mobilized power Republicans—faced with a President who the 24/7 bloviator media circus assured them was the most depraved politician since Caligula, having already secretly reduced the US to the level of Libya and on course to turn it into North Korea—in the face of this profound existential threat the Republican establishment did…well, basically nothing, because in fact the Republican [and Democratic] elite are perfectly content with the status quo and have no real incentives to change anything.[3]

The distinction between now and 2014, I would suggest, is that this group—who may be largely powerless, but are certainly not stupid—realized that the core strategic error of the Tea Party was a naive faith that they could gain power in some sort of bottom-up libertarian self-organization with elements of Bakunin’s collectivist anarchism, but without central coordination. That failed. The obvious alternative is to seek a central leader, and into that political vacuum, perhaps not even fully realizing what he was getting himself into, walks Trump.

2. Yes, Trump is a fascist but…

Ross Douthat pretty much has nailed this one: not really anything else to add here.

3. He has no street power, nor will he ever

That is, we’re not going to see a true European-style fascist movement emerge here, as those require extra-legal power in the form of urban militias. That is an incredibly high bar for Trump to cross for at least the following reasons:

  • The bulk of his support is older and rural, not young and urban. While the U.S. has long experience with right-wing rural rebellions, starting with Shays’ Rebellion, the dominant approach has been to pretty much ignore these, with the Nevadan deadbeat Cliven Bundy being the most recent example, and alternative approaches such as those used against the Branch Davidians and Ruby Ridge did not end happily. That’s one of the advantages of living in a really big country with lots of empty space.
  • Even if Trump could somehow attract a younger crowd, they aren’t sufficiently fit for regular combat, and it is hard to imagine they would be fit for street combat. A positive side to the High Fructose Corn Syrup epidemic, I suppose. [4]
  • Unlike Europe, the U.S. has no culture of soccer hooliganism, and soccer hooligans are the shock troops of modern urban street violence. Ask Egypt’s Hosni Mubarak. For this we probably can thank the university-based structure of U.S. sport in contrast to the club-based structure found in the remainder of the world. So the NCAA is good for something. [5]
  • And finally, U.S. police forces are pretty well equipped for and experienced at dealing with mass urban violence. For better or worse.

But really, this is not going to get to the point where U.S. riot control tactics are deployed against masked Trump supporters: I see no credible path for Trump to mobilize significant mass violent street support, thus restricting him to the ballot box.

4. Trump is not Mussolini

This meme was circulating here in Charlottesville a couple weeks back, and as a consequence I read a whole lot more about Mussolini than otherwise I’d be inclined to do, and except for some imperfect convergence in ideology, Trump and Mussolini have absolutely nothing in common. Nor, except for ideology, do Trump and Hitler. Yet today we see no less than the usually sane Dana Milbank making the same comparison in the Washington Post.  It ain’t so.

Trump and the Italian billionaire politician Silvio Berlusconi have a fair amount in common, and not just their attitudes towards women. But the only thing Berlusconi and Mussolini have in common is they are Italian and their names end with the letters “ni”. Apparently sufficient for an analogy these days.

5. It’s not just the polling

We are witnessing a great deal of Sturm und Drang over the weaknesses of contemporary polling methods, plus the usual caveats that polls distant in time from the actual elections have little predictive power. But the persistence and size of the Trump numbers, which are also supported by the fact that people show up at his rallies and tell a fairly consistent story, seem well beyond what one would expect of measurement error. I’m not a pollster but I know what randomness in a time series looks like, and that’s not what we are seeing.

6. The choices for the Republican nominee are probably down to Trump, Cruz and Rubio, and Cruz is very well positioned here.

As I’m sure has occurred more than once to Cruz, in the absence of Trump, Cruz can make the best argument for being an establishment outsider—by all accounts he is completely loathed by his fellow Republicans—as well as having good Tea Party cred and, unlike Trump, understanding evangelicals. So if Trump somehow crashes, Cruz is the clear beneficiary, and doesn’t really need to work on this, though he is doing so anyway.

I’m practicing U.S. politics prediction without a license here, but with the complete meltdown of the Jeb Bush campaign it seems like Rubio is the only serious establishment player. But how long it takes to get to that point, and whether we go to the GOP convention in Cleveland with three major candidates (can both Trump and Cruz remain viable?) or two remains open until we start seeing primary votes. Polling on second-choices would be useful here—if Trump is completely out [6], do voters go to Cruz or back to the establishment choice, presumably Rubio?—but given the difficulties in the first-choice polling, that would be hard information to get.

7. A Trump-led third party could potentially be more than a spoiler

In the absence of the very real possibility of the Republican nomination, I’d put the probability of a serious—probably at a Ross Perot level, certainly more than a Ralph Nader or Strom Thurmond level, probably not Theodore Roosevelt and the Bull Moose Party level—Trump third-party bid at about 50%. This is currently seen as putting him in the role of a spoiler, and at the 2016 Presidential election, it most certainly would be, virtually guaranteeing a Clinton victory and possibly Democratic control of the Senate.

In a larger time frame, however, it is easy to imagine an anti-immigrant populist third party emerging with significant influence in Congress and at the state and local level for at least a few election cycles, as that is precisely what we are seeing in Europe. At that level, the relevant comparison would be the French National Front under Marine Le Pen, which has seen considerable success in recent years despite once being thought beyond the fringe. Third parties always have a difficult time in the U.S., which has neither proportional representation nor any sort of transferable vote, but with a strong regional base they can have a significant impact.

To a large extent, we’re already in the midst of this experiment with the Tea Party, and per my first point, the Tea Party constituency has tried and failed to have significant influence within the GOP, so one can almost argue that they can’t do much worse outside of it. For political consistency with Britain and France, this new party should be called the National Front. People’s Party is a good general name for a populist party but, alas, it is used by the Danes, and I’m guessing even a mere whiff of Nordic socialism, even in opposition, would kill it. So it is more likely to just be called the Tea Party, or the New Tea Party, or possibly even the Trump Party.


At this point, I’m pretty sure Trump is not playing Severus Snape. Trump is Voldemort.[7] Cruz is clearly Lucius Malfoy; Hillary Clinton is Minerva McGonagall; the constantly shape-changing Christie is Remus Lupin. If he uses his massive campaign war-chest to take out Trump in a sacrificial move [8], Jeb Bush will be Albus Dumbledore. Otherwise he is the well-meaning but bumbling half-giant Rubeus Hagrid. This scheme works pretty well, actually, except there’s one major player I can’t readily place: Rubio. Definitely not Harry Potter, though I’m seeing quite a few parallels with Arthur Weasley.[9] 

Beyond the Snark

FP’s Siobhán O’Grady with a somewhat different take on Trump-as-fascist: Mike Godwin—the Godwin of Godwin’s Law—on the issue: Still another, somewhat inconclusive, discussion of fascism, also agreeing that extra-legal violence is an essential element: Upshot of all this: “fascist” has probably outlived its utility and something more general like “authoritarian populism” would be better.

Later analysis by Ross Douthat on the Trump/Cruz/Rubio finalists—this seems to be the “common wisdom” now—from an ideological perspective: Again noting that Trump isn’t particularly conservative and one could imagine if he’d chosen to move just a bit further left he could have presented a Huey Long-style challenge to the Democrats rather than causing chaos for the Republicans.

Bakunin collectivist anarchism: Not a perfect match, particularly since the Trump supporters are in a post-industrial environment, which explains much of their predicament. The Tea Party is also frequently compared to the various 19th century United States populist movements, but—unlike Bakunin’s anarchism—these never had a libertarian element, were very interested in national organization, and the industrializing vs post-industrial environment is again huge here.

Rural rebellions: Cliven Bundy: Branch Davidians: Ruby Ridge: Though I suppose I don’t really need to provide links for things in Wikipedia.

U.S. young adults too fat for the military:

Issues with polling: This is a pretty good recent summary of the issues specific to the early primaries:

Thomas Edsall on the Trumpistas vs Republican establishment:

Political power of soccer hooligans: for the Egyptian case, start with but there were quite a few other analyses along these lines. David Kilcullen’s Out of the Mountains: The Coming Age of the Urban Guerrilla discusses this in considerable detail, emphasizing the rather unique set of street-fighting skills imparted by hooliganism. None of this is new: one of the greatest challenges confronted by the early Byzantine Empire was the Nika riots, instigated by chariot race hooligans. The Emperor Justinian’s response left a bit to be desired in terms of human rights standards, though Trump would presumably approve.

Trump Winery property and dog killing: The drums and the rooster feathers: let’s just say I made that up—it wouldn’t happen in the 21st century. Would it?

French National Front party: Viktor Orban’s Hungarian Fidesz party—now the governing party—is another good comparison here, and Orban is certainly a closer parallel to Trump than Mussolini.

J.K. Rowling thinks Trump is worse than Voldemort:

Kurtz reference: (but you knew that, right?)


1. Quite independently—that is, I wrote my screed prior to seeing this article—the ever-perceptive Ross Douthat reaches much these same conclusions with respect to the limited options of the Republican establishment:

2. I’m old enough to remember when Protestant prayer was a regular part of the public school day. And so, I’d guess, are many of Trump’s supporters, who tend to be older.

3. Consider the position of the US elites (party affiliation is irrelevant here: we’re looking at the denizens of the 158 families, right and left)

  • At the elite level, the country has fully recovered from the Great Recession and other excesses of the Bush era: the Dow-Jones average is almost 50% above its 2008 levels, elite unemployment is effectively zero, inflation is almost too low to measure; for the first time in decades the country is on the path to energy independence.
  • Economic growth is steady and probably at about the highest sustainable level for a mature industrial/service economy. Years of manipulation of the tax laws have ensured that the gains from growth go entirely to the economic elite, with none of that unpleasant “trickle-down” from the mid-20th century. De facto taxation of elite income sources—many of which escape taxation altogether—is at levels not seen since the Gilded Age of a century or more in the past.
  • As with economic growth, the elites acquire virtually all of the benefits, and incur virtually none of the costs, of globalization. And there are many. Same for information-technology-driven automation.
  • The education system has been co-opted into a corporate model that has brought merit-based social mobility to a standstill—well below the levels of Europe by many though not all accounts—as well as ensuring that in the course of their “education” precious little Jason and Ashley will never encounter people or ideas which will make them feel uncomfortable. But they’ll have access to a great fitness center.
  • Violent crime is on a steady decline (with the consequence that in coastal cities urban property values, another asset of the elite, are hitting stratospheric levels); the legal system has been manipulated to the point where white-collar crime is virtually impossible to prosecute.
  • This system is overseen by a very stable, centrist government disinclined to the ideological and imperial over-reach of previous administrations and under the sway of a corrupt campaign finance system fully endorsed by a Supreme Court controlled by right-wing revisionist judges.

All of which is to say that while the elites pretend to react to the current situation by echoing the dying Kurtz’s “The horror, the horror…” if they have even the slightest self-awareness they should awake each morning pinching themselves and saying “I can’t believe I live in such a wonderful time!”

But even if they aren’t sufficiently self-aware to do that, at the very least they recognize that legislative “paralysis” merely maintains a status quo that, short of formally discarding, on the model of Darth Sidious/Palpatine, the remaining silly trappings of democracy, really couldn’t get any better. What the Tea Party sees as a problem, the elites see as a solution. [10] So ain’t nothing going to change here any time soon.

4. Is the obesity epidemic also the explanation for the decline in street crime?: the correlation is certainly there. Heck, you don’t even need “You can run but you can’t hide” if the miscreant can’t even run in the first place. For this same reason, anyone who thinks the size of the U.S. military can be substantially increased (short of relying on immigrants) is deluding themselves: recruitment quotas are being missed even now. 

5. Though not, it appears, imposing sanctions on the University of North Carolina.

6. He’s still got the Trump Winery curse to deal with. You haven’t heard about the curse?: Local lore has it that the property is cursed thanks to a groundskeeper who a few years back was killing the neighborhood dogs so that the owner—at the time the wealthiest man in the US—could hunt pheasants on the property with his fat-cat friends. This was before Trump bought the place—he wasn’t up there in his black helicopter stalking the poochies with an AR-15 modified for fully automatic fire, well, at least not back then—but killing your neighbors’ dogs is not taken lightly in the hickory-covered hills of central Virginia, and after the dog-killing incident, bad things started happening with people associated with that property, starting with the [natural? Or supernatural?] death of the wealthy owner. Really. Trump apparently was so impressed with his own bargaining skills that he didn’t explore the possibility that the sellers had some really good reasons for wanting to be rid of it.

And there are probably parts of the story we don’t know about: that night when the groundskeeper heard the sound of drums, and saw the ghostly flickering of torches in the distance, and in the morning found the gateposts of the property smeared with chicken blood and black rooster feathers scattered about…yes…in the hickory-covered hills of central Virginia, the unwary can find themselves dealing with forces the likes of which the Trump family cannot even imagine, much less hope to control.

Polling data haven’t factored this in either.

7. It seems J.K. Rowling, who we must acknowledge knows more than a bit about Voldemort, considers Trump worse than Voldemort, but I’m okay sticking with the original equation.

8. Like Dumbledore, Bush is doomed anyway.

9. The astute reader may be observing that this blogger has spent perhaps a bit too much time reading the Harry Potter books.

10. Though it is at least possible that this same set of circumstances may shed light on one of the other great mysteries of the current electoral cycle, Bill O’Reilly’s vicious attack on Ronald Reagan, Killing Reagan, which has been thoroughly denounced as a dark fantasy by virtually everyone who has either worked with or studied the 40th president.

That one of the most pompous bloviators in the Fox multi-verse should compose a book-length character assassination virtually devoid of factual content comes as little surprise: in fact that’s probably a requirement for the position. But targeting Ronald Reagan?—that requires some explanation.

Unless this is a signal—something akin to those scriveners who under Stalin inspired the phrase “Soviet history is very hard to predict”—that the GOP elite are ready to abandon the “Reagan Democrats” before the “Reagan Democrats” abandon the GOP. It was Reagan, after all, who finished the job of changing the affiliations of lower-middle-class whites that Nixon had begun, a task Reagan accomplished both through his not inconsiderable political skills, and by the helpful fact that he wasn’t Nixon. Is O’Reilly trashing Reagan in order to trash this legacy of Reagan?

But that would leave another mystery: how can the GOP win elections only with the votes of aging wealthy white people?—voter suppression is only going to take you so far. Oh, wait, perhaps now we see why the computer code on electronic voting machines is proprietary and can’t be examined…

O’Reilly’s next book will be titled Killing Kittens. Excerpt: “They may look cute, but they grow up to be ruthless predators of songbirds, a vector for the spread of mind-altering viruses and live in total contempt of the humans who ply them daily with food in exchange for a modicum of feigned affection.” Then on to those discredited stories about Dick Cheney, kittens and the wood chipper.


Seven lessons the national Democratic Party should draw from the victory of John Bel Edwards [1]

November 22 dawned with the news that Louisiana Democratic gubernatorial candidate John Bel Edwards had not merely defeated the loathsome Republican David Vitter, but totally whomped ’em. And accomplished this in the deep South with the votes of a group long written off by the national Democratic establishment, the lower middle class white demographic (LMCWD until someone comes up with a better neutral acronym [2]).

For reasons elaborated below, the Democratic Party establishment quickly dismissed this as a fluke [3] explained by a spectacularly unsuitable GOP candidate. But the GOP has moved towards “spectacularly unsuitable” as a requirement for candidates!: their motto is “There’s a demolition derby going on, Dad, let me have the car!” With the GOP heading to the hard right, anyone with a lick of sense—or reading Anthony Downs—would know that the Democrats need to make a move for the center right rather than simply further consolidating their existing base.

But that’s not what we’re seeing. So a few unsolicited suggestions on why this should change.

1. Accept the LMCWD as a distinct and embattled cultural minority that should be part of the Democratic coalition.

This comes first because it is going to be hardest. But if the statistical evidence presented in Case and Deaton doesn’t make this point for you, I don’t know what ever will.

In days gone by, this was not a controversial position, even if the connection between the LMCWD and the Democratic establishment was largely mediated by a combination of long-gone industrial unions, urban political machines, and assorted racial arrangements now firmly associated with the contemporary GOP which we most definitely do not want to revive. [4]

Eyes firmly fixed on the rearview mirror, the contemporary Democratic elite has conveniently pigeon-holed the entire LMCWD as a bunch of gun-toting racists with rotting teeth who keep a year’s supply of canned tuna fish and peanut butter in the basement along with a two-year supply of ammunition and are married to their cousins. Granted, such individuals are not entirely hypothetical, and periodically the International Brotherhood of Democratic Party Campaign Consultants rounds up enough for a focus group and sends the resulting video around to scare the hell out of everyone [5] to guarantee:

  • Except for a couple unpleasant months in New Hampshire and Iowa every four years, the consultants will never be required to work more than an hour from a five-star hotel
  • Their base, the NPR Democrats, can continue to hold tightly to their single most valued asset, a smugly refined sense of cultural superiority
  • The consultants can just keep doing the same things they’ve been doing since the Johnson administration.

The abject terror of the Democratic Party establishment toward anyone who takes the LMCWD seriously can be seen in their response to the 2010 and 2016 populist senatorial campaigns of retired Admiral Joe Sestak in Pennsylvania, where the party machine preferred subjecting the country to six years of ultra-conservative Patrick Toomey to accommodating Sestak. Heck, the Democratic establishment has made it pretty clear they’d support Abu Bakr al-Baghdadi over Sestak. This, I would suggest, is a problem.

2. Send your copy of What’s the Matter with Kansas (WTMWK) to a recycling center.

At the height of the popularity of WTMWK, my University of Kansas [6] colleague Allan Cigler—who had spent his entire career actually studying politics in the state—gave a talk which went through every major hypothesis of the book and demonstrated that it was contradicted by systematic evidence from economic and survey statistics. Facts, how inconvenient. [7]

WTMWK is, of course, little more than the hoary “false consciousness” hypothesis of the Old Left, and more generally yet another indication that the American political system is still in the thrall of three mid-20th century clusters of ideas, the awful Rs: [Franklin] Roosevelt, Reagan and [Ayn] Rand, dead hands of the mid 20th century around the throat of the 21st.

3. Acknowledge that hyper-wonkized government is a serious burden for the LMCWD

The motto of the largely Democratic wonk class is “One person’s bureaucratic bottleneck is another person’s job.” The wonks are barnacles on the ship of state, and their current preferred habitat is with any candidate whose name rhymes with “Clinton.” Which is why it took Barack Obama to create a nationalized health care system that given sufficient time may get us to the level of Bulgaria. Similarly, implementation of the hyper-wonked Dodd-Frank banking legislation began with a 192-page loophole-ridden form—presumably by now a couple orders of magnitude more complex—in lieu of merely reinstating the Glass-Steagall Act repealed under—you guessed it—a Clinton, which could have been accomplished with a couple of sentences.

NPR Democrats, of course, are generally in positions where they are coddled by large bureaucratic structures and, in the case of the Democratic elite in their gated communities, have lawyers, lobbyists and accountants on retainer. The LMCWD, in contrast, are likely to be self-employed or in small businesses and have to deal with this ever-increasing complexity and the emerging Indian-style license Raj directly, and get very few benefits from it. Acknowledging this fact would be a major step forward.

4. Address the issues of rural and suburban poverty, and in particular the rural drug epidemic.

Case and Deaton again. [8] The LMCWD has a pretty good idea of what is wrong with their communities, and there is plenty of room for a new indigenous populism of the left, but that would affect some Democratic vested interests like Big Pharma and prison guards.

I’m not sure exactly what these solutions are going to look like, but I’d guess they will

  • Be simple, innovative and decentralized, and will not provide new jobs for the vast armies of wonks and consultancies who have attached themselves to the Clintons
  • Quite a few, though by no means all, will use elements of classical democratic conservatism [9]
  • Quite a few of these ideas, though by no means all, will work

Though as a beginning, follow the suggestions of Paul Krugman, issue a whole lot of very low interest bonds, start up a bunch of long-overdue infrastructure projects and trust me, the talent (and consequent jobs) needed to complete these will be found in the economically distressed counties of rural America, not among the latte-sipping set who spend thirty hours a week in meetings writing mission statements and the remainder updating their Linked-In profiles.

5. Stop trashing religion.

Remember how WTMWK dealt with religion?—Thomas Frank found some bozo who thought he was the Pope. As did his cousin. Trust me, the average church-going Kansan does not believe he or she is the Pope. But the “religion is only for weirdos” line sells, big time, with the NPR crowd.

Dealing with religion is going to be complicated: the US is clearly on a decidedly different path than post-Christian Europe, starting with having spent the past fifty years of first liberal Protestants, then conservative evangelicals, deciding—with equivalently disastrous results—that their route to continued relevance was politics, a strategy that consulting some old texts (Matt. 4:8-10; Luke 4:5-8) would have advised them against. Exactly where we go from here is unclear, though I’d suggest it will not be the European model of abandoned churches reduced to art venues and tourist attractions: that required the inflexibility of established religion. [10]

6. Sixth, fire all of the consultants. [11]

The national “Democratic Party” of course, is little more than an illusion perpetrated by a clique of lavishly compensated consultants who live in gated communities rubbing shoulders daily with hedge fund managers and CEOs who complain bitterly that the Wall Street bailouts didn’t go far enough, all supported by 79 or so families with vast reserves of wealth who dabble in politics as a diverting little pastime rather akin to butterfly collecting. [12]

Which is to say, fundamentally we are in a post-democratic era—the subject of about a dozen future blog entries that have yet to fully congeal—and the oligarchs are just letting us live here. So far. But—the 79 families, humor me for a minute—aren’t those consultants thoroughly ripping you off by leading the country into an ever-more polarized and dysfunctional system that does no one any good? If those people were your landscapers and your lawn looked like a bad case of mange, the re-routed driveway ended in a muddy ditch, and that expensive palm tree they’d recommended you plant—in Minnesota—had mysteriously died, you’d fire them and find someone else, right? And that’s a pretty good metaphor for the current state of American consultant-dominated “politics”, right? So as our overlords, shouldn’t you think about hiring someone new? At least consider it, eh?

7. Rebuild the state and local level parties and stop centralizing power in Washington. And northern Virginia.

I’d guess that the attitude of most Democratic voters towards involvement in state politics is currently “What do you take me for, a complete loser??” But we’re embarking on a twelve-step program here, and one of the premises of twelve-step programs is you’ve got to hit bottom before recovery can begin, and with the LMCWD, the national Democratic Party has certainly satisfied that requirement. So now at any point you can begin the recovery, and you can start by looking at what the Democratic establishment has been doing to Joe Sestak and promise to do the opposite. [13] Those strategies aren’t going to come out of Washington, or from the staggering zombie legions of wonks attached to the Clintons, and certainly not from the 79 oligarchic families.

In conclusion…

Once again, the objective here is not to accommodate all of the LMCWD, just the rather sizeable segment who now realize they have been thoroughly screwed over the past fifty years by their allegiance to the GOP, which given the chance will also happily screw them over for another fifty years. If Stanley Greenberg’s analysis in America Ascendant is correct—and Greenberg suspiciously works with facts—the GOP is demographically doomed [14], but at the current pace completing this process will probably take two decades, possibly including some quite unpleasant periods. Accommodate the center of the LMCWD and you reduce that perhaps to ten years, possibly with very few unpleasant periods.

Think big, think 1932, think of a Roosevelt-like ascendency that will last for half a century. Not the entire LMCWD, as you’ll never accommodate people completely absorbed in the Fox/Trump fantasy world. But you don’t need to: just get the reasonable ones and you’ve reestablished a 21st century electoral coalition that can bring about 21st century social democratic policies.

In the configuration of 2014, however, Democrats couldn’t win a gubernatorial election against a man who by 2015 was the most hated governor in the country, behind even Louisiana’s Bobby Jindal. That occurred, of course, in Kansas.

Beyond the Snark [15]

Alec MacGillis (NYT) “Who Turned My Blue State Red” which was the immediate impetus for this:

Anne Case and Angus Deaton, the other impetus: For the original:

Another article on the white drug-overdose epidemic:
Note the observation that blacks and Hispanics aren’t affected because docs won’t prescribe them painkillers: must be great fun when you’ve got metastatic bone cancer…

Stanley Greenberg’s America Ascendant which is the book-length exposition of the “demography is destiny” argument.

Washington Monthly‘s Nancy LeTourneau review of Greenberg:

Dan Balz (Washington Post) on Greenberg, circling around some of these same points as this essay:

Anne-Marie Slaughter and Ben Scott on the increasing irrelevance of Washington think-tanks (Washington Monthly) [and by implication, the zombie wonks: good start, but the situation is even worse…]:

The Economist on Ivy League discrimination:

Mancur Olson on why “bad things happen” when regulations accumulate unchecked:

Plus a shout-out to the recently deceased Douglass North whose work headed in much the same direction:

Anthony Downs, An Economic Theory of Democracy:

The efficacy of decentralized community-based solutions: pretty much everything Elinor Ostrom ever wrote:

Paul Krugman on the wisdom of financing new infrastructure with long-term bonds at extraordinarily low interest rates [16]: well, about every third column he has written in the New York Times for the past seven years:

Ted Robert Gurr 1994 ISA presidential address: But it’s paywalled, and so I can’t tell whether the critique of Huntington, which figured prominently in the lecture Gurr gave at the ISA meeting, made it into the presumably much shorter article. That, by the way, is the standard academic mode: write something interesting, give it away—with very few exceptions, academics are not paid for the articles they write; they are paid (sort of) for books—to some rapacious proprietary publisher [insert link to image of Cthulhu here…] who then locks this away so that no one except other academics can read it—though most academics are blissfully unaware that you need access to a research university-level library to read most articles, and think JSTOR is a public service rather than an extortion racket—then complain bitterly that their ideas are having no influence on the public discourse.

Divergent paths of religious institutions in the US and Europe: Stark-Bainbridge theory of religion: [or more generally, just Google that term.] The “Iron Laws” are mouseCorp, not Stark-Bainbridge.

The 158 families:



Sestak campaign: Get on his mailing list for a running account of everything the Democratic establishment is doing to try to undercut him.


1. So with the worrisome decline in productivity in this blog, mouseCorp—slave drivers!—has decided to impose some discipline. Seven-point blog entries will henceforth be limited (or is that “limited”?) to 1800 words—an overall average of 200 words per point plus 200 each for the introduction and conclusion. A new “Beyond the Snark” section will now be required to give pointers to some of the factual material that underlies…uh…the snark, rather than in-text links which were sometimes useful, but might also just send you off to a picture of Cthulhu. I managed to negotiate—this was tough, but had to be done—unlimited words on the footnotes, but everyone skips those anyway (joke…). There’s a backlog of about a dozen half-completed entries, and we’ll see if this improves things.

2. “Joe and Jane Sixpack,” “single moms,” “trailer trash” and “rednecks” do not qualify.

3. Do I have a shred of evidence to support this claim? No! But we are in the post-modern era and everybody has won and all must have prizes! Okay, so of course I’m lying, but hear me out. (An old Hollywood joke.)

4. Contemporary attitudes towards race relations in those segments of the LMCWD that could be courted by a 21st century Democratic party, while nuanced, are arguably considerably more tolerant, and considerably more meritocratic, than those of the Clintons’ friends on Wall Street, Silicon Valley, and the Ivy League universities.

5. Republican consultants achieve the same level of anxiety using a 30-second video of kittens playing with yarn under a picture of Barack Obama shaking hands with Pope Francis.

6. New motto: one Vitter down, one to go.

7. About the same time Ted Robert Gurr, in an International Studies Association presidential address, did the same to Samuel Huntington’s infamous Clash of Civilizations. Facts, how inconvenient.

And no, the Paris attacks are not validation for Clash of Civilizations: for every unit of effort ISIS has spent attacking the West they’ve probably spent a thousand killing other Moslems and attacking Arab institutions.

8. Hey wonks, that top-down Washington-led “War on Drugs”?—how’s that working for ya?

9. That’s classical conservatism, not to be confused with the bloviator conservatism served up on the Fox Fantasy Hour—same program, merely repeated in 24 daily segments with different hosts—which bears the same relationship to classical conservatism as an elementary school kazoo band does to a virtuoso performance of the Toccata and Fugue in D Minor. A topic to be elaborated upon in a later blog entry.

10. The First Iron Law of U.S. Religious Movements states that the politically dominant religious affiliation changes about every eighty years—Calvinist to Quaker to Methodist to liberal Protestant to conservative evangelical—and as such we are heading towards the next transition. The Second Iron Law is that whatever the dominant sect, Baptists are number two, and Catholics number three. The Third Iron Law is that there is always more going on with new religious movements than the first three groups would like to acknowledge.

11. The original aphorism in Henry the Sixth, Part 2, Act 4, scene 2, alas, has some human rights issues.

12. The “Republican Party”, of course, has precisely the same structure, albeit with a different if overlapping set of families, and everyone owns guns. Or claims they do.

13. More generally, for negative policy examples, particularly those involving the sale of beer and wine, and the management of college athletic programs, it’s really hard to beat Pennsylvania. I digress.

14. Probably, but not necessarily. The GOP of Trump, Carson, Cruz, Bachmann, Huckabee and Palin is doomed. For a different but equally compelling set of reasons, so is the GOP of Brownback, Walker, Christie and Jindal. The GOP of McCain, Romney, Kasich, Bloomberg, the Bush dynasty and Rubio probably is not doomed, and the GOP of Landon, Eisenhower, Dole and Kassebaum would not have gotten into this mess in the first place.

15. Okay, so this is a new experiment to try to get more of these blogs out the door. You see, these usually start when I’ve run into a series of related articles that get me writing. But once it gets going, Krans, the Demon of Snark, takes over and we end up with, well, we end up with these blogs. So I’m going to try a new section—and not just links and footnotes, particularly since the footnotes are usually even worse than the body of the text—providing pointers to the serious stuff as well. We’ll see how this goes.

16. The wonks and think tanks ain’t signing on to this eminently rational proposal, presumably at the behest of their paymasters on Wall Street and in the 79 families, who have absolutely nothing to benefit from either the infrastructure—they live in gated communities and fly NetJets, remember?—or low-interest long-term bonds.


A Field Guide to Millennials and Gen-Xers in Social Data Analytics


Background: This was vaguely solicited advice to a funding agency which, exercising the usual discretion characteristic of this site, shall remain anonymous. [1] Hence the organization into ten points rather than the usual seven.

1. They are digital natives: You cannot manage them unless you speak that language, and fairly fluently at that. They will instantly detect posers in this domain.

2. They are very social and travel in non-exclusive herds or, as they prefer, tribes. They are innately collaborative and remarkably adept at self-organization, including long-distance collaboration.

3. Consistent with their social ethos, they share and expect others to share. In the data analytic world, if you aren’t on GitHub, you might as well not exist.

4. Observational evidence suggests that they can survive a total of between three and ten hours of exposure to the 200 yellow-on-green, 8-point-type PowerPoint slide presentations that characterize monthly program reviews. Ethical constraints have precluded establishing this number precisely, though they will usually respond to such treatments by fleeing the project—see Point #10—rather than clawing their own eyes out.

5. They are fearless adopters, assessors and modifiers of new technology. Contrary to stereotypes, when properly motivated they have a remarkable capacity for work: A millennial working on the early stages of one project I was involved with ended up in the emergency room due to dehydration after a night of data-wrangling. He survived, and now teaches at Princeton.

6. They like feedback and intermediate rewards, and have a possibly overly acute—though generally accurate—sense of injustice when assessing organizational management.

7. Unlike the notoriously sexist first generation of political methodologists (and the hopelessly sexist game theorists before them [2]) they are fully open to participation of, and leadership by, women. In fact they find exclusively male environments rather odd and alienating.

8. They prefer to approach their work with a sense of humor: at the recent New Directions in Text as Data conference, the first slide of the first presentation was a drawing of a squirrel with a martini glass [3]. The presenter, a woman, said “Every presentation at this [predominantly Millennial and GenX-er] conference needs to have a picture of an animal.” This had not been announced in advance, but indeed every subsequent presentation contained a picture of an animal.[4]

9. They are skeptics, and if you tell them something is impossible when they know it has already been done—the forte of physicists and engineers talking about social science research—you will lose them immediately. They are also skeptical of each other: the recently established Millennial and Gen-X journal Research and Politics has the strongest replication norms in political science, and quite likely anywhere in the social sciences.

10. Every reasonably-sized data analytics company in the world has at least half a dozen openings: a Millennial’s alternative to working for your project is to work for Google, Apple, Amazon, Facebook or Microsoft. Not Starbucks.

A Gen-Xer social data analytics researcher I know seeking to escape an academic institution in a rather remote village interviewed with each of these five, and within weeks had offers from every one; she went with Microsoft Research in New York City. Another, who had taken several graduate courses from her institution’s statistics department, concluded that the methods required for publication in a certain social science were completely useless, so she quit and after a few months at a data start-up, ended up at Apple, in a city which is probably at the outer limits of the possibility curve on the dimensions “coolness” and “affordable.” A final example—though I have more—involved a student with considerable experience in conflict data analytics who took a summer internship at an insurance company where he wrote a little model to predict the location of pirate attacks, which the company then used to secure a contract with the world’s largest shipping company. You will be shocked, shocked to learn that he has an offer for non-academic employment as well. Also in what has been considered one of the world’s coolest cities. During the time of the Roman Empire. Before that unpleasantness with the Saxons. I digress.


1. Which is to say, blindingly obvious to almost everyone likely to read this. It is not Penn State.

2. Where various personality disorders at the clinical level also seemed to correlate with professional success, and I’m not just thinking of John Nash. Though apparently John von Neumann was a pretty nice guy.

3. It made sense in context…well, sort of… The martini had an acorn rather than an olive. I do not know whether the martini was shaken or stirred: it wasn’t that kind of squirrel.

4. My contribution (taken at a workshop I attended in South Africa): [image: black swan]



Is Trump pulling a Colbert on the Republican Party?

As a Will Rogers Democrat [1], I’ve been watching with delight as Donald Trump states, re-states, and then doubles-down on statements that seem almost perfectly designed to offend the median voter and put other Republican presidential hopefuls—now a not insignificant percentage of the entire party—in uncomfortable positions. The whole thing is almost too good to be true.

Then in the fading wakefulness of last evening, as the brain starts making random connections that are both crazy and creative, it hit me in a flash: what if it is too good to be true?! [2]

That is, what if Trump is pursuing the Stephen Colbert strategy of setting himself up as a parody of the loony right, and simply seeing how far he can go? Consider the following seven characteristics that point in this direction.

1. As has been noted in numerous forums [3], for most of his life Trump has been not only a Democrat but, as Dana Milbank has pointed out, a fairly progressive Democrat at that.

2. His “born again Republican” schtick does not have any obvious motivation—nothing equivalent to Ronald Reagan’s battles with Communist-influenced unions in Hollywood—and he flipped almost immediately into an extreme position.

3. His positions, first as a “birther” and now as a combination of an anti-immigrant know-nothing, Putinesque Great Dictator and Green Lantern seem almost optimally designed to embarrass the GOP by focusing attention on the looniest of the loony ideas floating therein. To date he is completely ignoring Republican leadership appeals to stop the madness.

4. What, exactly, is “Trumpism”?: it seems to change by the day but in addition to overt racism, includes such howlers as secret plans to impose his will on a country holding $1.2-trillion in US debt. In contrast, the libertarianism—if that’s what it still is—of Rand Paul may seem a bit fringe, but at least it is a coherent ideology with a long political and intellectual history.

5. No one gives Trump even a remote chance of winning a general election, however successful he might be in primary elections in small states.[5] But he could certainly thoroughly disrupt those already problematic Republican primary debates.

6. Trump may or may not be threatening to run a Ross-Perot-like third-party candidacy if—which is to say, when—he doesn’t get the GOP nomination. This would almost certainly throw the electoral votes of Michigan, Ohio, Pennsylvania, Florida, Virginia and possibly North Carolina to the Democrats, at which point we could save some time and effort by skipping the election altogether and just appointing Hillary.

7. Like Colbert, Trump is a television entertainer: let us not forget that.

Trump, bluster notwithstanding, seems to be paying at least some price for these antics,[4] though of course if he is worth any significant fraction of what he claims, he is nowhere near the poverty level, and in the meantime it appears that he will be providing lucrative employment for legions of contract lawyers.

But whether or not it is an act, this is going to be a tough tiger to dismount. And if Trump is, in fact, doing this—with thus far devastating effectiveness—as a favor to Hillary Clinton and the Democratic Party, probably the best comparison is not to Colbert, but to the undercover but nonetheless heroic and self-sacrificing Severus Snape. Hey, they already appear to share the same hair stylist!

9 Dec 2015: For the update, five months out, see


1. “I am not a member of any organized political party. I am a Democrat.”

2. The conservative Washington Times—I’m not a regular reader, and found this only as the result of Google—is apparently also having some of these same doubts.

3. Google “donald trump democratic donations” for a long list of essays to this effect.

4. To say nothing of the price his employees must be paying: the thoroughly loathed “Trump Winery” is just a few miles from here and I’m certain few if any of the people working there expected to be signing on for this sort of thing.

5. Though this is quite a recent development, as Chris Cillizza pointed out just three weeks ago.


Seven reasons I probably can’t help you get my open source software running on your computer

The EL:DIABLO/PETRARCH system is beginning to get some traction, and with this, we are starting to get increasing numbers of emails asking for assistance. In a few instances, these can be resolved and in a very, very few instances, they have alerted us to important bugs. However—there’s probably a rule of thumb here—if an issue can’t be resolved in two or three email exchanges, it usually cannot be resolved at all.

This, I am sure, ends up being very frustrating to the person who originally sought the assistance, and it almost certainly leads to some variant on the following generalization about open source in general and programmers in particular: Open source is too good to be true, and programmers are a bunch of introverted overpaid arrogant bastards with inadequate personal hygiene who deliberately obfuscate their work to make it seem more difficult than it really is even as they pretend to provide it to you for free. Working on a script?: Have a programmer eviscerated by a velociraptor, crushed in a trash compactor, or reduced to Valyrian dragon chow.[1] All of these are sure crowd-pleasers.

Well, it’s actually a little more complicated than that, and hence I am devoting a bit of time—also provided for free—to explaining why.

1. No two research computers are identical

For purposes of this discussion, we will stipulate that the software you are trying to use works when correctly installed. In the case of EL:DI/PETR, we’ve been running the system 24/7 for about six months and have encountered a single untrapped bug (which we corrected, or rather, trapped). The major open source projects—R, LaTeX, Perl, Python, NumPy—have accumulated millions of hours of use on tens of thousands of installations and thus, you can reasonably assume, also work.[2]

So why won’t it work on your computer?

In the absence of access to your machine—and please, don’t ever, ever grant access to your machine to someone you just know via the internet—the basic answer is “we haven’t a clue.” In situations where one is experimenting with open source software—as opposed, for example, to a standardized telemarketing operation selling the rights, smirk, smirk, to gold bars to protect you from the massive hyperinflation of 2009-2011 that destroyed the value of the US dollar[3]—it is safe to assume that on the software side, no two computers are identical. I don’t know what else you have on your machine, or how it is installed: In a complex system like Python, or even EL:DI/PETR, there are multiple ways of doing this, and they all work, but they work differently. I also don’t know your directory structures, and probably nine times out of ten, that’s where the problem is.

It is even worse when you are in an institutional setting since your machine may be deliberately configured, for good reasons or bad, so that it is impossible to run the software. I’ve known institutional IT specialists I would trust to do almost anything correctly, and I’ve known some who would have difficulty operating a vending machine.[4] Again, I have essentially no information on how your machine is configured, nor any ethical way of obtaining that information. This is a problem.
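
If you do ask for help, a summary of your environment is the one thing that can partly substitute for access to your machine. The following is a minimal sketch and is not part of EL:DI/PETR: the function name `environment_report` and the choice of fields are illustrative assumptions about what is usually worth reporting, and it relies only on the Python standard library.

```python
import os
import platform
import sys


def environment_report(max_path_entries=10):
    """Collect basic facts about this machine's Python setup as a dict.

    The selection of fields is illustrative, not an official checklist.
    """
    return {
        "os": platform.platform(),
        "python_version": platform.python_version(),
        "python_executable": sys.executable,
        "working_directory": os.getcwd(),
        # Where Python looks for modules; directory problems often show up here.
        "module_search_path": sys.path[:max_path_entries],
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting the printed output into the first email will not guarantee a fix, but it removes at least one round of "which version are you running, and from where?"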

A Story: Last summer I spent the better part of two weeks trying, via email, to help a graduate student in Europe get TABARI running on a Linux installation. I’ve run TABARI on dozens of Linux systems, nothing new there, but hadn’t seen this issue, and could not debug it remotely: The effort was unsuccessful.

A few weeks later I upgraded my software to the latest version of the GNU compiler suite, and suddenly TABARI would not compile for me either. In fifteen minutes, I’d traced the issue to a simple problem in the ‘make’ file, and moving a single file name from the beginning of a list to the end solved everything. I’m sure changing the ‘make’ utility made sense to someone somewhere, and probably even makes the whole GNU suite more consistent, but it broke this particular piece of code. The person I was trying to help happened to be just a bit ahead of the curve in getting an up-to-date installation.

Email exchanges, by the way, are particularly inefficient because of the lag time: I solved the problem in fifteen minutes because I could run however many experiments—dozens, typically—each taking a few seconds (try this, didn’t work; try that, didn’t work, rinse and repeat). Email: we’re talking weeks.

2. You shouldn’t be using open source software unless you accept the core elements of the social contract of open source.

Open source software is only “free as in puppy.” Entire books have been written about why open source works, and let me say that for most programmers, particularly those working independently who have to provide their own tools, it has been an utter and complete game changer, about as far from “too good to be true” as you can imagine. Others have written more extensively on the implicit rules of open source software, but for our purposes I’d emphasize:

  1. We have provided you with every single line of code so there is no possible issue with the software that you cannot, given sufficient knowledge and effort, figure out.
  2. In addition to whatever intrinsic rewards we may obtain [5], we are posting this code because we want it to be improved: steel sharpens steel.
  3. As with every complex social system involving coordination for a common good [6], working effectively in the open source world involves internalizing and following a set of cultural norms.

Or rather “sets”, as there are multiple cultures, for example those of Python, JavaScript and R. The most obvious difference is in the skill set the user is presumed to have already mastered, but there are others: for example, I think web-based programming forums such as those for JavaScript and CSS are generally noticeably more polite and social than Linux or C forums.

A Story: Ever tried playing in a bluegrass, Irish or old-time “jam”, those informal gatherings of musicians of multiple skill levels? These typically start out really easy, with slow, familiar tunes, lots of open chords, and pretty much everyone can contribute. But then a point comes when the banjo player is channeling Earl Scruggs and the mandolin player Bill Monroe and the fiddler must, indeed, have made a deal directly with Satan (Satanic fiddlers, I repeat myself). At that point, unless you have had a heck of a lot of practice, there is no way you can possibly keep up and the best thing to do is just put down your instrument and enjoy the show.[7]

A really good musician can work in multiple genres, but a medium-level bluegrass player is not going to immediately become a medium-level Irish player, or vice versa. The fact that you are really skilled in R doesn’t mean you’ve got a comparable level of skill in Python, or vice versa, though you are probably well on the way to getting there. But there is still a learning curve and at first you are going to get hung up on some really simple things.

Open source is like that. There are levels where everyone can play: certainly you should be able to do “Hello, World!” in pretty much any language. But as you work your way up, things get harder—say, to the Python statistical data management system Pandas, which appears to be channeling R, and quite possibly Satan—and at some point the software will get beyond your skill level. For example, despite almost 50 years of programming experience [8], I would not even begin to think I could touch the Linux kernel code. EL:DI/PETR, with three or four complex parts and multiple authors, is heading towards the upper range: it isn’t the Linux kernel, but getting it fully deployed is a lot more complicated than deploying its predecessor, TABARI. [9]

3. Wizardry is hard.

One of the many things I like about J.K. Rowling’s Harry Potter series is that learning to be a wizard is hard work: potions can take days to concoct, and still fail, and there are spells that take years to master and some people never really figure them out.

Yep, sounds a lot like programming to me, which is probably why most programmers really like Harry Potter.

Not everything is hard, and actually, you never know for sure just what will be hard. Sometimes things work right away; sometimes it can take hours to get them working even when the final correction is just a couple lines—or characters—of code. That was certainly my experience with installing one of the major Python packages—I believe it was NumPy—where after several hours I finally traced the problem to having incorrectly installed something earlier. If you are willing to put in that sort of work, open source provides an astonishing set of opportunities. If you aren’t, at some point it isn’t going to work for you.
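
One way to look for an earlier, half-forgotten installation of a Python package is to ask Python for every place it could load that package from. This is a minimal sketch using only the standard library; the helper name `find_all_copies` is hypothetical, and it is not a record of how the installation problem mentioned above was actually diagnosed.

```python
import importlib.machinery
import sys


def find_all_copies(module_name, search_path=None):
    """Return every location on the search path that provides module_name.

    More than one result suggests that one installation may be
    shadowing another.
    """
    if search_path is None:
        search_path = sys.path
    locations = []
    for entry in search_path:
        # An empty entry on sys.path means the current directory.
        spec = importlib.machinery.PathFinder.find_spec(module_name, [entry or "."])
        if spec is not None and spec.origin is not None:
            if spec.origin not in locations:
                locations.append(spec.origin)
    return locations
```

Calling `find_all_copies("numpy")` returns a list of file paths; an empty list means the package is not visible on the search path at all, and two or more entries are worth a closer look. Built-in modules and namespace packages without a file origin are deliberately not reported.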

A Story: You’ve all heard the ca. 1955 three rules for making it through life, right?

  • never eat at a place named Mom’s
  • never play poker with a man named Doc
  • never get into bed with someone crazier than you are [10]

Here’s the analogue for finding a good programmer:

  • Programming is 10% book learning and 90% experience
  • The programmers who you might hire will vary by at least a factor of ten in terms of the amount of time it will take them to complete a task; the very best could have an edge of one-hundred, and a remarkable number of people who say they can do the job will not be able to finish it at all
  • Like competent craftspeople in every place and age, they are in short supply: there are never enough good programmers for the tasks requiring a good programmer [11]

This is not to discourage people from learning programming, starting with the 50% of the population who were commonly programmers when I first learned FORTRAN [12] but currently are discouraged from entering the field. But it will take quite an investment of time, particularly with the complexity of current environments, and, in my experience, at least some aptitude for the task.[13] However, the resources now available are vast and very accessible and in the end you will find that most skillful programmers are not merely introverted arrogant bastards best suited for dragon chow. A few aren’t even introverted! The pay and independence aren’t bad either.

4. Critical thinking is great, but problem solving skills are even better.

Some instructional programs do this really well, and others do not. More generally, to the massive frustration of those who will not rest until they’ve increased income inequality to the level of the late Roman Empire [14], programming remains a craft, not a simple set of routine tasks: You learn a set of tools but then it can take years to figure out how to best apply them.

As for assuming that learning programming can’t be all that hard and anyone can do it—and for some tasks, it isn’t that hard, and if that’s your situation, ignore the following—let’s go back to the musical analogy:[15] You’re putting together a bluegrass band, and you need a banjo player. Which strategy do you think is going to have a better outcome?: call your worthless nephew Bob and say “Bob, stop smoking that joint and go learn to play the banjo.” Or find someone who already knows how to play the banjo—ideally, bluegrass banjo—who you can persuade to join the band.

A Story: A while back I was involved on the edges of a small construction project with a guy who has been doing carpentry for about as long as I’ve been doing programming. There were no specific plans, just a building site, a load of lumber, and—definitely—a nail gun. As I watched the guy at work, I realized that the difference between how he did the project and how I would have done the same thing is that he was always thinking three or four steps ahead, and, out of long experience, doing things that would prevent problems further down the line. That’s a pretty good definition of an expert, and much of the reason an expert can do things faster and more effectively than someone just watching Home Depot videos on YouTube. In a few hours that pile of lumber was a woodshed, and I suspect it will be standing longer than I will.

5. Software is neither static nor, particularly across systems, logically consistent.

Software, particularly open source, is organic and evolves, but for that same reason, it exists in a series of punctuated equilibria. Once something works it will probably be maintained, until it isn’t. To function in this environment, you need the twenty-first century equivalent of horse sense, which works because while half the software one is using probably didn’t exist five years ago, other parts existed forty years ago.

A Story: About three weeks ago we noticed that the big black plastic wireless device in our guest bedroom that exists primarily to consume extortionately-priced ink cartridges was now simply an inert big black plastic brick providing the final resting place for stink bugs. We had changed nothing: it just stopped responding. Per the haiku that I believe was the motto of the Windows development team during the “Blue Screen of Death” decades:

Your computer was fine
But now it will do nothing
Sad, so very sad

I went onto the web and found we were definitely not the only people on the planet who were unexpectedly having this experience, and tried various of the suggested solutions to no avail. In the end, I simply unplugged both the printer and our wireless router, plugged them in again, and everything worked.

Computers are like that.

6. Simple is really hard, and really expensive. So is documentation.

Though if the software comes with documentation, you should at least read it. Yes, yes, we know that you didn’t need to read the documentation when you bought a system from one of those companies that has more money in their [off-shore, tax-sheltered] bank account than is held by the US Treasury, but a lot of open-source software was not written by such companies.[16] And virtually all open source software was written to solve sets of problems which may well not be precisely the same set of problems you are trying to solve.

To take our proximate case, EL:DI/PETR is already at a level of complexity where I don’t know the purpose of every line of code. We still don’t have a large developer community, but the system involves very substantial development not only by me but also by John Beieler.[17] Do I understand all of Beieler’s code?: no, I just know that it works. Neither Beieler nor I even begin to understand all of the code in Stanford’s CoreNLP suite. Which also works. Usually.

I actually did understand every line of code in TABARI, but that was a ca. 2000 program with only limited open source involvement. And even with TABARI, there is a rule of thumb that any code you wrote more than six months ago might as well have been written by another person. It takes a couple of hours for me to get back into the code in any significant part of PETR: the last time I was doing serious programming on it was four months ago, and in the intervening period I’ve worked on three or four other projects of comparable complexity.

A Story: I will not provide a story here: go read the documentation. Including the comments in the code. And if you find something wrong with the documentation, at least tell us, but better yet, fix it.

7. I’m not working for you

You are probably being rewarded—whether with hourly pay, a stipend, class credit, the prospect of an M.A. or Ph.D. thesis, or whatever—for getting the open source software to run. I am not, and I’ve already gone the extra step of making the code available with at least some level of documentation.

So, I know what you are thinking: it’s that old open-source bait-and-switch! I’m only pretending to provide the software but in fact in order to actually use it, you have to pay me. Arrogant bastard programmers! Dragon chow! Dragon chow!

Again, it’s a little more complicated than that.

It is true that, as with every contract programmer—and I’d extend this to the many small development groups with only a single level of management, though not to those working for corporations with bank accounts, or VC funding, exceeding the currency reserves of most nation-states—I only get to eat what I kill. So to speak. And contrary to the popular mythology about open source, most is actually written by professional programmers.[18]

But that doesn’t mean I want to work for you. For starters, wrapping this entire essay back to Point #1 [19], I probably can’t help you unless I can put my hands on your machine.[20] Which is not terribly practical if you are on the other side of the country. Or the planet.

Once I’ve got that access, assuming we’ve eliminated the problems that could be resolved by reading the documentation, getting the software to work would probably take me anywhere from fifteen minutes to a couple hours, but, returning again to Point #1, I have absolutely no way of predicting what amount of time will be involved. For that amount of work, it still makes little sense for you to hire me.[21]

In the meantime, the time I’m spending doing what is really your job is taking away from the time I have available for projects where I have an on-going relationship with the client and I’d really like to deliver them a quality product more or less when I said I would. Responding to a probably impossible email request does not contribute to that end.

So yes, if you can’t figure out the software, you probably need to hire a programmer, but a local programmer, and someone with administrative access to your equipment, and probably someone you will be asking to do work on more than one occasion. Because they are unfamiliar with EL:DI/PETR, they probably won’t be able to solve the problem in fifteen minutes, though they probably will be able to solve it. If no one with that skill set exists in your vicinity, I’d say you’ve just found a business opportunity. But not one that I can solve. I have promises to keep and, indeed, miles to go before I sleep.

A Story: My wife took on the task of getting our ComCast account moved from State College to Charlottesville, a process involving roughly the same degree of complexity and effort as the Iranian nuclear program negotiations. After one particularly extended—if futile—conversation with a ComCast rep, she got him to say where he was located. “I’m in the Philippines. The rainy season is coming, and my bosses are all really crabby.” We eventually got the system working, then squirrels ate through the coaxial cable.[22]


1. This occurs in Season Six: George R. R. Martin is going kinda slow writing the books so the script writers are going to need to get kinda creative.

2. One of the reasons I’ve switched to working almost entirely with open source software is I was finding more bugs in the proprietary software I was working with than in the open source.

3. You remember the dramatic dollar hyper-inflation of 2009-2011, right? When liberals who had foolishly ignored the copious warnings on Fox and conservative talk radio were reduced to using rolled-up Obama posters to kill rats for food?

4. What is printed on the bottom of cans of RedBull sold at Microsoft [substitute Apple or Google per your preference]? “Open other end.”

Though this is nothing compared to the IT people employed at the lower levels in academia, where salaries are typically not competitive with the private sector and, as the saying goes, they usually don’t get the sharpest crayons in the box. In university research centers, this is less of a problem, since the attractiveness of working on a variety of problems and [sometimes] with state-of-the-art equipment and software compensates. But if the job is routine and doesn’t pay competitively, you’re probably looking at a pretty much continuous mess’o’trouble.

5. The ego boosts are real: I still remember the first time, this when open source was still fairly new, when I saw that a rather esoteric program—it implemented the algorithm for fitting hidden Markov models—that I’d converted from someone else’s open source code in C to Pascal (or maybe it was the other way around) was being used by some electrical engineers in a publication. And about the same time, some undergraduate at Stanford cleaned up the TABARI code so it no longer generated warnings when compiled on Linux.

6. Which is to say, every human system except those postulated by “rational choice” economists and political scientists, who like Alice in Wonderland’s Red Queen can only thrive in their tenured sinecures by learning to believe six impossible things before breakfast. I digress. Though it isn’t just six.

7. In the EL:DI/PETR world: just use the data.

8. I exaggerate: it’s really only been 48 years.

9. The code in TABARI, written in C/C++, is substantially more complicated, which is also why TABARI is about fifteen times faster than PETRARCH, but the full system in EL:DI/PETR is more complex. And of course does a lot more.

10. There’s a fourth: if you can’t identify the sucker at a poker table, it’s you.

11. Consequently, if someone with an active research program suggests you hire one of their students…

12. “Ada Lovelace wrote the first loop. I will never forget that. None of us ever will.” Admiral Grace Hopper.

13. Luthier Wayne Henderson, channelling a Victorian-era joke, says that all you need to do to make a guitar is take a bunch of wood and a pocket knife and carve away everything that doesn’t look like a guitar. There are times I feel like something along those lines can happen, in reverse, with a program.

That is, sometimes I’ll sketch a design, but particularly on smaller projects where I’m unlikely to re-use the program, I’ll just sit down and start writing code, not necessarily consciously knowing why, but confident that at some point I’m going to need that code because I’ve written similar programs many times in the past. So I suppose, following Henderson, a lot of what I do is just take an empty text file and keep adding chunks of code until it works like a program.

But don’t try this at home.

14. Actually, it’s already at that level, but that’s for another discussion.

15. Yes, I see another blog entry emerging here…

16. Though quite a bit is.

17. On GitHub, you can see exactly who contributed what.

18. Who is a professional programmer? If that’s in your job description, it’s an easy determination. But if, like me, you’ve done contract work on a variety of tasks—my degrees are in mathematics and political science, not computer science, which wasn’t even a major when I was an undergraduate—it comes along more gradually. In such a situation, I’d say being a “professional” means getting hired, on multiple occasions, to write software someone else is going to use for their work. Quite a few years ago I started adding up how much I’d been paid over time for writing such software and realized it came to multiple hundreds of thousands of dollars, at which point I figured I was a “professional.”

Once again, there is a good analogy with musicians. You’re an orchestra or studio musician with a union card: sure. But you start out playing for fun, then for beer money, and then at weddings, then you record a CD and it sells: when does the transition occur?

19. Or, in musical theory, resolving the essay to the opening key. A pretty good technique in blogs, it turns out.

20. Not in the sense of Oral Roberts, though at times that appears to help as well.

21. Which is to recall a particularly notorious tax form that Pennsylvania requires of LLCs which is so bad—it runs some ten pages, and I swear probably contains a provision absolving any corporation which provides free natural gas to kitchen faucets  in northwest Pennsylvania of all tax liabilities—that accountants refuse to do it. I asked. I did my best, and concluded I owed about $250 only to receive a notice some eight months later that I’d done the calculations incorrectly—certainly no surprise there, as the 38 pages of instructions appear to be a bad translation from proto-Hittite [23]—and actually owed only $21. No refund check, however: that’s probably a twenty-page form. Pennsylvania: the state where the only task the government performs competently is incentivizing citizens to purchase wine in New Jersey, Maryland and Ohio.

22. This story, you will notice, has nothing to do with the rest of the entry, but mouseCorp—slave-drivers—pays me by the word. Blogs are like that.

23. There really are 38 pages of instructions. But they weren’t translated from proto-Hittite; rather, they were written in something far worse: fluently crafted lobbyistian. Which is to say, most of that form exists to provide massive tax breaks to a tiny number of individuals, but this is quite deliberately written in a fashion that makes it nearly impossible to figure out who that beneficiary is. And as part of the game, the beneficiary will, of course, be constantly complaining about how unfair the tax system is.

And, well, it’s pretty unlikely that’s going to change: here’s a nice article on why you are never going to see the IRS provide any user-friendly software and—I’m shocked, shocked—lobbying money is involved. And it co-occurs with an article on the privatization of security—that old Weberian “sovereign monopoly on the legitimate use of force” was sooo twentieth-century—and presumably in due order we will see the slaughtering of unarmed black men out-sourced, with the help of Erik Prince, to Chinese “security” guards operating with impunity. And not just in Africa. Suggesting, once again, that we’ve moved into a system that looks a heck of a lot more like Rome under the Borgias than Wisconsin under the La Follettes. A topic for a later set of entries.


Seven observations on the newly released ICEWS data

Before we get to the topic of the post, the usual set of apologies about the absence of recent postings—starting with that Duke Nukem Forever style “Feral+well, whatever.” It isn’t that I’ve dropped out; it is rather that I’ve been too busy with other projects. And by the way, I haven’t retired,[1] unless logging 2,200 hours of work last calendar year is “retired.” Someday, things will slow down, I’ll get to the backlog. But enough about me…that’s not what we’re up to today.

Instead, the point of today’s posting is to comment upon the long, long awaited release of a public version of the Integrated Conflict Early Warning System (ICEWS) dataset, which appeared without fanfare on Dataverse late in the afternoon on Friday, 27 March, with Jay Ulfelder probably responsible for first spotting it.

This is a massive resource: The investment in the ICEWS project, albeit not all of the funding going into the data, probably roughly equalled the whole of NSF spending on all international relations and comparative politics research during the time it was active. As with any large data set it is going to take a while to figure out all of its quirks. The ever-resourceful David Masad already has some excellent instructions and initial analyses and visualizations here, and I’m sure more will be forthcoming in coming weeks—and years—but I wanted to use the occasion to alert my [ever declining] readership to this, and provide some initial observations.

1. It exists!

Long overdue, to be sure, given that at the ICEWS “Kick-off Meeting” in 2007 we were assured that everything in the project would be open, a concept almost immediately quashed, and then some, by the prime contractors. I’m pretty sure we have the persistent and unrelenting efforts of Mike Ward and Philippe Loustaunau to thank for the release. I’ve also got a pretty good idea of who is responsible for the delays but, well, let’s just focus on positive things right now.[2]

Here’s the link to the data:  There are actually four “studies” involved

  • 28075: This is the main data set: 26 files, most of these are around 30 Mb each. Took me a couple tries to get some of them, though that may have been due to a lousy wireless connection on my end. It’s Dataverse; it will work.
  • 28117: Aggregated data: this may be the quickest way to get into the data, assuming what you are interested in is covered by one of the very large number of aggregations that have already been computed. I’ve not really looked at these yet, and much of the documentation appears oriented to a proprietary dashboard which is not provided, but particularly for people not comfortable working with very large disaggregated data sets, it could be very useful.
  • 28118: These are the dictionaries, more on this below.
  • 28119: This was the big disappointment: we had been told the release would include a set of “gold standard cases”, which we assumed would be the much-needed gold standard cases needed to validate event coding systems, but these, alas, are just some sort of esoteric records associated with the much-disputed ICEWS “events of interest.” Hard to imagine it will be much use for anyone, but I’ve been wrong before.

2. Dictionaries!

As we’ve [3] been arguing for decades, the primary advantage of automated coding is the ability to maintain consistent coding across data sets being coded across a number of years and, through the dictionaries, to have a high level of transparency.[4] For that you need the dictionaries as well as the coder, and the KEDS project [5] has been consistently providing those as part of the data. To its great credit, ICEWS has followed this norm, and what dictionaries they are!: a primary actor dictionary with over 100,000 political actors. The format is derivative of that used in TABARI and PETRARCH—I’m guessing it will take about fifteen minutes to write a converter.

The agent dictionary—also provided, along with a somewhat cryptic “sectors” dictionary—on the other hand, is definitely a work in progress, though probably fine for the major sub-national actors, which for the most part are still those established by Doug and Joe Bond’s work on PANDA and IDEA back in the 1990s, and subsequently incorporated into CAMEO. The quirk that really sticks out for me is the treatment of religion: for ICEWS, it seems “Christian” and “Catholic” are separate primary categories [6]—granted, the late Ian Paisley [7] would agree—and the Great Schism of 1054 apparently was no big deal. The whole of Judaism gets only two entries, rather an oversimplification for the neck of the woods I’m usually studying. There is an extraordinarily eclectic set of ethnic groups—with a distinct oversampling in India—and, well, overall, the agent dictionary is sort of like rummaging through some old trunk in your grandparents’ attic [8], and I’m pretty sure we’re well ahead of this at the Open Event Data Alliance.

ICEWS did not provide event code dictionaries, which are presumably tightly linked with the proprietary BBN ACCENT coder. This is a bit of an issue, since ACCENT does not actually code CAMEO, but their own variant which they have documented in a very extensive manual. Not ideal but no worse than the situation with any human-coded data.

3. You’ll need to convert it for statistical analysis but I’ve got a program for that.

The public-release ICEWS uses a very quirky format that is apparently designed to be read rather than analyzed, as the underlying codes are presented in verbose, English-language equivalents. Unless you are ready to settle into a few quiet evenings reading through the 5-million records, you’ll probably want to use the data in statistical analyses, which means you’ll want to get shorter codes. It just so happens, I’ve got an open-source program for that, and I’ve even provided you’all with COW codes as well as ISO-3166-alpha3 codes. I didn’t fully convert the sector dictionaries, but this will at least give you a good start.
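To give a flavor of what that sort of conversion involves, here is a minimal Python sketch. It is not the program mentioned above: the column names ("Source Country", "Target Country", "CAMEO Code"), the tab-delimited layout and the three-entry lookup table are all assumptions made up for illustration, and a real converter would need the complete country list plus the COW codes.

```python
import csv
import io

# Hypothetical lookup table: verbose country name -> ISO-3166 alpha-3 code.
# Only three entries, for illustration; a real table covers every country.
ISO3 = {
    "India": "IND",
    "China": "CHN",
    "Kenya": "KEN",
}

def shorten_record(row, country_fields=("Source Country", "Target Country")):
    """Return a copy of row with verbose country names replaced by short codes.

    Unknown or missing names become "---" so nothing is silently dropped.
    The field names are assumed, not taken from the ICEWS documentation.
    """
    out = dict(row)
    for field in country_fields:
        out[field] = ISO3.get(row.get(field, ""), "---")
    return out

def convert(tsv_text):
    """Parse tab-delimited text with a header line into a list of dicts."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [shorten_record(row) for row in reader]

# Two invented records in the assumed layout.
sample = (
    "Event Date\tSource Country\tCAMEO Code\tTarget Country\n"
    "2013-05-01\tIndia\t042\tChina\n"
    "2013-05-02\tKenya\t010\tAtlantis\n"
)
records = convert(sample)
for rec in records:
    print(rec["Event Date"], rec["Source Country"], rec["CAMEO Code"], rec["Target Country"])
```

Mapping unknown names to a placeholder rather than discarding the record keeps the row counts intact, which matters once you start comparing densities across data sets.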

4. Massive use of local sources

That old criticism that event data are nothing but the world as viewed from the point of Western imperialists? This will be hard to sustain with ICEWS, which uses hundreds of local sources, and each event contains information on the source. I’ve only looked in detail at 2013, and here these follow more or less a rank-size distribution, with some of the major international sources (Xinhua, BBC) being major contributors, but the tail of that distribution is extremely long.

5. The distribution is flat

While the internet, and new social media more generally, are revolutionizing our ability to inexpensively generate large-scale datasets relevant to the study of political behavior, a serious problem has been dealing with the exogenous effects of the rapid expansion of internet-based sources that began in the mid-2000s. Any “dumpster diving the web” approach leads to an exponential increase starting about this time, which for any statistical analysis is a bug, not a feature.

ICEWS avoids this: they seem to be using a relatively fixed set of sources, and the total density is largely flat. As Masad’s visualizations and some others I’ve seen show, there appears to be a bit of variation—1995 and 1996 seem undersampled—and more will probably appear as further research is done, since there have been major changes in the international journalism environment beyond just the increase in the availability of reports, but these variations are not exponential, and can probably be accommodated with relatively simple statistical adjustments.

6. 80% precision, but no assessment of the accuracy

The release is accompanied by an extensive analysis showing that the “accuracy” of the ACCENT coder is around 80%. Which would be very nice, except that the study actually assesses not accuracy, but precision, which, while interesting, gives us no information whatsoever on the measure most people are interested in: the probability of correctly coding a randomly chosen sentence (accuracy), rather than the probability that a sentence that was coded was coded correctly (precision). Echoing the exchange between Col. Harry Summers and one of his Vietnamese counterparts over the unbroken string of US battlefield successes, the assessed precision “May be so, but it is also irrelevant.”

The arguments here are a bit technical, though involve nothing more than simple algebra, so I’ve relegated this to an appendix to this post. The upshot, to paraphrase Ray Stevens, “Yo selected on the dependent variable, and I can hear yo’ mama sayin’, ‘You in a heap o’ trouble son, now just look what you’ve done.’”
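For anyone who wants the distinction in concrete terms before wading into the algebra, here is a small Python sketch. It is not the argument from the appendix, and every count in it is invented: two hypothetical coders with identical 80% precision, one of which misses very few events and one of which misses a great many.

```python
def precision(tp, fp):
    """Share of coded sentences that were coded correctly."""
    return tp / (tp + fp)

def accuracy(tp, fp, fn, tn):
    """Share of all sentences, coded or not, that were handled correctly."""
    return (tp + tn) / (tp + fp + fn + tn)

# Invented counts over 1,000 sentences each.
# tp: event coded correctly        fp: event coded, but wrongly
# fn: event present, nothing coded tn: no event, nothing coded
coder_a = {"tp": 80, "fp": 20, "fn": 10, "tn": 890}
coder_b = {"tp": 80, "fp": 20, "fn": 400, "tn": 500}

for name, c in (("A", coder_a), ("B", coder_b)):
    p = precision(c["tp"], c["fp"])
    a = accuracy(c["tp"], c["fp"], c["fn"], c["tn"])
    print(f"coder {name}: precision={p:.2f} accuracy={a:.2f}")
```

Both coders report the same precision because precision only looks at the sentences that got coded. The missed events, which is where the two coders differ, never enter the calculation.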

7. It should splice with Phoenix

The current dataset has a one-year embargo, though the word on the street is that the embargo will remain at just one year, more or less. That is, the data will be periodically updated, ideally monthly, perhaps quarterly. [Addendum: in a very promising sign, the March 2014 data were indeed made available on 1 April 2015.] This will be adequate for most retrospective studies, but still won’t help with the real-time forecasting that event data are increasingly used for.

Here the recently-released OEDA Phoenix data set comes to the rescue, or will once we’ve got another four or five months of ICEWS data, as Phoenix gets going around the beginning of July 2014. Provided ICEWS is updated regularly, within a fairly short period of time one should be able to use the ICEWS 1995-2014 data for calibration, and then use Phoenix to cover the end of ICEWS to the present (Phoenix is updated daily).

Assuming, of course, that the data are sufficiently similar that they can be spliced, possibly with some adjustments. The major distinction between the data sets is likely to be the sources, with ICEWS using Open Source Center feeds and proprietary data services, and Phoenix a white-list of Web-based sources. This is likely to make a big difference in some areas—in the very limited exploration I’ve done, ICEWS seems to disproportionately focus on India, for example, and for statutory reasons, contains no internal data on the US—and less in others. Actor dictionaries will not be an issue as the ICEWS dictionaries could be used to code the Phoenix sources, though this may not be necessary.

The different coding engines may or may not make a difference: in the absence of a confirmable set of gold standard cases for events, and verb dictionaries, we will need a significant period when the two sets overlap to find out whether the two systems perform significantly differently. My guess is that they won’t differ all that much, particularly if common actor dictionaries are used, since both coders are based on full parsing, and the differing sources will be the bigger issue. Both Phoenix and ICEWS provide information on the publications where the coded text came from, so these could be filtered to get similar source sets.

In the absence of a public version of ICEWS overlapping with the [still relatively brief] Phoenix data, we can only do indirect measures of the likely similarities, but some quick analyses I’ve done comparing marginals of the first six months of Phoenix with the last six months of ICEWS indicate two promising points of convergence: the density of data (events per day) was quite similar and—even more telling—the marginal densities of the event types were very similar (actors less so but again, that’s easily corrected since the ICEWS actor dictionaries are public).  Again, we won’t be able to do the more crucial test—the correlation of dyad-level event counts—until there is a substantial overlap in the public data, but initial indications are promising.
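A comparison of marginals along these lines can be sketched in a few lines of Python. This is not the analysis I ran: the event codes below are invented two-digit stand-ins, and the total variation distance is simply one convenient summary of how far apart two marginal distributions are, chosen here for illustration.

```python
from collections import Counter

def marginal_shares(event_codes):
    """Proportion of events falling in each event-type category."""
    counts = Counter(event_codes)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}

def total_variation(p, q):
    """Total variation distance: 0 for identical marginals, 1 for disjoint ones."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def events_per_day(n_events, n_days):
    """Average daily density of events."""
    return n_events / n_days

# Invented event-type codes standing in for six months of each data set.
icews_codes = ["01", "01", "04", "04", "04", "19", "19", "17"]
phoenix_codes = ["01", "04", "04", "04", "19", "19", "19", "17"]

icews_shares = marginal_shares(icews_codes)
phoenix_shares = marginal_shares(phoenix_codes)
distance = total_variation(icews_shares, phoenix_shares)
print(f"total variation distance between marginals: {distance:.3f}")
```

Similar marginals are a necessary sign, not a sufficient one: two data sets can agree on the overall mix of event types while disagreeing on which dyads those events belong to, which is why the dyad-level correlation remains the real test.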

What needs to be done (all open)

Call me a greedy anti-intellectual knuckle-dragging Neanderthal—and you will—but when I read a recent article in Science about the construction of an esoteric scientific instrument whose construction cost was $300-million and annual operating costs are $30-million, and then compared that with the pittance that is being allocated—when we can avoid our programs being shut down altogether [9]—for event data which could contribute significantly, at the very least, to increasing the ability of NGOs to accurately anticipate situations where “bad things might happen” [10], or even to a reality-based foreign policy, I get a tad irritated. Consider these aspects of the instrument in question:

  • it may not work—its also-costly predecessor did not—and half of the project is situated in a place in Louisiana that makes it less likely to work, suggesting it is largely a mindless boondoggle. A boondoggle located in Louisiana, I’m shocked, shocked.
  • if it does work, it merely further confirms a century-old theory which we’ve got complete confidence in already, and as the Science article points out, is confirmed billions of times each day, for example as a smart phone displays inappropriate content having determined that you are a male walking within fifty meters of a Victoria’s Secret outlet store. [14]
  • and the predictions of the theory at issue were already confirmed by other observational evidence four decades ago, for which the discoverers got a nice trip to Stockholm.

Which is to say, this is just the natural sciences equivalent of a performance art project [13], but at a rather higher price tag. And unlike space telescopes and Mars rovers, we don’t even get nice pictures from it.

So, like, if we can spend what will probably eventually total some half-billion dollars before this thing winds down, presumably with the yawn-inducing equivalent of the umpteenth iteration of “Hey, ya’know, Mars once had water on it!!” how about spending 1%—just a lousy 1%—of that amount (which is probably also about 10% of the cost of ICEWS) on enhancing event data? And this time with social scientists in charge, not folks whose prime competence is raiding the public purse under the guise of protecting our national interests against opponents who disappeared decades ago. Oh, and every single line and file of the project open source. I’m just asking for 1%! A guy can dream, right?

So, say we’ve got $5-million. Here’s my list

1. Open gold standard cases. Do it right: the baseline will be the openly available Linguistic Data Consortium GigaWord news files, use a realistically large set of coders with documented training protocols and inter-coder performance evaluation, do accuracy assessments, not just precision assessments. Sustained human coder performance is typically about 6 events per hour—probably faster on true negatives—and we will need at least 10,000 gold standard cases, double-coded, which comes to a nice even $50K for coders at $15/hour, double this amount for management, training and indirects, and we’re still at only $100K.

2. Solve—or at least improve upon—the open source geocoding issue. This is going to be the most expensive piece, and could easily absorb half the funds available. But the payoffs would be huge and apply in a wide number of domains, not just event data. I’d put $2M into this.

3. Extend CAMEO and standard sub-state actor codes, using open collaboration among assorted stakeholders with input from various coding groups working in related domains. We know, for example, that one of the main things missing in CAMEO are routine democratic processes such as elections, parliamentary coalition formation, and legislative debate, and there are people who know how to do this better than us bombs-and-bullets types. On sub-state actor coding, religious and ethnic groups are particularly important. I’m guessing one could usefully spend $250K here. Also call it something other than CAMEO.

4. Automated verb phrase recognition and extraction, which will be needed for extending the CAMEO successor ontology. I actually think we’re pretty close to solving this already, and we could get some really good software for $50K. [11]  If that software works as well as I hope it will, then spend another $250K getting verb-phrase dictionaries for the new comprehensive system.

5. Event-specific coding modules, for example for coding protests and electoral demonstrations. Open-ended, but one could get a couple templates for $100K.

6. Systematic assessment of the native language versus machine translation issue. That is, do we need coding systems (coders and dictionaries) specific to languages other than English, particularly French, Spanish, Arabic and Chinese [12], or is machine-translation now sufficient—remember, we’re just coding events, not analyzing poetry or political manifestos—so given finite resources, we would be better off continuing the software development in English (perhaps with source-language-specific enhancement for the quirks of machine translation)? Hard to price this one but it is really important so I’d allocate $500K to it.

7. Insert your favorite additional requirements here: we’ve still got $1.75M remaining in our budget, which also allows a fair amount of slack for excessively optimistic estimates on the other parts of the project. Or if no one has better ideas, next on my list would be systematically exploring splicing and other multiple-data-set methods such as multiple systems estimation. And persuade Lockheed to dust off the unjustly maligned JABARI—or make the code open source if they have no further use for it—and give us another alternative sequence based on that program.
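For the skeptics, the arithmetic behind the list can be checked in a few lines of Python. The dollar figures, the coding rate and the case count all come from the items above; the variable names and dictionary labels are just mine.

```python
# 1% of the roughly half-billion dollar total for the instrument.
TOTAL_BUDGET = 500_000_000 // 100

# Item 1: 10,000 gold standard cases, double-coded, at about 6 events
# per hour and $15 per hour, then doubled for management, training
# and indirects.
cases = 10_000
codings = cases * 2
wage_per_hour = 15
events_per_hour = 6
coder_cost = codings * wage_per_hour / events_per_hour
gold_standard_total = 2 * coder_cost

allocations = {
    "1. gold standard cases": gold_standard_total,
    "2. open source geocoding": 2_000_000,
    "3. CAMEO successor and actor codes": 250_000,
    "4a. verb phrase extraction software": 50_000,
    "4b. verb-phrase dictionaries": 250_000,
    "5. event-specific coding modules": 100_000,
    "6. native language vs machine translation": 500_000,
}

allocated = sum(allocations.values())
remaining = TOTAL_BUDGET - allocated

print(f"coder cost:  ${coder_cost:,.0f}")
print(f"allocated:   ${allocated:,.0f}")
print(f"remaining:   ${remaining:,.0f}")
```

The remainder matches the $1.75M quoted in item 7, so the list is at least internally consistent, whatever one thinks of the individual estimates.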

All this for only 1% of the cost of a single natural science performance art project! Come on, someone out there with access to the public trough—or even some New Gilded Age gadzillionaire—let’s go for it! Pretty please?


1. Yeah, I can just imagine the conversations at ISA in New Orleans (I was on Maui. Just on vacation. Really.)

“Hey, Schrodt really disappeared once he left Penn State. Figured that would happen…”

“Really, it’s bad: I heard that he was last seen on the side of the exit ramp off I-99 to Tyrone, looking really gaunt and holding a cardboard sign that said ‘Will analyze mass atrocities for food’.”

“Yes, that’s right: so sad. So keep that in mind if you are thinking about leaving academia, or even imagining the possibility of asking any senior faculty to get their fat Boomer butts out of the way.”

Well, no, that’s not really accurate. But we’ll save that for another blog entry. Meanwhile, you can follow me on GitHub. And I’ll be at EPSA in Vienna.

2. And keep our faith in the wheel of karma.

3. I’m not exactly sure who “we” is—I’m neither royalty nor, to my knowledge, have a tapeworm—but I’m trying to represent the views of a loose amalgam of people who have been working with machine-coded event data for a good quarter-century now.

4. Total transparency when the coding software is available, which is not the case here, but even without the software these dictionaries are a huge improvement over the transparency in most human coding projects, where too many decisions rest on an undocumented and ever-shifting lore known only to the coders.

5. Or whatever it should be called: it will always be KEDS—Kansas Event Data System—to me.

6. Two Protestant denominations get designations at the same level as “Herdswoman” and “Pirate Party”—Episcopal (but not Anglican) and Methodist—and there is an entry for “Maronite.” That’s it: no Lutherans, no Baptists, no Pentecostals, no Mormons, not even the ever-afflicted Jehovah’s Witnesses. In fact in the ICEWS agent ontology, the only religions worthy of subcategories are Christian, Catholic, Buddhist, Hindu and Moslem, though the latter has not been affected by that unpleasantness at Karbala in 680 CE.  The ontology developers, however, appear to have spent a bit too much time watching re-runs of the Kung-Fu television series—or more ambiguously, Batman Begins—as only Buddhism produces “warriors.”

7. To say nothing of the late Fred Phelps.

8. Yeah, yeah, they moved to a condo in Arizona two decades ago and the old place was torn down and replaced with a McMansion, but it still makes for a nice metaphor.

9. Though I did notice that Senator Jeff Flake was one of the few Republicans not to throw his lot in with the GOP efforts to provide free policy advice to the Islamic Republic of Iran, so perhaps his M.A. in Political Science did some good.

10. I think we are now at a point where these things can make a serious difference: The absence of major electoral violence in the 2013 Kenyan elections and—fingers crossed—the 2015 Burundi elections may eventually be seen as breakthroughs on this issue.

11. But meanwhile, don’t get me started on the vast amounts that are wasted on hiring programmers who never finish the job. Really, people, the $75 to $150 an hour to hire someone with a professional track record who will actually write the programs you need is a better deal than spending $25,000+ a semester—stipend, tuition and indirects, and this is actually a low estimate for many private institutions—for one or more GRAs who are supposed to be learning programming but who, in fact, stand a pretty good chance of getting absolutely nowhere because writing sophisticated research software does not, in many instances, provide a good pedagogical platform. No matter what your Office for the Suppression of Research says.

12. Hindi/Urdu is also important in terms of the number of speakers, but, for better or worse, the media elites in the region use English extensively.

13. If you aren’t familiar with this concept, Google “bad performance art.” NSFW.

14. To clarify, not the precise example used by Science.
