Witnessing a paradigm shift?

The philosopher of science Thomas Kuhn is famous—beyond an apparent penchant for throwing ashtrays [1]—for his vastly over-generalized concept of “paradigm shifts” in scientific understanding, where a set of ideas once thought unreasonable becomes the norm, exchanging this status with ideas on the same topic once almost universally accepted. [2] This typically involves a generational change—Max Planck famously observed that scientific progress occurs one funeral at a time —but can sometimes occur more quickly. And I think I’m watching one develop in the field of predictive models of conflict behavior.

The context here [3] was a recent workshop I attended in Europe on that topic. The details don’t matter but suffice it to say this involved an even mix of the usual suspects in quantitative conflict data and modeling—I’m realizing there are perhaps fifty of us in the world—and an assortment of NGOs and IGOs, mostly consumers of the information. [4] Held amid the monumental-brutalist architecture housing the pan-European bureaucracy, presumably the model for the imperial capital in The Hunger Games, leading one to sympathize, at least to a degree, with European populist movements. And by the way, in two days of discussions no one mentioned Donald Orange-mop even once: we’re past that.

The promised paradigm change is on the issue of whether technical models for forecasting conflict are even possible—and as I’ve argued vociferously in the past, academic political science completely missed the boat on this—and it looks as though we’ve suddenly gone from “that’s impossible!” to “okay, where’s the model, and how can we improve it?” This new assessment being entirely due to the popularization over the past year of machine learning. The change, even taking into account that the Political Instability Task Force has been doing just this sort of thing, and doing it well, for at least fifteen years, has been stunningly rapid.

Not, of course, without more than a few bumps along the way. Per the persistent hype around “deep learning,” there’s a strong assumption that “artificial intelligence” is now best done with neural networks (and the more complex the better), whereas there’s consistent evidence, both from this workshop and from a number of earlier efforts I’m familiar with, that because of the heterogeneity of the cases and the tiny number of positives, random forests are substantially better. There’s also an assumption that you can’t figure out which variables are important in a machine learning model: again, wrong, as this is routine in random forests and can be done to a degree even in neural nets, though it’s rather computationally intensive. One presenter, who had clearly consumed a bit too much of the TensorFlow Kool-Aid, noted these systems “learn on their own”: alas, that’s not true for this problem. [6] In fact we need lots of training cases, and in conflict forecasting models the aforementioned heterogeneity and rare positives still hugely complicate estimation.
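For the skeptics on variable importance, here is a minimal sketch of one model-agnostic way to get at it: permutation importance, where you shuffle one variable at a time and see how much the predictions degrade. To be clear, this is not the impurity-based measure random forests usually report, nor anything a workshop presenter showed; the function names, the toy data, and the stand-in `toy_model` are all hypothetical, and a real application would use a library version such as scikit-learn's `permutation_importance` on a fitted model.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of cases where the model's prediction matches the label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(labels)

def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when each column is shuffled across cases."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle one column, leaving every other column untouched.
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            permuted = [row[:col] + [v] + row[col + 1:]
                        for row, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical toy data: two variables, only the first drives the outcome,
# and positives are a minority (roughly 20% of cases).
data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [1 if r[0] > 0.8 else 0 for r in rows]

def toy_model(row):
    """Stand-in for a fitted classifier; it ignores the second variable."""
    return 1 if row[0] > 0.8 else 0

imp = permutation_importance(toy_model, rows, labels)
print(imp)  # first variable shows a positive drop, second shows none
```

Accuracy is used here only to keep the sketch short; with rare positives a measure that rewards finding the positives would be the more sensible choice. The "computationally intensive" part is visible in the loop: every variable costs `n_repeats` full passes of prediction.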

So these models are not easy, but they are now considered possible, and there is an actual emerging paradigm: In the course of an hour I saw presentations by a PhD student in a joint program at Universities of Stockholm and Iceland developing a resource-focused conflict forecasting model and a data scientist from the World Bank and FAO working on famine forecasting [7] both implementing essentially the same very complex protocols for training, calibration, and cross-validation of various machine learning models. [8][15]
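The presenters' actual protocols were far more elaborate than anything I can reproduce here, so take the following as a minimal sketch of just two ingredients, under my own assumptions: stratified cross-validation folds, so that the handful of positive cases is spread evenly across folds, and the Brier score as one common check on how well predicted probabilities are calibrated. The function names and the toy data (100 cases, 10 positives) are hypothetical, and the "model" is merely the training-set base rate, standing in for whatever classifier one would really fit.

```python
import random

def stratified_folds(labels, k=5, seed=0):
    """Assign case indices to k folds, spreading each class evenly."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def brier_score(probs, labels):
    """Mean squared error of predicted probabilities (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

# Hypothetical toy data: rare positives, as in conflict forecasting.
labels = [1] * 10 + [0] * 90
folds = stratified_folds(labels, k=5)

scores = []
for test_idx in folds:
    held_out = set(test_idx)
    train_y = [y for i, y in enumerate(labels) if i not in held_out]
    # Baseline "model": predict the training base rate for every test case.
    base_rate = sum(train_y) / len(train_y)
    test_y = [labels[i] for i in test_idx]
    scores.append(brier_score([base_rate] * len(test_y), test_y))

mean_brier = sum(scores) / len(scores)
print(mean_brier)
```

The base-rate score is the number any real model has to beat out of sample; without stratification, some folds could easily end up with no positives at all, which makes the per-fold scores meaningless.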

Well, we live in interesting times.

There’s a fairly standard rule-of-thumb in economic history stating it takes between one and two human generations—20 to 40 years—to effectively incorporate a major new technology into the production structure of organizations. The—yes, paradigmatic—cases are the steam engine, electricity, and computers. [9] I’ve sensed for quite some time that we’re in this situation, perhaps half-way through the process, with respect to technical forecasting models and foreign policy decision-making. [10] As Tetlock and others have copiously demonstrated, the accuracy of human assessments in this field is very low, and as Kahneman and others have copiously demonstrated, decision-making on high-risk, low-probability issues is subject to systematic biases. Until quite recently, however, data [11] and computational constraints meant there were no better alternatives. But there are now, so the issue is how to properly use this information.

And not every new technology takes a generation before it is adopted: to take some examples most readers will be familiar with, word processing, MP3 music files, flat-screen displays, and cell phones displaced their earlier rivals almost in a historical eye-blink, albeit except for word processing this was largely in a personal rather than organizational context. In the long-ago research phase of ICEWS, a full ten years ago now (wow…), I had a clever slide (well, I thought it was clever) showing a robot saying “We bomb Mindanao in six hours” and a medal-bedecked general responding “Yes, master” to illustrate what technical forecasting models are not designed to do. But with accuracy 20% to 30% better than human forecasts, one would think these approaches should have some impact on the process. It is going to take time and effort to figure out what that impact should be, particularly since human egos and status are involved, and the models will make mistakes. They will also present a new set of challenges, just as electrical power presents a different set of risks and opportunities than the steam and water power it replaced. But their eventual incorporation into policy-making seems inevitable.

Finally, this might have implications for the future demand for event data, as models customized for very specific organizational needs finally provide a “killer app” using event data as a critical input. As it happens, no one has yet come up with something that does the job of event data (recording day-to-day interactions of political actors as reported in the open press) without simply looking pretty much like plain old event data: both the CAMEO and PLOVER [12] event coding systems still have the basic structure of the 60-year-old WEIS, because WEIS incorporates most things in the news of interest to analysts (and their quantitative models). While the forecasting models I’m currently seeing primarily use annual (and state-level) structural data, as soon as one drops to the sub-annual level (and, increasingly, sub-state, as geocoding of event data improves) event data are really the only game in town. [13]

Footnotes

1. Recently back in the news…well, sort of…thanks to a thoroughly unflattering book by documentary film-maker Errol Morris, whose encounters with Kuhn when Morris was a graduate student left a traumatic impression of Kuhn being a horse’s ass of truly mythic proportions, though some have suggested parts of the book may themselves border on mythic…nonetheless, be civil to your grad students lest they become award-winning film-makers and/or MacArthur Award recipients long after you and any of your friends are around to defend your reputation. Well, and perhaps because being nice to your grad students is simply the right thing to do.

2. And thus the hitherto obscure word “paradigm” entered popular parlance: a number of years ago, at the height of the dot-com bubble, social philosopher David Barry proposed simply making up a company name, posting this on the web, and seeing how much money would pour in. The name he proposed was “Gerbildigm”, combining “gerbil” and “paradigm.” Mind you, that’s scarcely different than what actual companies were doing in the late 1990s to generate funding. Nowadays, in contrast, they simply say they are exploring applications of deep learning.

3. And by the way, this isn’t the snark-fest promised in the previous blog entry; that’s still to come, though events are so completely depressing at the moment—okay, “Christian” conservatives, you won the right not to bake damn wedding cakes, but at the price of suborning tearing infants out of the arms of their mothers: you really think that tradeoff is a good deal? Will your god? You’ve got an exemption from Matthew 25:35-40 now, eh? You’re completely confident about this? You sure?—I’m having difficulty gearing up for a snark-fest even though it is half-written. Though stuff I have half-written would fill a not-inconsequentially sized bookshelf.

4. It is also notable that the gender ratio at this very technical workshop was basically 50/50, and this included the individuals developing the data and models, not just the consumers. In the U.S., that ratio would have been 80/20 or even 90/10. So by chance is the USA excluding some very talented potential contributors to this field? [5] And is this related to the work of Jayhawk economist Donna Ginther, highlighted on multiple occasions by The Economist over the past few months, that in the academic discipline of economics, gender discrimination appears to be considered a feature rather than a bug? Which cascaded over into the academic field of political methodology, though thanks to the efforts of people like Janet Box-Steffensmeier, Sara Mitchell, Caroline Tolbert, and institutions like VIM is not as bad as it once was. But compared to my experiences in Europe, could still improve.

5. I recently stumbled onto historian Marie Hicks’s study titled Programmed Inequality: How Britain Discarded Women Technologists and Lost Its Edge in Computing. Brogrammers take note: gender discrimination doesn’t necessarily have a happy ending.

6. Self-learning is, famously, possible for games like poker, chess and go, which have the further advantage that the average person can understand the application, thus providing ample fodder for breathless headlines, further leading to fears that our new Go-and-Texas-Hold’em neural network overlords will, like Daleks and Cylons, shortly lethally threaten us, even if they still can’t manage to control machines sufficiently well to align the doors to shut properly on a certain not-so-mass-produced electric vehicle produced by a company owned by one of the more notable alarmists concerned about the dangers of machine intelligence. Plus there’s the little issue of control of the power cord. I digress.

7. Amusingly, for the World Bank work, the analyst then has to run comparable regression models because that’s apparently the only thing the economists there understand. At the moment.

8. Nor was this the standard protocol for producing a regression model which, gentle reader, I would remind you has the following steps (as Adam Smith pointed out in 1776, for maximal efficiency, assemble a large team of co-authors with specialists doing each task!):

Develop some novel but vaguely plausible “theory”
Assemble a set of 25 or so variables from easily available data sets
Run transformations and subsets of these, ideally using automated scripts to save thought and labor, until one or more combinations emerge where the p-values on your pet variables are ≤0.05. Justify any superfluous variables required to achieve this via collinearity—say, parakeets-per-capita—as “controls.” Bonus points for using some new variant of regression for which the data do not remotely satisfy the assumptions and which mangles the coefficients beyond any hope of credible interpretation. Avoid, at all costs, out-of-sample assessments of any form.
Report this in a standardized social science format 35 ± 5 pages in length (with a 100-page web appendix) starting with an update of the literature review from your dissertation[s], copiously citing your friends and any likely reviewers, and interpreting the coefficients as though they were generated using OLS estimation. Make sure the “Discussion” and “Conclusions” sections essentially duplicate each other and the statistical tables.
Publish in a proprietary journal which will appear in print after a lag of at least three years, firewalled and thus inaccessible to the policy community, but no one will ever look at it anyway. Though previously you will have presented the problem, methodology, and results in approximately 500 seconds (you’re on a five-paper panel, of course) at a major conference where your key slide will show 4 variants of the final 16-variable model with the coefficients to 6 decimal places and several p-values reported as “0.000.” The five people in the audience would be unable to read the resulting 3-point type except they are browsing the conference program instead of listening; the discussant asks why you didn’t include four additional controls.
PROFIT!

I jest. I wish.

9. In fact quite a few people have suggested that computers still aren’t being used to their full capacity in corporations because they would render many middle managers irrelevant, and these individuals, unlike Yorkshire handloom weavers, are in a position to resist their own displacement: The Economist had a nice essay to this effect a couple weeks ago.

10. The concept of a systematic foreign policy is, of course, at present quaintly anachronistic in the U.S., where foreign policy, such as it is, is made on the basis of wild whims and fantasies gleaned from a steady if highly selective diet of cable TV, combined with a severe case of dictator-envy and the at least arguable proposition that poutine constitutes a threat to national security. But ever the optimist I can imagine the U.S. returning to a more civilized approach somewhere in the future, just as Rome recovered from both Nero and Caligula. Also as noted, this workshop was in Europe, which has suddenly been incentivized to get serious about foreign policy.

11. This is an important caveat: the data are every bit as important as the methods, and for many remote geographical areas under high conflict risk, we probably still don’t have all the data we need, even though we have a lot more than we once did. But data is hard, and data can be very boring: certainly it’s not going to attract the headlines that a glitzy new game-playing or kitten-identifying machine learning application can attract. At the moment this field is dependent on a large number of generally underfunded small projects, the long-term Scandinavian commitments to PRIO and the Uppsala UCDP being exceptions. In the U.S., the continued funding of the ICEWS event data is very tenuous and the NSF RIDIR event data funding runs out in February 2018…just saying…

12. Speaking of PLOVER, at yet another little workshop, I was asked about the painfully slow progress towards implementing PLOVER, and it occurred to me that it’s currently trying to cross a technological “valley of death” [14] where PLOVER, properly implemented, would be clearly superior to CAMEO, but CAMEO already exists, and there is abundant CAMEO data (and software for coding it) available for free, and existing models already do a reasonably good job of accommodating the problems of CAMEO. “Free and already available” is a serious advantage if your fundamental interest is the model, not the data: this is precisely why WEIS, despite being proposed as a first approximation to what would certainly be far better approaches, was used for about 25 years and CAMEO, which wasn’t even intended as a general-purpose coding scheme, is heading towards the two-decade mark, despite well-known issues with both.

13. Though the other thing to watch here is the emerging availability of low-cost, and frequently updated, remote sensing data. The annualized NASA night-light data is already being used increasingly to provide sub-state information with high geographical precision, and new private sector data, as well as new versions of night-lights, are likely to be available at a far greater frequency.

14. Googling this phrase to get a clean citation, I see it has been used to mean about twenty different things, but the one I’m employing here is a common variant.

15. And while I’m on the topic of unsolicited advice to grad students, yet another vital professional skill they don’t teach you in graduate school is flying to Europe and being completely alert the day after you arrive. My formula:

Sleep as much as you can on the overnight flight (sleeping on planes, ideally without alcohol, is another general skill)
Take at most a one-hour nap before sunset, and spend most of the rest of the time outside walking
Live on the East Coast
Don’t change planes (or at least terminals) at Heathrow