Seven current challenges in event data

This is the promised follow-up to last week’s opus, “Stuff I Tell People About Event Data,” herein referenced as SITPAED. It is motivated by four concerns:

  • As I have noted on multiple occasions, the odd thing about event data is that it never really takes off, but neither does it ever really go away
  • As noted in SITPAED, we presently seem to be languishing with a couple “good enough” approaches—ICEWS on the data side and PETRARCH-2 on the open-source coder side—and not pushing forward, nor is there any apparent interest in doing so
  • To further refine the temporal and spatial coverage of instability forecasting models (IFMs)—where there are substantial current developments—we need to deal with near-real-time news input. This may not look exactly like event data, but it is hard to imagine it won’t look fairly similar, and it will confront most of the same issues of near-real-time automation, duplicate resolution, source quality and so forth
  • Major technological changes have occurred in recent years but, at least in the open source domain, coding software lags well behind these, and as far as I know, coder development has stopped even in the proprietary domain

I will grant that in current US political circumstances—things are much more positive in Europe—“good enough” may be the best we can hope for, but just as the “IFM winter” of the 2000s saw the maturation of projects which would fuel the current proliferation of IFMs, perhaps this is the point to redouble efforts precisely because so little is going on.

Hey, a guy can dream.

Two years ago I provided something of a road-map for next steps in terms of some open conjectures; additional reflections can be found here and here. This essay is going to be more directed, with an explicit research agenda, along the lines of the proposal for a $5M research program at the conclusion of this entry from four years ago. [1] These involve quite a variety of levels of effort—some could be done as part of a dissertation, or even an ambitious M.A. thesis; others would require a team with substantial funding—but I think all are quite practical. I’ll start with seven in detail, then briefly discuss seven more.

1. Produce a fully-functional, well-tested, open-source coder based on universal dependency parsing

As I noted in SITPAED, PETRARCH-2 (PETR-2)—the most recent open source coder in active use, deployed recently to produce three major data sets—was in fact only intended as a prototype. As I also noted in SITPAED, universal dependency parsing provides most of the information required for event data coding in an easily processed form, and as a bonus is by design multi-lingual, so for example, in the proof-of-concept mudflat coder, Python code sufficient for most of the functionality required for event coding is about 10% the length of comparable earlier code processing a constituency parse or just doing an internal sparse parse. So, one would think, we’ve got a nice opportunity here, eh?

Yes, one would think, and for a while it appeared this would be provided by the open-source “UniversalPetrarch” (UP) coder developed over the past four years under NSF funding. Alas, it now looks like UP won’t go beyond the prototype/proof-of-concept stage due to an assortment of “made sense at the time”—and frankly, quite a few “what the hell were they thinking???”—decisions, and, critically, severe understaffing. [2] With funding exhausted, the project winding down, and UP’s sole beleaguered programmer mercifully reassigned to less Sisyphean tasks, the project has 31 open—that is, unresolved—issues on GitHub, nine of these designated “critical.”

UP works for a couple of proofs-of-concept—the coder as debugged in English will, with appropriate if very finely tuned dictionaries, also code in Arabic, no small feat—but as far as I can follow the code, the program essentially extracts from the dependency parse the information found in a constituency parse, an approach consistent with UP using older PETR-1 and PETR-2 dictionaries and being based on the PETR-2 source code. It sort of works, and is of course the classical Pólya method of converting a new problem to something you’ve already solved, [9] but seems to be going backwards. Furthermore, the PETR-1/-2 constituency-parse-based dictionaries [10] are all that UP has to work with: no dictionaries based on dependency parses were developed in the project. Because obviously the problem of writing a new event coder was going to be trivial to solve.

Thus putting us essentially back to square one, except that NSF presumably now feels under no obligation to pour additional money down what appears to be a hopeless rathole. [11] So it’s more like square zero.

Well, there’s an opportunity here, eh? And soon: there is no guarantee that either the ICEWS or UT/D-Phoenix near-real-time data sets will continue!!
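
As an aside on the claim above that universal dependency parses carry most of what an event coder needs: the following minimal sketch (emphatically not the mudflat coder; the sentence is a placeholder, and spaCy’s dependency labels only approximate the UD scheme) shows how a source-action-target skeleton falls almost directly out of the parse. Everything else (dictionary lookup, compounds, conjunctions, edge cases) is where the real effort goes, but the basic extraction is nearly free.

import spacy

nlp = spacy.load("en_core_web_sm")   # assumes the small English model is installed
doc = nlp("Israeli troops raided a Palestinian village near Nablus.")

for token in doc:
    if token.pos_ == "VERB":
        # direct dependents of the verb give actor and target candidates
        subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
        objects = [c for c in token.children if c.dep_ in ("dobj", "obj")]
        for s in subjects:
            for o in objects:
                print(s.text, token.lemma_, o.text)   # e.g. "troops raid village"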

2. Learn dictionaries and/or classifiers from the millions of existing, if crappy, text-event pairs

But the response to that opportunity might look completely different from any existing coder, being based on machine-learning classifiers—for example some sort of largely unsupervised indicator extraction based on the texts alone, without an intervening ontology (I’ve seen several experiments along these lines, as well as doing a couple myself)—rather than dictionaries. Or maybe it will still be based on dictionaries. Or maybe it will be a hybrid, for example doing actor assignment from dictionaries—there are an assortment of large open-access actor dictionaries available, both from the PETRARCH coders and ICEWS, and these should be relatively easy to update—and event assignment (or, for PLOVER, event, mode, and context assignment) from classifiers, as sketched below. Let a thousand—actually, I’d be happy with one or ideally at least two—flowers bloom.
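
To make the hybrid variant concrete, here is a minimal sketch of classifier-based event assignment; the texts, labels, and PLOVER-style category names are toy placeholders standing in for training cases harvested from the millions of existing text-event pairs.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# toy stand-ins for training data extracted from existing coded data sets
texts = ["Protesters clashed with police in the capital",
         "The two governments signed a trade agreement",
         "Rebels ambushed an army convoy near the border"]
labels = ["PROTEST", "AGREE", "ASSAULT"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["Police dispersed demonstrators with tear gas"]))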

But unless someone has a lot of time [12]—no…—or a whole lot of money—also no…—this new approach will require largely automated extraction of phrases or training cases from existing data: the old style of human development won’t scale to contemporary requirements.

On the very positive side, compared to when these efforts started three decades ago, we now have millions of coded cases, particularly for projects such as TERRIER and Cline-Phoenix (or for anyone with access to the LDC Gigaword corpus and the various open-source coding programs) which have both the source texts and corresponding events. [13] Existing coding, however, is very noisy—if it wasn’t, there would be no need for a new coder—so the challenge is extracting meaningful information (dictionaries, training cases, or both) for a new system, either in a fully-automated or largely automated fashion. I don’t have any suggestions for how to do this—or I would have done it already—but I think the problem is sufficiently well defined as to be solvable.

3. ABC: Anything but CAMEO

As I pointed out in detail in SITPAED, and as is further elaborated in the PLOVER manual and various earlier entries in this blog, despite being used by all current event data sets, CAMEO was never intended as a general-purpose event ontology! I have a bias towards replacing it with PLOVER—presumably with some additional refinements—and in particular I think PLOVER’s proposed event-mode-context format is a huge improvement, from a coding, interpretation, and analytical perspective alike, over the hierarchical format embedded in earlier schemes, starting with WEIS but maintained, for example, in BCOW as well as CAMEO.

But, alas, there has been zero progress on this, despite the great deal of enthusiasm following the original meeting at NSF where we brought together people from a number of academic and government research projects. Recent initiatives on automated coding have, if anything, moved further away, focusing exclusively on coding limited sets of dependent variables, notably protests. Just getting the dependent variable is not enough: you need the precursors.

Note, by the way, that precursors do not need to be triggers: they can be short-term structural changes that can only be detected via event data because they are unavailable in the traditional structural indicators reported only on an annual basis and/or at the national level. For at least some IFMs, it has been demonstrated that at the nation-year level, event measures can be substituted for structural measures and provide roughly the same level of forecasting accuracy (sometimes a bit more, sometimes a bit less, always more or less in the ballpark). While this has meant there is little gained from adding events to models with nation-year resolution, at the monthly and sub-state geographical levels, events (or something very similar to events) are almost certainly going to be the only indicators available.

4. Native coders vs machine translation

At various points in the past couple of years, I’ve conjectured that the likelihood that native-language event coders—a very small niche application—would progress more rapidly than machine translation (MT)—an extremely large and potentially very lucrative application—is pretty close to zero. But that is only a conjecture, and both fields are changing rapidly. Multi-language capability is certainly possible with universal dependency parsing—that is much of the point of the approach—and in combination with largely automated dictionary development (or, skipping the dictionaries all together, classifiers), it is possible that specialized programs would be better than simply coding translated text, particularly for highly-resourced languages like Spanish, Portuguese, French, Arabic, and Chinese, and possibly in specialized niches such as protests, terrorism, and/or drug-related violence.

Again, I’m much more pessimistic about the future of language-specific event coders than I was five years ago, before the dramatic advances in the quality of MT using deep-learning methods, but this is an empirical question. [14]

5. Assessing the marginal contribution of additional news sources

As I noted in SITPAED, over the course of the past 50 years, event data coding has gone from depending on a small number of news sources—not uncommonly, a single source such as the New York Times or Reuters [15]—to using hundreds or even thousands of sources, this transition occurring during the period from roughly 2005 to 2015 when essentially every news source on the planet established a readily-scraped web presence, often at least partially in English and, if not, accessible, at least to those with sufficient resources, using MT. Implicit in this model, as with so many things in data science, was the assumption that “bigger is better.”

There are, however, two serious problems with this. The first—always present—was the possibility that all of the event signal relevant to the common applications of event data—currently mostly IFMs and related academic research—is already captured by a few—I’m guessing the number is about a dozen—major news sources, specifically the half-dozen or so major international sources (Reuters, Agence France Presse, BBC Monitoring, Associated Press and probably Xinhua) and another small number of regional sources or aggregators (for example, All Africa). The rest is at best redundant, since anything useful will have been picked up by the international sources, [16] and at worst noise. Unfortunately, as processing pipelines become more computationally intensive (notably with external rather than internal parsing, and with geolocation) those additional sources consume a huge amount of resources, in some cases to supercomputer levels, and limit the possible sponsors of near-real-time data.

That’s the best scenario: the worst is that with the “inversion”—more information on the web is fake than real—these other sources, unless constantly and carefully vetted, are introducing systematic noise and bias.

Fortunately it would be very easy to study this with ICEWS (which includes the news source for each coded event, though not the URL) by taking a few existing applications—ideally, something where replication code is already available—and seeing how much the results change by eliminating various news sources (starting with the extremely long tail of sources which generate coded events very infrequently). It is also possible that there are some information-theoretic measures that could do this in the abstract, independent of any specific application. Okay, it’s not just that such measures might be possible: they definitely exist, but I’ve no idea whether they will produce results meaningful in the context of common applications of event data.
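
A sketch of that ablation exercise; the file name, column names, and the cutoff separating the “head” sources from the long tail are all assumptions about a local extract of the Dataverse files, not a documented ICEWS layout.

import pandas as pd

events = pd.read_csv("icews.events.tab", sep="\t")
events["month"] = pd.to_datetime(events["Event Date"]).dt.to_period("M")

counts = events["Publisher"].value_counts()
majors = counts[counts >= 1000].index   # arbitrary cutoff for the "head"

full = events.groupby("month").size()
head = events[events["Publisher"].isin(majors)].groupby("month").size()
# if the long tail adds little signal, the two series will track closely
print(full.corr(head.reindex(full.index, fill_value=0)))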

6. Analyze the TERRIER and Cline Center long time series

The University of Oklahoma and the University of Illinois/Urbana-Champaign have both recently released historical data sets—TERRIER and yet-another-data-set-called Phoenix [17] respectively—which differ significantly from ICEWS: TERRIER is “only” about 50% longer (back to 1980) but [legally] includes every news source available on LexisNexis, and the single-sourced Cline Center sets are much longer, back to 1945.

As I noted in SITPAED, the downsides of both are that they were coded using the largely untested PETR-2 coder and with ca. 2011 actor dictionaries, which themselves are largely based on ca. 2005 TABARI dictionaries, so both recent and historical actors will be missing. That said, as I also showed in SITPAED, at higher levels of aggregation the overall picture provided by PETR-2 may not differ much from other coders (but it might: another open but readily researched question), and because lede sentences almost always refer to actors in the context of their nation-states, simply using dictionaries with nation-states may be sufficient. [18] But most importantly, these are both very rich new sources for event data that are far more extensive than anything available to date, and they need to be studied.

7. Find an open, non-trivial true prediction

This one is not suitable for dissertation research.

For decades—and most recently, well, about two months ago—whenever I talked with the media (back in the days when we had things like local newspapers) about event data and forecasting, they would inevitably—and quite reasonably—ask “Can you give us an example of a forecast?” And I would mumble something about rare events, and think “Yeah, like you want me to tell you the Islamic Republic has like six months to go, max!” and then more recently, with respect to PITF, do a variant on “I could tell you but then I’d have to kill you.” [19]

For reasons I outlined in considerable detail here, this absence of unambiguous contemporary success stories is not going to change, probably ever, with respect to forecasts by governments and IGOs, even as these become more frequent, and since these same groups probably don’t want to tip their hand as to the capabilities of the models they are using, we will probably only get the retrospective assessments by accident (which will, in fact, occur, particularly as these models proliferate [20]) and—decades from now—when material is declassified.

That leaves the task of providing accessible examples of the utility of IFMs to academics (and maybe some specialized NGOs), though for reasons discussed earlier, doing so obscurely would not bother me. Actually, we need two things: retrospective assessments using the likes of ICEWS, TERRIER, and Cline-Phoenix of what could have been predicted (no over-fitting the models, please…) based on data available at the time, and then at some point, a documentable—hey, use a blockchain!—true prediction of something important and unexpected. Two or three of these, and we can take everything back undercover.

The many downsides to this task involve the combination of rare events, with the unexpected cases being even rarer [21], and long time horizons, these typically being two years at the moment. So if I had a model which, say—and I’m completely making this up!—predicted a civil war in Ghana [22] during a twelve-month period starting two years out, a minimum of 24 months and a maximum of 36 months will pass before that prediction can be assessed. Even then we are still looking at probabilities: a country may be at a high relative risk, for example in the top quintile, but still have a probability of experiencing instability well below 100%. And 36 months from now we’ll probably have newer, groovier models, so the old forecast still won’t demonstrate state-of-the-art methods.

All of those caveats notwithstanding, things will get easier as one moves to shorter time frames and sub-national geographical regions: for example Nigeria has at least three more or less independent loci of conflict: Boko Haram in the northeast, escalating (and possibly climate-change-induced) farmer-herder violence in the middle of the country, and somewhat organized violence which may or may not be political in the oil-rich areas in the Delta, as well as potential Christian-Muslim, and/or Sunni-Shia religiously-motivated violence in several areas, and at least a couple of still-simmering independence movements. So going to the sub-state level both increases the population of non-obvious rare events, and of course going to a shorter time horizon decreases the time it will take to assess this. Consequently a prospective—and completely open—system such as ViEWS, which is doing monthly forecasts for instability in Africa at a 36-month horizon with a geographical resolution of 0.5 x 0.5 decimal degrees (PRIO-GRID; roughly 50 x 50 km) is likely to provide these sorts of forecasts in the relatively near future, though getting a longer time frame retrospective assessment would still be useful. 

A few other things that might go into this list

  • Trigger models: As I noted in my discussion of IFMs, I’m very skeptical about trigger models (particularly in the post-inversion news environment), having spent considerable time over three decades trying to find them in various data sets, but I don’t regard the issue as closed.
  • Optimal geolocation: MORDECAI seems to be the best open-source program out there at the moment (ICEWS does geolocation but the code is proprietary and, shall we say, seems a bit flakey), but it turns out this is a really hard problem and probably also isn’t well defined: not every event has a meaningful location.
  • More inter-coder and inter-dataset comparison: as noted in SITPAED, I believe the Cline Center has a research project underway on this, but more would be useful, particularly since there are almost endless different metrics for doing the comparison.
  • How important are dictionaries containing individual actors?: The massive dictionaries available from ICEWS contain large compendia of individual actors, but how much is actually gained by this, particularly if one could develop robust cross-sentence co-referencing? E.g. if “British Prime Minister Theresa May” is mentioned in the first sentence, a reference to “May” in the fourth sentence—assuming the parser has managed to correctly resolve “May” to a proper noun rather than a modal verb or a date—will also resolve to “GBRGOV”.
  • Lede vs full-story coding: the current norm is coding the first four or six sentences of articles, but to my knowledge no one has systematically explored the implications of this. Same for whether or not direct quotations should be coded.
  • Gold standard records: also on the older list. These are fabulously expensive, unfortunately, though a suitably designed protocol using the “radically efficient” prodigy approach might make this practical. By definition this is not a one-person project.
  • A couple more near-real-time data generation projects: As noted in SITPAED, I’ve consistently under-estimated the attention these need to guarantee 24/7/365 coverage, but as we transition from maintaining servers in isolated rooms cooled to meat-locker temperatures, with fans so noisy as to risk damage to the hearing of their operators (though server operators tend to frequent heavy metal concerts anyway…I digress…), to cloud-based servers in Oregon and Northern Virginia, this should get easier, and not terribly expensive.

Finally, if you do any of these, please publish the research promptly in an open-access venue rather than having it surface five years from now somewhere paywalled.

Footnotes

1. You will be shocked, shocked to learn that these suggestions have gone absolutely nowhere in terms of funding, though some erratic progress has been made, e.g. on at least outlining a CAMEO alternative. One of the suggestions—comparison of native-language vs MT approaches—even remains on this list.

2. Severely understaffed because the entire project was predicated on the supposition that political scientists—as well as the professional programming team at BBN/Raytheon who had devoted years to writing and calibrating an event coder—were just too frigging stupid to realize the event coding problem had already been solved by academic computer scientists and a fully functioning system could be knocked out in a couple months or so by a single student working half time. Two months turned into two years turned into three years—still no additional resources added—and eventually the clock just ran out. Maybe next time.

I’ve got a 3,000-word screed written on the misalignment of the interests of academic computer scientists and, well, the entire remainder of the universe, but the single most important take-away is to never, ever, ever forget that no computer scientist ever gained an iota of professional merit writing software for social scientists. Computer scientists gain merit by having teams of inexperienced graduate students [3]—fodder for the insatiable global demand by technology companies, where, just as with law schools, some will eventually learn to write software on the job, not in school [4]—randomly permute the hyper-parameters of long-studied algorithms until they can change the third decimal point of a standardized metric or two in some pointless—irises, anyone?—but standardized data set, with these results published immediately in some ephemeral conference proceeding. That’s what academic computer scientists do: they don’t exist to write software for you. Nor have they the slightest interest in your messy real-world data. Nor in co-authoring an article which will appear in a paywalled venue after four years and three revise-and-resubmits thanks to Reviewer #2. [6] Never, ever, ever forget this fact: if you want software written, train your own students—some, at least in political methodology programs, will be surprisingly good at the task [7]—or hire professionals (remotely) on short-term contracts.

Again, I have written 3,000 words on this topic but, for now, will consign it to the category of “therapy.”

3. These rants do not apply to the tiny number of elite programs—clearly MIT, Stanford, and Carnegie Mellon, plus a few more like USC, Cornell and, I’ve been pleased to discover, Virginia Tech, which are less conspicuous—which consistently attract students who are capable of learning, and at times even developing, advanced new methods and at those institutions may be able to experiment with fancier equipment than they could in the private sector, though this advantage is rapidly fading. Of course, the students at those top programs will have zero interest in working on social science projects: they are totally involved with one or more start-ups.

4. And just as in the profession of law, the incompetent ones presumably are gradually either weeded out, or self-select out: I can imagine no more miserable existence than trying to write code when you have no aptitude for the task, except if you are also surrounded, in a dysfunctional open-plan office setting [5], by people for whom the task is not only very easy, but often fun.

5. The references on this are coming too quickly now: just Google “open plan offices are terrible” to get the latest.

6. I will never forget the reaction of some computer scientists, sharing a shuttle to O’Hare with some political scientists, on learning of the publication delays in social science journals: it felt like we were out of the Paleolithic and trying to explain to some Edo Period swordsmiths that really, honest, we’re the smartest kids on the block, just look at the quality of these stone handaxes!

7. Given the well-documented systemic flaws in the current rigged system for recruiting programming talent—see this and this and this and this and this—your best opportunities are to recruit, train, and retain women, Blacks and Hispanics: just do the math. [8]

8. If you are a libertarian snowflake upset with this suggestion, it’s an exercise in pure self-interest: again, do the math. You should be happy.

9. I was originally going to call this the “Pólya trap” after George Pólya’s How to Solve It, once required reading in many graduate programs but now largely forgotten. Pólya does, in fact, suggest several versions of solving problems by converting them to something you already know how to solve, but his repertoire goes far beyond this.

10. They are also radically different: as I noted in SITPAED, in their event coding PETR-1, PETR-2, and UP are almost completely different programs with only their actor dictionaries in common.

11. Mind you, these sorts of disappointing outcomes are hardly unique to event data, or the social sciences—the National Ecological Observatory Network (NEON), a half-billion-dollar NSF-funded facility, has spent the last five years careening from one management disaster to another like some out-of-control car on the black ice of Satan’s billiard table. Ironically, the generally unmanaged non-academic open source community—both pure open source and hybrid models—with projects like Linux and the vast ecosystem of Python and R libraries, has far more efficiently generated effective (that is, debugged, documented, and, through StackOverflow, reliably supported) software than the academic community, even with the latter’s extensive public funding.

12. Keep in mind the input to the eventual CAMEO dictionaries was developed at the University of Kansas over a period of more than 15 years, and focused primarily on the well-edited Reuters and later Agence France Presse coverage of just six countries (and a few sub-state actors) in the Middle East, with a couple subsets dealing with the Balkans and West Africa.

13. With a bit more work, one can use scraping of major news sites plus the fact that ICEWS, while not providing URLs, does provide the source of its coded events; in most cases the article an event was coded from can be identified quite unambiguously by looking at the actors involved (again, actor dictionaries are open and easy to update). Using this method, over time a substantial set of current article-event pairs could be accumulated. Just saying…

14. This, alas, is a very expensive empirical question since it would require a large set of human-curated test cases, ideally with the non-English cases coded by native speakers, to evaluate the two systems, even if one had a credibly-functioning system working in one or more of the non-English languages. Also, of course, even if the language-specific system worked better than MT on one language, that would not necessarily be true on others due to differences in the event coder, the current state of MT for that language (again, this may differ dramatically between languages), or the types of events common to the region where the language is used (some events are easier to code, and/or the English dictionaries for coding them are better developed, than others). So unless you’ve got a lot of money—and some organizations with access to lots of non-English text and bureaucratic incentives to process these do indeed have a lot of money—I’d stay away from this one.

15. For example for a few years, when we had pretty good funding, the KEDS project at Kansas had its own subscription to Reuters. And when we didn’t, we were ably assisted by some friendly librarians who were generous with passwords.

The COPDAB data set, an earlier, if now largely forgotten, competitor to WEIS, claimed to be multi-source (in those days of coding from paper sources, just a couple dozen newspapers), but its event density relative to the single-sourced WEIS came nowhere close to supporting that contention, and the events themselves never indicated the sources: What probably happened is that multiple sourcing was attempted, but the human coders could not keep up and the approach was abandoned.

16. Keep in mind that precisely because these are international and in many instances, their reporters are anonymous, they have a greater capacity to provide useful information than do local sources which are subject to the whims/threats/media-ownership of local political elites and/or criminals. Usually overlapping sets.

17. Along with “PETRARCH,” let’s abandon that one, eh: I’m pretty good with acronyms—along with self-righteous indignation, it’s my secret superpower!—so just send me a general idea of what you are looking for and I’ll get back to you with a couple of suggestions. Seriously.

Back in the heady days of decolonization, there was some guy who liked to design flags—I think this was just a hobby, and probably a better hobby than writing event coders—who sent some suggestions to various new micro-states and was surprised to learn later that a couple of these flags had been adopted. This is the model I have in mind.

Or do it yourself—Scrabble™-oriented web sites are your best tool!

18. Militarized non-state actors, of course, will be missing and/or misidentified—”Irish Republican Army” might be misclassified as IRLMIL—though these tend to be less important prior to 1990. Managing the period of decolonization covered by the Cline data is also potentially quite problematic: I’ve not looked at the data so I’m not sure how well this has been handled. But it’s a start.

19. PITF, strictly speaking, doesn’t provide much information on how the IFM models have been used for policy purposes but—flip side of the rare events—there have been a few occasions where they’ve seemed to be quite appreciative of the insights provided by the IFMs, and it didn’t take a whole lot of creativity to figure out what they must have been appreciative about.

That said, I think this issue of finding a few policy-relevant unexpected events is what has distinguished the generally successful PITF from the largely abandoned ICEWS: PITF (and its direct predecessor, the State Failures Project) had a global scope from the beginning and survived long enough—it’s now been around more than a quarter century—that the utility of its IFMs became evident. ICEWS had only three years (and barely that: this included development and deployment times) under DARPA funding, and focused on only 27 countries in Asia, some of these (China, North Korea) with difficult news environments and some (Fiji, Solomon Islands) of limited strategic interest. So compared to PITF, the simple likelihood that an unexpected but policy-relevant rare event would occur was quite low, and, as it happened, didn’t happen. So to speak.

20. In fact I think I may have picked up such an instance—the release may or may not have been accidental—at a recent workshop, though I’ll hold it back for now.

21. In a properly calibrated model, most of the predictions will be “obvious” to most experts: only the unexpected cases, and due to cognitive negativity bias, here largely the unexpected positive cases, will generate any interest. So one is left with a really, really small set of potential cases of interest.

22. In an internet cafe in some remote crossroads in Ghana, a group of disgruntled young men are saying “Damn, we’re busted! How’d he ever figure this out?”


Stuff I tell people about event data

Every few weeks—it’s a low-frequency event with a Poisson distribution, and thus exponentially distributed inter-arrival times—someone contacts me (typically from government, an NGO or a graduate student) who has discovered event data and wants to use it for some project. And I’ve gradually come to realize that there’s a now pretty standard set of pointers that I provide in terms of the “inside story” [1] unavailable in the published literature, which in political science tends to lag current practice by three to five years, and that’s essentially forever in the data science realm. While it would be rare for me to provide this entire list—seven items of course—all of these are potentially relevant if you are just getting into the field, so to save myself some typing in the future, here goes.

(Note, by the way, this is designed to be skimmed, not really read, and I expect to follow this list fairly soon with an updated entry—now available!—on seven priorities in event data research.)

1. Use ICEWS

Now that ICEWS is available in near real time—updated daily, except when it isn’t—it’s really the only game in town and likely to remain so until the next generation of coding programs comes along (or, alas, its funding runs out).

ICEWS is not perfect:

  • the technology is about five years old now
  • the SERIF/ACCENT coding engine and verb/event dictionaries are proprietary (though they can be licensed for non-commercial use: I’ve been in touch with someone who has successfully done this)
  • the output is in a decidedly non-standard format, but see below
  • sources cannot be traced back to specific URLs—arrgghhh, why not???
  • the coding scheme is CAMEO, never intended as a general ontology [2], and in a few places—largely to resolve ambiguities in the original—this is defined somewhat differently than the original University of Kansas CAMEO
  • the original DARPA ICEWS project was focused on Asia, and there is definitely still an Asia-centric bias to the news sources
  • due to legal constraints on the funding sources—no, not some dark conspiracy: this restriction dates to the post-Watergate 1970s!—it does not cover the US

But ICEWS has plenty of advantages as well:

  • it provides generally reliable daily updates
  • it has relatively consistent coverage across more than 20 years, though run frequency checks over time, as there are a couple of quirks in there, particularly at the beginning of the series
  • it is archived in the universally-available and open-access Dataverse
  • it uses open (and occasionally updated) actor and sector/agent databases
  • there is reasonably decent (and openly accessible) documentation on how it works
  • it was written and refined by a professional programming team at BBN/Raytheon which had substantial resources over a number of years
  • it has excellent coverage across the major international news sources (though again, run some frequency checks: coverage is not completely consistent over time)
  • it has a tolerable false-positive rate

And more specifically, there is at least one large family of academic journals which now accepts event data research—presumably with the exception of studies comparing data sets—only if it is done using ICEWS: if you’ve done the analysis using anything else, you will be asked to re-do it with ICEWS. Save those scripts!

As for the non-standard data format: just use my text_to_CAMEO program to convert the output to something that looks like every other event data set.

The major downside to ICEWS is a lack of guaranteed long-term funding, which is problematic if you plan to rely on it for models intended to be used in the indefinite future. More generally, I don’t think there are plans for further development, beyond periodically updating the actor dictionaries: the BBN/Raytheon team which developed the coder left for greener pastures [3] and while Lockheed (the original ICEWS contractor) is updating the data, as far as I know they aren’t doing anything with the coder. For the present it seems that the ICEWS coder (and CAMEO ontology) are “good enough for government work” and it just is what it is. Which isn’t bad, just that it could be better with newer technology.

2. Don’t use one-a-day filtering

Yes, it seemed like a good idea at the time, around 1995, but it amplifies coding errors (which is to say, false positives): see the discussion in http://eventdata.parusanalytics.com/papers.dir/Schrodt.TAD-NYU.EventData.pdf (pp. 5-7). We need some sort of duplicate filtering, almost certainly based on clustering the original articles at the text level (which, alas, requires access to the texts, so it can’t be done as a post-coding step with the data alone), but the simple one-a-day approach is not it. Note that ICEWS does not use one-a-day filtering.
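
A minimal sketch of the text-level approach: flag article pairs whose TF-IDF cosine similarity exceeds a threshold (the 0.8 here is an arbitrary assumption, and a production version would need something like minhashing to scale beyond a few thousand articles).

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["Troops shelled the village on Tuesday, residents said",
        "Residents said troops shelled the village on Tuesday",
        "The central bank raised interest rates by 50 basis points"]
sims = cosine_similarity(TfidfVectorizer().fit_transform(docs))
dupes = [(i, j) for i in range(len(docs)) for j in range(i + 1, len(docs))
         if sims[i, j] > 0.8]
print(dupes)   # the first two stories cluster; the third does not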

3. Don’t use the “Goldstein” scale

Which, for starters, isn’t the Goldstein scale: Joshua Goldstein developed that in a very ad hoc manner back in the late 1980s [https://www.jstor.org/stable/174480: paywalled of course, this one at $40] for the World Events Interaction Survey (WEIS) ontology. The scale which is now called “Goldstein” is for the CAMEO ontology, and was an equally ad hoc effort initiated around 2002 by a University of Kansas graduate student named Uwe Reising for an M.A. thesis while CAMEO was still under development, primarily by Deborah Gerner and Ömür Yilmaz, and then brought into final form by me, maybe 2005 or so, after CAMEO had been finalized. But it rests entirely on ad hoc decisions: there’s nothing systematic about the development. [4]

The hypothetical argument that people make against using these scales—the WEIS- and CAMEO-based scales are pretty much comparable—is that positive (cooperative) and negative (conflictual) events in a dyad could cancel each other out, and one would see values near zero both in dyads where nothing was happening and in dyads where lots was happening. In fact, that perfectly balanced situation almost never occurs: instead any violent—that is, material—conflict dominates the scaled time series, and completely lost is any cross-dyad or cross-time variation in verbal behavior—for example negotiations or threats—whether cooperative or conflictual.

The solution, which I think is common in most projects now, is to use “quad counts”: the counts of the events in the categories material-cooperation, verbal-cooperation, verbal-conflict and material-conflict.
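
For concreteness, a minimal sketch of the quad-count aggregation; the dataframe is a toy, and the cutpoints are one common mapping of the two-digit CAMEO root codes (projects differ slightly on where the boundaries fall).

import pandas as pd

def quad(root: int) -> str:
    # one common mapping of two-digit CAMEO root codes to quad categories
    if root <= 5:  return "verbal-cooperation"
    if root <= 9:  return "material-cooperation"
    if root <= 14: return "verbal-conflict"
    return "material-conflict"

events = pd.DataFrame({"dyad": ["ISR>PSE", "ISR>PSE", "USA>CHN"],
                       "cameo_root": [19, 4, 10]})
events["quad"] = events["cameo_root"].map(quad)
print(events.groupby(["dyad", "quad"]).size())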

4. The PETRARCH-2 coder is only a prototype

The PETRARCH-2 coder (PETR-2) was developed in the summer of 2015 by Clayton Norris, at the time an undergraduate (University of Chicago majoring in linguistics and computer science) intern at Caerus Analytics. [14] It took some of the framework of the PETRARCH-1 (PETR-1) coder, which John Beieler and I had written a year earlier—for example the use of a constituency parse generated by the Stanford CoreNLP system, and the input format and actor dictionaries are identical—but the event coding engine is completely new, and its verb-phrase dictionaries are a radical simplification of the PETR-1 dictionaries, which were just the older TABARI dictionaries. The theoretical approach underlying the coder and the use of the constituency parse are far more sophisticated than those of the earlier program, and it contains prototypes for some pattern-based extensions such as verb transformations. I did some additional work on the program a year later which made PETR-2 sufficiently robust as to be able to code a corpus of about twenty-million records without crashing. Even a record consisting of nothing but exam scores for a school somewhere in India.

So far, so good but…PETR-2 is only a prototype, a summer project, not a fully completed coding system! As I understand it, the original hope at Caerus had been to secure funding to get PETR-2 fully operational, on par with the SERIF/ACCENT coder used in ICEWS, but this never happened. So the project was left in limbo on at least the following dimensions:

  • While a verb pattern transformation facility exists in PETR-2, it is only partially implemented for a single verb, ABANDON
  • If you get into the code, there are several dead-ends where Norris clearly had intended to do more work but ran out of time
  • There is no systematic test suite, just about seventy more or less random validation cases and a few Python unit-tests [5]
  • The new verb dictionaries and an internal transformation language called pico effectively define yet another dialect of CAMEO
  • The radically simplified verb dictionaries have not been subjected to any systematic validation and, for example, there was a bug in the dictionaries—I’ve now corrected this on GitHub—which over-coded the CAMEO 03 category
  • The actor dictionaries are still essentially those of TABARI at the end of the ICEWS research phase, ca. 2011

This is not to criticize Norris’s original efforts—it was a summer project by an undergraduate for godsakes!—but the program has not had the long-term vetting that several other programs such as TABARI (and its Java descendent, JABARI [6]) and SERIF/ACCENT have had. [7]

Despite these issues, PETR-2 has been used to produce three major data sets—Cline Phoenix, TERRIER and UT/Dallas Phoenix. All of these could, at least in theory, be recoded at some point, since all are based on legal copies of the relevant texts. [8]

5. But all of these coders generate the same signal: The world according to CAMEO looks pretty much the same using any automated event coder and any global news source

Repeating a point I made in an earlier entry [https://asecondmouse.wordpress.com/2017/02/20/seven-conjectures-on-the-state-of-event-data/] which I simply repeat here with minimal updating as little has changed:

The graph below shows frequencies across the major (two-digit) categories of CAMEO using three different coders, PETRARCH 1 and 2, and Raytheon/BBN’s ACCENT (from the ICEWS data available on Dataverse), for the year 2014. This also reflects two different news sources: the two PETRARCH cases are LexisNexis; ICEWS/ACCENT is Factiva, though of course there’s a lot of overlap between those.

Basically, “CAMEO-World” looks pretty much the same whichever coder and news source you use: the between-coder variances are completely swamped by the between-category variances. What large differences we do see are probably due to changes in definitions: for example PETR-2 over-coded “express intent to cooperate” (CAMEO 03) due to the aforementioned bug in the verb dictionaries; I’m guessing BBN/ACCENT did a bunch of focused development on IEDs and/or suicide bombings so has a very large spike in “Assault” (18) and they seem to have pretty much defined away the admittedly rather amorphous “Engage in material cooperation” (06).

I think this convergence is due to a combination of three factors:

  1. News source interest, particularly the tendency of news agencies (which all of the event data projects are now getting largely unfiltered) to always produce something, so if the only thing going on in some country on a given day is a sister-city cultural exchange, that will be reported (hence the preponderance of events in the low categories). Also the age-old “when it bleeds, it leads” accounts for the spike on reports of violence (CAMEO categories 17, 18, 19).
  2. In terms of the less frequent categories, the diversity of sources the event data community is using now—as opposed to the 1990s, when the only stories the KEDS and IDEA/PANDA projects coded were from Reuters, which is tightly edited—means that as you try to get more precise language models using parsing (ACCENT and PETR-2), you start missing stories that are written in non-standard English that would be caught by looser systems (PETR-1 and TABARI). Or at least this is true proportionally: on a case-by-case basis, ACCENT could well be getting a lot more stories than PETR-2 (alas, without access to the corpus they are coding, I don’t know) but for whatever reason, once you look at proportions, nothing really changes except where there is a really concentrated effort (e.g. category 18), or changes in definitions (ACCENT on category 06; PETR-2 unintentionally on category 03).
  3. I’m guessing (again, we’d need the ICEWS corpus to check, and that is unavailable due to the usual IP constraints) all of the systems have similar performance in not coding sports stories, wedding announcements, recipes, etc.: I know PETR-1 and PETR-2 have about a 95% agreement on whether a story contains an event, but a much lower agreement on exactly what the event is: again, their verb dictionaries are quite different. The various coding systems probably also have a fairly high agreement at least on the nation-state level of which actors are involved.

6. Quantity is not quality

Which is to say, event data coding is not a task where throwing gigabytes of digital offal at the problem is going to improve results, and we are almost certainly reaching a point where some of the inputs to the models have been deliberately and significantly manipulated. This also compounds the danger of focusing on where the data is most available, which tends to be areas where conflict has occurred in the past and state controls are weak. High levels of false positives are bad, and contrary to commonly-held rosy scenarios, duplicate stories aren’t a reflection of importance but rather of convenience, urban, and other biases. But you need the texts to reliably eliminate duplicates.

The so-called web “inversion”—the point where more information on the web is fake than real, which we are either approaching or have already passed—probably marks the end of efforts to develop trigger models—the search for anticipatory needles-in-a-haystack in big data—in contemporary data. That said, a vast collection of texts from prior to the widespread manipulation of electronic news feeds exists (both in the large data aggregators—LexisNexis, Factiva, and ProQuest—and with the source texts held, under unavoidable IP restrictions, by ICEWS, Cline, the University of Oklahoma TERRIER project and presumably the EU JRC) and these are likely to be extremely valuable resources for developing filters which can distinguish real from fake news.

Due to the inversion, particularly when dealing with politically sensitive topics (or rather, topics that are considered sensitive by some group with reasonably good computer skills and an internet connection), social media are probably now a waste of time in terms of analyzing real-world events (they are still, obviously, useful in analyzing how events appear on social media), and likely will provide a systematically distorted signal.

7. There is an open source software singularity (but not the other singularity…)

Because I don’t live in Silicon Valley, some of the stuff coming out of there by the techno-utopians —Ray Kurzweil is the worst, with Peter Thiel (who has fled the Valley) and Elon Musk close seconds, and Thomas Friedman certainly an honorary East Coast participant—seems utterly delusional. Which, in fact, it is, but in my work as a programmer/data scientist I’ve begun to understand where at least some of this is coming from, and that is what I’ve come to call the “software singularity.” This being the fact that code—usually in multiple ever-improving variants—for doing almost anything you want is now available for free and has an effective support community on Stack Overflow: things that once took months now can be done in hours.

Some examples relevant to event data:

  • the newspaper3k library downloads, formats and updates news scraping in 20 lines of Python
  • requests-HTML can handle downloads even when the content is generated by JavaScript code
  • universal dependency parses provide about 90% of the information required for event coding [9]
  • easily deployed data visualization dashboards are now too numerous to track [10]

And this is a tiny fraction of the relevant software: for example the vast analytical capabilities of the Python and R statistical and machine learning libraries would have, twenty years ago, cost tens if not hundreds of thousands of dollars (but the comparison is meaningless: the capabilities in these libraries simply didn’t exist at any price) and required hundreds of pounds—or if you prefer, linear-feet—of documentation.

To take newspaper3k as an illustrative example, the task of downloading news articles, even from a dedicated site such as Reuters, Factiva, or LexisNexis (and these are the relatively easy cases) requires hundreds of lines of code—and I spent countless hours over three decades writing and modifying such code variously in Pascal, Simula [11], C, Java, perl, and finally Python—to handle the web pipeline, filtering relevant articles, getting rid of formatting, and extracting relevant fields like the date, headline, and text. With newspaper3k, the task looks pretty much [READ THIS FOOTNOTE!!!] like this:

import newspaper

reut_filter = ["/photo/", "/video", "/health/", "/www.reuters.tv/",
               "/jp.reuters.com/", "/es.reuters.com/"]  # exclude these (list abridged)

a_paper = newspaper.build("https://www.reuters.com/")
for article in a_paper.articles:
    if "/english/" not in article.url:  # section rather than article
        continue
    for li in reut_filter:
        if li in article.url: break
    else:  # no filter string matched, so keep the article
        article.download()
        article.parse()
        # flatten the URL into a legal file name
        fname = "reuters_" + article.url.replace("/", "_") + ".txt"
        with open(fname, "w") as fout:
            fout.write("URL: " + article.url + "\n")
            fout.write("Date: " + str(article.publish_date) + "\n")
            fout.write("Title: " + article.title + "\n")
            fout.write("Text:\n" + article.text + "\n")

An important corollary: The software singularity (and inexpensive web-based collaboration tools) enables development to be done very rapidly with small decentralized “remote” teams rather than the old model of large programming shops. In the software development community in Charlottesville, our CTO group [12] focuses on this as the single greatest current opportunity, and doing it correctly is the single greatest challenge, and I think Gen-Xers and Millennials in academia have also largely learned this: for research at least, the graduate “bull-pen” [13] is now global.

That other singularity?: no, sentient killer robots are not about to take over the world, and you’re going to die someday. Sorry.

A good note to end on.


Footnotes

READ THIS FOOTNOTE!!!: I’ve pulled out the core code here from a working program which is about three times as long—for example it adjusts for the contingency that article.publish_date is sometimes missing—and this example code alone may or may not work. The full program is on GitHub: it definitely works and ran for days without crashing.

1. The working title for this entry was “S**t I tell people about event data.”

2. See the documentation for PLOVER—alas, still essentially another prototype—on problems with using CAMEO as a general coding framework.

3. Though I have heard this involved simply taking jobs with another company working out of the same anonymous Boston-area office park.

4. Around this same time, early 2000s, the VRA project undertook a very large web-based effort using a panel of experts to establish agreed-upon weights for their IDEA event coding ontology, but despite considerable effort they could never get these to converge. In the mid-1990s, I used a genetic algorithm to find optimal weights for an [admittedly somewhat quirky] clustering problem: again, no convergence, and wildly different sets of weights could produce more or less the same results.

5. TABARI, in contrast, has a validation suite—typically referred to as the “Lord of the Rings test suite” since most of the actor vocabulary is based on J.R.R. Tolkien’s masterwork, which didn’t stop a defense contractor from claiming “TABARI doesn’t work” after trying to code contemporary news articles based on a dictionary focused on hobbits, elves, orcs, and wizards—of about 250 records which systematically tests all features of the program as well as some difficult edge cases encountered in the past.

6. Lockheed’s JABARI, while initially just a Java version of TABARI—DARPA, then under the suzerainty of His Most Stable Genius Tony Tether, insisted that Lockheed’s original version duplicate not just the features of TABARI, but also a couple of bugs that were discovered in the conversion—was significantly extended by Lockheed’s ICEWS team, and was in fact an excellent coding program but was abandoned thanks to the usual duplicitous skullduggery that has plagued US defense procurement for decades: when elephants fight, mice get trampled. After witnessing a particularly egregious episode of this, I was in our research center at Kansas and darkly muttered to no one in particular “This is why you should make sure your kids learn Chinese.” To which a newly hired secretary perked up with “Of course my kids are learning Chinese!”

7. I will deal with the issue of UniversalPETRARCH—another partially-finished prototype—in the next entry. But in the meanwhile, note that the event coding engines of these three “PETRARCH” programs are completely distinct; the main thing they share in common is their actor dictionaries.

8. See in particular the Cline Center’s relatively recent “Global News Archive”: 70M unduplicated stories, 100M original, updated daily. The Cline Center has some new research in progress comparing several event data sets: a draft was presented at APSA-18 and a final version is near completion: you can contact them. Also there was a useful article comparing event data sets in Science about two years ago: http://science.sciencemag.org/content/353/6307/1502

9. 90% in the sense that in my experiments so far, specifically with the proof-of-concept mudflat coder, code sufficient for most of the functionality required for event coding is about 10% the length of comparable code processing a constituency parse or just doing an internal sparse parse. Since mudflat is just a prototype and edge cases consume lots of code, 90% reduction is probably overly generous, but still, UD parses are pretty close to providing all of the information you need for event coding.

10. Curiously, despite the proliferation of free visualization software, the US projects ICEWS, PITF and UT/D RIDIR never developed public-facing dashboards, compared to the extensive dashboards available at European-based sites such as ACLED, ViEWS, UCDP and EMM NewsBrief.

11. A short-lived simulation language developed at the University of Oslo in the 1960s that is considered the first object-oriented language and had a version which ran on early Macintosh computers that happened to have some good networking routines (the alternative at the time being BASIC). At least I think that’s why I was using it.

12. I’ve been designated an honorary CTO in this group because I’ve managed large projects in the past. And blog about software development. Most of the participants are genuine CTOs managing technology for companies doing millions of dollars of business per year, and were born long after the Beatles broke up.

13. I think this term is general: it refers to large rooms, typically in buildings decades past their intended lifetime dripping with rainwater, asbestos, and mold where graduate students are allocated a desk or table typically used, prior to its acquisition by the university sometime during the Truman administration, for plotting bombing raids against Japan. Resemblance to contemporary and considerably more expensive co-working spaces is anything but coincidental.

14. Norris was selected for this job by an exhaustive international search process consisting of someone in Texas who had once babysat for the lad asking the CEO of Caerus in the Greater Tyson’s Corner Metropolitan Area whether she by chance knew of any summer internship opportunities suitable for someone with his background. 


Instability Forecasting Models: Seven Ethical Considerations

So, welcome, y’all, to the latest bloggy edition on an issue probably relevant to, at best, a couple hundred people, though once again it has been pointed out to me that it is likely to be read by quite a few of them. And in particular, if you are some hapless functionary who has been directed to read this, a few pointers:

  • “Seven” is just a meme in this blog
  • Yes, it is too long: revenge of the nerds. Or something. More generally, for the length you can blame some of your [so-called] colleagues to whom I promised I’d write it
  • You can probably skip most of the footnotes. Which aren’t, in fact, really footnotes so much as another meme in the blog. Some of them are funny. Or at least that was the original intention
  • You can skip Appendix 1, but might want to skim Appendix 2
  • ICEWS = DARPA Integrated Crisis Early Warning System; PITF = U.S. multi-agency Political Instability Task Force; ACLED = Armed Conflict Location and Event Data; PRIO = Peace Research Institute Oslo; UCDP = Uppsala [University] Conflict Data Program; DARPA = U.S. Defense Advanced Research Projects Agency; EU JRC = European Commission Joint Research Centre
  • Yes, I’m being deliberately vague in a number of places: Chatham House rules at most of the workshops and besides, if you are part of this community you can fill in the gaps and if you aren’t, well, maybe you shouldn’t have the information [1]

Violating the Bloggers' Creed of absolute self-righteous certainty about absolutely everything, I admit that I'm writing this in part because some of the conclusions end up at quite a different place than I would have expected. And there's some inconsistency: I'm still working this through.

Prerequisites out of the way, we shall proceed.

Our topic is instability forecasting models—IFMs—which are data-based quantitative models, originally statistical, now generally using machine learning methods, which forecast the probabilities of various forms of political instability such as war, civil war, mass protests, even coups, at present typically (though not exclusively) at the level of the nation-state and with a time horizon of about two years.  The international community developing these models has, in a sense, become the dog that caught the car: We’ve gone from “forecasting political instability is impossible: you are wasting your time” to “everyone has one of these models” in about, well, seven years.

As I’ve indicated in Appendix 1—mercifully removed from the main text so that you can skip it—various communities have been at this for a long time, certainly around half a century, but things have changed—a lot—in a relatively short period of time. So for purposes of discussion, let’s start by stipulating three things:

  1. Political forecasting per se is nothing new: any policy which requires a substantial lead time to implement (or, equivalently, which is designed to affect the state of a political system into the future, sometimes, as in the Marshall Plan or creation of NATO and later the EU, very far into the future) requires some form of forecasting: the technical term (okay, one technical term…) is “feedforward.”  The distinction is we now can do this using systematic, data-driven methods.[2]
  2. The difference between now and a decade ago is that these models work and are being seriously integrated, with major investments, into policy making in both governments and IGOs. They are quite consistently about 80% accurate,[3] against the 50% to 60% accuracy of most human forecasters (aside from a very small number of "superforecasters" who achieve machine-level accuracy). This is for models using public data, but I've seen little evidence that private data substantially changes accuracy, at least at the current levels of aggregation (it is possible that it might at finer levels of both geographical and temporal resolution) [4]. The technology is now mature: in recent workshops I've attended, both the technical presentations and the policy presentations were more or less interchangeable. We know how to do these things, we've got the data, and there is an active process of integrating them into the policy flow: the buzzphrase is "early warning and early action" (EWEA), and the World Bank estimates that even if most interventions fail to prevent conflict, the successes have such a huge payoff that the effort is well worthwhile even from an economic, to say nothing of a humanitarian, perspective.
  3. In contrast to weather forecasting models—in many ways a good analogy for the development of IFMs—weather doesn’t respond to the forecast, whereas political actors might: We have finally hit a point where we need to worry about “reflexive” prediction. Of course, election forecasting has also achieved this status, and consequently is banned in the days or weeks before elections in many democracies.  Economic forecasting long ago also passed this point and there is even a widely accepted macroeconomic theory, rational expectations, dealing with it. But potential reflexive effects are quite recent for IFMs.

As of about ten years ago, the position I was taking on IFMs—which is to say, before we had figured out how to create these reliably, though I still take this position with respect to the data going into them—was that our ideal end-point would be something similar to the situation with weather and climate models [5]: an international epistemic community would develop a series of open models that could be used by various stakeholders—governments, IGOs and NGOs—to monitor evolving cases of instability across the planet, and in some instances these alerts would enable early responses to alleviate the conflict—EWEA—or, failing that, provide, along the lines of the famine forecasting models, sufficient response to alleviate some of the consequences, notably refugee movements and various other potential conflict spill-over effects. As late as the mid-2000s, that was the model I was advocating.

Today?—I’m far less convinced we should follow this route, for a complex set of reasons both pragmatic and ethical which I still have not fully resolved and reconciled in my own mind, but—progress of sorts—I think I can at least articulate the key dimensions. 

1. Government and IGO models are necessarily going to remain secret, for reasons both bureaucratic and practical.

Start with the practical: in the multiple venues I've attended over the past couple of years, which is to say during the period when IFMs have gone from "impossible" to "we're thinking about it" to "here's our model," everyone in an official position has been adamant that their operational models are not going to become public. The question is then whether those outside these organizations should accept this or push back, particularly as these models are heavily dependent on NGO and academic data sets.

To the degree that this tendency is simply traditional bureaucratic siloing and information hoarding—and there are certainly elements of both going on—the natural instinct would be to push back. However, I've come to accept the argument that there could be some legitimate reasons to keep this information confidential, because the decisions of governments and IGOs, which can potentially wield resources on the order of billions of dollars, can have substantial reflexive consequences for decisions that could in turn affect the instability itself, in particular:

  • foreign direct investment and costs of insurance
  • knowledge that a conflict is or is not “on the radar” for possible early action
  • support for NGO preparations and commitments
  • prospects for collective action, discussed below

2. From an academic and NGO perspective, there is a very substantial moral issue in forecasting the outcome of a collective action event.

This is the single most difficult issue in this essay: are there topics, specifically those dealing with collective action, which should be off-limits, at least in the public domain, even for the relatively resource-poor academic and NGO research communities?

The basic issue here is that—at least with the current state of the technology—even if governments and IGOs keep their exact models confidential, the past ten years or so have shown that one can probably fairly easily reverse engineer these except for the private information: at least at this point in time, anyone trying to solve this problem is going to wind up with a model with a relatively clear set of methods, data and outcomes, easily duplicated with openly available software and data.[6][7]
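
To make "easily duplicated" concrete, here is a minimal sketch of such a pipeline in Python; the data file and variable names are entirely hypothetical, but every component is a free download:

    # Minimal open-source IFM sketch. Hypothetical input: rows are
    # country-months, columns are structural and event-based indicators
    # plus a binary instability-onset outcome two years ahead.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    df = pd.read_csv("country_months.csv")   # hypothetical file name
    X = df.drop(columns=["country", "month", "onset_in_24m"])
    y = df["onset_in_24m"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=42)

    model = RandomForestClassifier(n_estimators=500, random_state=42)
    model.fit(X_train, y_train)
    print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

A serious effort would split training and test sets by time rather than randomly, but the point stands: the scarce ingredient is the data, not the method.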

So in our ideal world—the hurricane forecasting world—the models are public, and when they converge, the proverbial red lights flash everywhere, and the myriad components of the international system gear up to deal with the impending crisis, and when it happens the early response is far more effective than waiting until the proverbial truck is already halfway over the cliff. And all done by NGOs and academic researchers, without the biases of governments.

Cool. But what if, instead, those predictions contribute to the crisis, and in the worst-case scenario, cause a crisis that otherwise would not have occurred? For example: individuals read predictions of impending regime transition, use that information to mobilize collective action, and the mobilization then fails: we're only at 80% to 85% accuracy as it is, and this is before taking into account possible feedback effects. [8] Hundreds killed, thousands imprisoned, tens of thousands displaced. Uh, bummer.

One can argue, of course, that this is no different than what is already happening with qualitative assessments: immediately coming to mind are the Western encouragement of the Hungarian revolt in 1956, the US-supported Bay of Pigs invasion, North Vietnam's support of the Tet Offensive, which destroyed the indigenous South Vietnamese communist forces,[9] and US ambiguity with respect to the Shi'a uprisings following the 1991 Iraq War. And this is only a tiny fraction of such disasters.

But they were all, nonetheless, disasters with huge human costs, and actions which affect collective resistance bring to mind J.R.R. Tolkien’s admonition: “Do not meddle in the affairs of wizards, for they are subtle and quick to anger.” Is this the sort of thing the NGO and academic research community, however well meaning, should risk?

3. Transparency is nonetheless very important in order to assess limitations and biases of models.

Which is what makes the first two issues so problematic: every model has biases, [10] and while the existing IFMs have converged, there is no guarantee this will continue to be the case as new models are developed which are temporally and/or spatially more specific, or which take on new problems, for example detailed refugee flow models. Furthermore, since the contributions of the academic and NGO communities were vital to moving through the "IFM winter"—see Appendix 1—continuing to have open, non-governmental efforts seems very important.

Two other thoughts related to this:

  1. Is it possible that the IFM ecosystem has become too small because the models are so easy to create? I'm not terribly worried about this, because I've seen, in multiple projects, very substantial efforts to explore the possibility that other models exist, and they just don't seem to be there, at least for the sets of events currently of interest. But one should always be alert to the possibility that what appears to be technological maturity is instead a failure of imagination.
  2. Current trends in commercial data science (as opposed to open source software and academic research) may not be all that useful for IFM development because this is not a “big data” problem: one of the curious things I noted at a recent workshop on IFMs is that deep learning was never mentioned. Though looking forward counterfactually, it is also possible that rare events—where one can envision even more commercial applications than those available in big data—are the next frontier in machine learning/artificial intelligence.

4. Quality is more important than quantity.

Which is to say, this is not a task where throwing gigabytes of digital offal at the problem is going to improve results, and we may be reaching a point where some of the inputs to the models have been deliberately and significantly manipulated, because such manipulation is increasingly common. There is also a danger in focusing on where the data is most available, which tends to be areas where conflict has occurred in the past and state controls are weak. High levels of false positives—notably in some atomic (that is, ICEWS-like) event data sets—are bad, and contrary to commonly-held rosy scenarios, duplicate stories are not a reflection of importance but rather of convenience, urban, and other biases.

The so-called web "inversion"—the point where more information on the web is fake than real, which we are either approaching or may have already passed—probably marks the end, alas, of efforts to develop trigger models—the search for anticipatory needles-in-a-haystack in big data—in contemporary data. It is worth noting, though, that a vast collection of texts from prior to the widespread manipulation of electronic news feeds exists (both in the large news aggregators—LexisNexis, Factiva, and ProQuest—and in the source texts held, under unavoidable IP restrictions, by ICEWS, the University of Illinois Cline Center, the University of Oklahoma TERRIER project and presumably the EU JRC), and these are likely to be extremely valuable resources for developing filters which can distinguish real from fake news. They could also be useful in determining whether, in the past, trigger models were real, rather than a cognitive illusion born of hindsight—having spent a lot of time searching for these with few results, I'm highly skeptical, but it is an empirical question—but any application of these in the contemporary environment will require far more caution than would have been needed, say, a decade ago.[11]
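
What might such a filter look like at its most basic? Assuming a corpus labeled into authentic archival stories and known-manipulated ones (the loader below is hypothetical), the bag-of-words baseline is a few lines; anything deployable would have to be far more sophisticated, but this is the sort of starting point those archival texts make possible:

    # Baseline real-vs-fake story filter: tf-idf features plus logistic
    # regression. load_labeled_corpus() is hypothetical; it should return
    # a list of story texts and parallel 0/1 labels (1 = authentic).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    texts, labels = load_labeled_corpus()   # hypothetical loader

    clf = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=5),
        LogisticRegression(max_iter=1000))
    # cross-validated AUC as a quick check of separability
    print(cross_val_score(clf, texts, labels, cv=5, scoring="roc_auc").mean())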

5. Sustainability of data sources.

It has struck me at a number of recent workshops—and, amen, in my own decidedly checkered experience in trying to sustain near-real-time atomic event data sets—the degree to which the event data used in IFM models (structural data, such as national economic and demographic statistics, being generally solidly funded) depend on a large number of small projects without reliable long-term funding sources. There are exceptions—UCDP as far as I understand has long-term commitments from the Swedish government, both PRIO and ACLED have gradually accumulated relatively long-term funding through concerted individual efforts, and to date PITF has provided sustained funding for several data sets, notably Polity IV and, less notably, the monthly updates of the Global Atrocities Data Set—but far too much data comes from projects with relatively short-term funding, typically from the US National Science Foundation, where social science grants tend to be just two or three years with no guarantee of renewal, or from foundations, which tend to favor shiny new objects over slogging through stuff that just needs to be done to support a diffuse community.

The ethical problem here is the extent to which one can expect researchers to invest in models using data which may not be available in the future, and, conversely, whether the absence of such guarantees is leading the collective research community to spend too much effort in the proverbial search for the keys where the light is best. Despite several efforts over the years, political event data, whether the "atomic" events similar to ICEWS or the "episodic" events similar to ACLED, the Global Terrorism Database, and UCDP, have never attained the privileged status the U.S. NSF has accorded to the continuously-maintained American National Election Studies and General Social Survey, and the user community may just be too small (or politically inept) to justify this. I keep thinking/hoping/imagining that increased automation in ever less expensive hardware environments will bring the cost of some of these projects down to the point where they could be sustained, for example, by a university research center with some form of stable institutional support, but thus far I've clearly underestimated the requirements.

Though hey, it's mostly an issue of money: Mr. and Ms. Gates, Ms. Powell Jobs, Mr. Buffett and friends, Mr. Soros, y'all looking for projects?

6. Nothing is missing or in error at random: incorrect predictions and missing values carry information.

This is another point where one could debate whether the issue is ethics or just professional best practice—again, don't confine your search for answers to readily available methods where you can just download some software—but these decisions can have consequences.

The fact that information relevant to IFMs is not missing at random has been appreciated for some time, and this may be one of the reasons why machine learning methods—where “missing” is just another value—have fairly consistently out-performed statistical models. This does, however, suggest that statistical imputation—now much easier thanks to both software and hardware advances—may not be a very good idea and is potentially an important source of model bias.
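
As a concrete illustration of the contrast (X and y here are hypothetical features and outcomes, with informatively missing entries coded as NaN), compare a gradient-boosting model, which treats "missing" as just another value to split on, against a mean-imputation pipeline:

    # Missing-not-at-random comparison. X, y are hypothetical; sklearn's
    # HistGradientBoostingClassifier accepts NaN natively, so informative
    # missingness remains available as a splitting criterion, whereas
    # mean imputation erases it before the model ever sees it.
    from sklearn.ensemble import HistGradientBoostingClassifier
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    native = HistGradientBoostingClassifier()
    imputed = make_pipeline(SimpleImputer(strategy="mean"),
                            LogisticRegression(max_iter=1000))

    for name, m in [("native NaN handling", native), ("mean imputation", imputed)]:
        print(name, cross_val_score(m, X, y, cv=5, scoring="roc_auc").mean())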

There also seems to be an increasing appreciation that incorrect predictions, particularly false positives (that is, a country or region has been predicted to be unstable but is not) may carry important information, specifically about the resilience of local circumstances and institutions. And more generally, those off-diagonal cases—both the false positives and false negatives—are hugely important in the modeling effort and should be given far more attention than I’m typically seeing. [12]
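
Operationally this is trivial once a model exists; with hypothetical out-of-sample labels y_test and predicted probabilities probs (for example, from the sketch under point 2 above), pulling the off-diagonal cases out for qualitative review takes a few lines:

    # Tabulate and extract off-diagonal cases for case-by-case review.
    # y_test and probs are hypothetical out-of-sample labels and scores.
    import numpy as np
    from sklearn.metrics import confusion_matrix

    preds = (probs >= 0.5).astype(int)             # illustrative threshold
    tn, fp, fn, tp = confusion_matrix(y_test, preds).ravel()
    print(fp, "false positives (candidate cases of resilience)")
    print(fn, "false negatives (missed onsets)")
    off_diag = np.where(preds != np.asarray(y_test))[0]  # rows to hand to the experts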

A final observation: at what point are we going to get situations where the model is wrong because of policy interventions? [8, again] Or have we already? — that’s the gist of the EWEA approach. I am guessing that in most cases these situations will be evident from open news sources, though there may be exceptions where this is due to “quiet diplomacy”—or as likely, quiet allocation of economic resources—and will quite deliberately escape notice.

7. Remember, there are people at the end of all of these.

At a recent workshop, one of the best talks—sorry, Chatham House rules—ended with an impassioned appeal on this point from an individual from a region which, regrettably, has tended to be treated as just another set of data points in far too many studies. To reiterate: IFMs are predicting the behaviors of people, not weather.

I think these tendencies have been further exacerbated by what I’ve called “statutory bias” [10, again]  in both model and data development: the bureaucratic institutions responsible for the development of many of the most sophisticated and well-publicized models are prohibited by law from examining their own countries (or in the case of the EU, set of countries). And the differences can be stark: I recently saw a dashboard with a map of mass killings based on data collected by a European project which, unlike PITF and ICEWS data, included the US: the huge number of cases both in the US and attributable to US-affiliated operations made it almost unrecognizable compared to displays I was familiar with.

This goes further: suppose the massive increase in drug overdose deaths in the US (now at a level exceeding 70,000 per year and, as amply documented, the direct result of a deliberate campaign by one of America's wealthiest families, whose philanthropic monuments blot major cities across the land) had occurred in Nigeria, Tajikistan or Indonesia: might we at the very least be considering that phenomenon a candidate for a new form of state weakness and/or the ability of powerful drug interests to dominate the judicial and legislative process? But we haven't.

On the very positive side, I think we’re seeing more balance emerging: I am particularly heartened to see that ECOWAS has been developing a very sophisticated IFM, at least at the level of North American and European efforts, and with its integration with local sources, perhaps superior. With the increasing global availability of the relevant tools, expertise, and, through the cloud, hardware, this will only increase, and while the likes of Google and Facebook have convinced themselves only whites and Asians can write software, [13] individuals in Africa and Latin America know better.


Whew…so where does this leave us? Between some rugged rocks and some uncomfortable hard places, to be sure, or there would have been no reason to write all of this in the first place. Pragmatics aside—well-entrenched and well-funded bureaucracies are going to set their own rules, irrespective of what academics, NGOs and bloggers are advocating—the possibility of developing models (or suites of models) which set off ill-advised collective action concerns me. But so does the possibility of policy guided by opaque models developed with flawed data and techniques, to say nothing of policies guided by "experts" whose actual forecasting prowess is at the level of dart-throwing chimps. And there's the unresolved question of whether there is something special about the forecasts of a quantitative model as distinct from those of an op-ed in the Washington Post or a letter or anonymous editorial in The Economist, again with demonstrably lower accuracy and yet part of the forecasting ecosystem for a century or more. Let the discussion continue.

I'll close with a final personal reflection that didn't seem to fit anywhere else: having been involved in these efforts for forty or so years, it is very poignant for me to see the USA now almost completely out of this game, despite the field having largely been developed in the US. It will presumably remain outside until the end of the Trump administration, and then, depending on attitudes in the post-Trump era, rebuilding could be quite laborious given the competition with industry for individuals with the required skill sets; though, alternatively, we could see a John Kennedyesque civic republican response by a younger generation committed to rebuilding democratic government and institutions on this side of the Atlantic. In the meantime, as with high-speed rail, cashless payments, and universal health care, the field is in good hands in Europe. And for IFMs and cashless payments, Africa.

Footnotes

1. I went to college in a karst area containing numerous limestone caves presenting widely varying levels of technical difficulty. The locations of easy ones where you really had to make an effort—or more commonly, drink—to get yourself into trouble were widely known. The locations of the more difficult were kept confidential among a small group with the skills to explore them safely. Might we be headed in a similar direction in developing forecasting models?—you decide.

Someone about a year ago at one of these IFM workshops—there have been a bunch, to the point where many of the core developers know each other's drink preferences—raised the issue that we don't want forecasts to provide information to the "bad guys." But where to draw the line, given that some of the bad guys can presumably reverse engineer the models from the literature, and given the technical sophistication we've seen from such groups, e.g. in IEDs and the manipulation of social media? Suddenly the five-year publication lags (and paywalls?) in academic journals become a good thing?

2. I finally realized the reason why we haven't had serious research into how to integrate quantitative and qualitative forecasts—this is persistently raised as a problem by government and IGO researchers—is that academics and small research shops like mine have a really difficult time finding real experts (as opposed, say, to students or Mech Turkers) who have a genuine interest in and knowledge of a topic, as distinct from just going through the motions and providing uninformed speculation. In such circumstances the value added by the qualitative information will be marginal, and consequently we're not doing realistic tests of expert elicitation methods. So by necessity this problem—which is, in fact, quite important—is probably going to have to be solved in the government and IGO shops.

3. I’m using this term informally, as the appropriate metric for “accuracy” on these predictions, which involve rare events, is complicated. Existing IFMs can consistently achieve an AUC of 0.80 to 0.85, rarely going above (or below) that level, which is not quite the same as the conventional meaning of “accuracy” but close enough. There are substantial and increasingly sophisticated discussions within the IFM community on the issue of metrics: we’re well aware of the relevant issues.
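
For readers outside the trade, a toy example (all numbers invented) of why raw accuracy is nearly useless for rare events, and hence why the community reports AUC:

    # With a 3% base rate, the "nothing ever happens" forecaster is 97%
    # accurate but has zero discriminating power; AUC exposes this.
    import numpy as np
    from sklearn.metrics import accuracy_score, roc_auc_score

    rng = np.random.default_rng(0)
    y = (rng.random(10_000) < 0.03).astype(int)   # ~3% onsets
    null_scores = np.zeros(len(y))                # constant "stable" forecast
    print(accuracy_score(y, null_scores >= 0.5))  # about 0.97: looks great
    print(roc_auc_score(y, null_scores))          # 0.5: no skill at all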

4. One curious feature of IFMs may be that private data will become important at short time horizons but not at longer horizons. This contrasts to the typical forecasting problem where errors increase more or less exponentially as the time horizon increases. In current IFMs, structural indicators (mostly economic, though also historical), which are readily available in public sources, dominate in the long term, whereas event-based conditions may be more important in the short term. E.g. “trigger models”—if these are real, an open question—are probably not relevant in forecasting a large-scale event like Eastern Europe in 1989 or the Arab Spring, but could be very important in forecasting at a time horizon of a few weeks in a specific region.

5. Science had a nice article [http://science.sciencemag.org/content/363/6425/342] recently on these models: despite the key difference that IFMs are potentially reflexive, and the fact that one of our unexplored domains is the short-term forecast, some of the approaches used in those models—emphasized in the excerpts below—could clearly be adapted to IFMs:

Weather forecasts from leading numerical weather prediction centers such as the European Centre for Medium-Range Weather Forecasts (ECMWF) and National Oceanic and Atmospheric Administration’s (NOAA’s) National Centers for Environmental Prediction (NCEP) have also been improving rapidly: A modern 5-day forecast is as accurate as a 1-day forecast was in 1980, and useful forecasts now reach 9 to 10 days into the future (1). Predictions have improved for a wide range of hazardous weather conditions  [emphasis added], including hurricanes, blizzards, flash floods, hail, and tornadoes, with skill emerging in predictions of seasonal conditions.

Because data are unavoidably spatially incomplete and uncertain, the state of the atmosphere at any time cannot be known exactly, producing forecast uncertainties that grow into the future. This sensitivity to initial conditions can never be overcome completely. But, by running a model over time and continually adjusting it to maintain consistency with incoming data [emphasis added], the resulting physically consistent predictions greatly improve on simpler techniques. Such data assimilation, often done using four-dimensional variational minimization, ensemble Kalman filters, or hybridized techniques, has revolutionized forecasting.

Sensitivity to initial conditions limits long-term forecast skill: Details of weather cannot be predicted accurately, even in principle, much beyond 2 weeks. But weather forecasts are not yet strongly constrained by this limit, and the increase in forecast skill has shown no sign of ending. Sensitivity to initial conditions varies greatly in space and time [emphasis added], and an important but largely unsung advance in weather prediction is the growing ability to quantify the forecast uncertainty  [emphasis added] by using large ensembles of numerical forecasts that each start from slightly different but equally plausible initial states, together with perturbations in model physics.

6. I’m constantly confronted, of course, with the possibility that there are secret models feeding into the policy process that are totally different than those I’m seeing. But I’m skeptical, particularly since in some situations, I’m the only person in the room who has been witness to the process by which independent models have been developed, such being the reward, if that’s the word, for countless hours of my life frittered away in windowless conference rooms watching PowerPoint™ presentations. All I see is convergence, not just in the end result, but also in the development process.

Consequently if a trove of radically different—as distinct from incrementally different, however much their creators think they are novel—secret models exists, there is a vast and fantastically expensive conspiracy spanning multiple countries creating an elaborate illusion solely for my benefit, and frankly, I just don’t think I’m that important. I’m sure there are modeling efforts beyond what I’m seeing, but from the glimmers I see of them, they tend to be reinventing wheels and/or using methods that were tried and rejected years or even decades ago, and the expansiveness (and convergence) of known work makes it quite unlikely—granted, not impossible—that there is some fabulously useful set of private data and methodology out there. To the contrary, in general I see the reflections from the classified side as utterly hampered by inexperience, delusional expectations, and doofus managers and consultants who wouldn’t make it through the first semester of a graduate social science methodology course and who thus conclude that because something is impossible for them, it is impossible for anyone. Horse cavalry in the 20th century redux: generally not a path with a positive ending.

7. Providing, of course, one wants to: there may be specialized applications where no one has bothered to create public models even though this is technically possible.

8. One of the more frustrating things I have heard, for decades, is the smug observation that if IFMs become successful, the accuracy of our models will decline and consequently we modelers will be very sad. To which I say: bullshit! Almost everyone involved in IFM development is acutely aware of the humanitarian implications of the work, and many have extended field experience in areas experiencing stress due to political instability (which is not, in general, true of the folks making the criticisms, pallid Eloi whose lives are spent in seminar rooms, not in the field). To a person, model developers would be ecstatic were the accuracy of their models to drop off because of successful interventions, and this is vastly more important to them than the possibility of Reviewer #2 recommending against publication in a paywalled journal (which, consequently, no one in a policy position will ever read) because the AUC hasn't improved over past efforts.

9. Back in the days when people still talked of these things—the end of the Vietnam War now being almost as distant from today's students as the end of World War I was from my generation—one would encounter a persistent urban legend in DoD operations research—ah, OR…now there's a golden oldie…—circles that somewhere deep in the Pentagon was a secret computer model—by the vague details, presumably one of Jay Forrester's systems dynamics efforts, just a set of difference equations, as the model was frequently attributed to MIT—that precisely predicted every aspect of the Vietnam War, and had decision-makers only paid attention to it, we would have won. You know, "won" as in we'd now be buying shrimp, t-shirts and cheap toys made in Vietnam and it would be a major tourist destination. I digress.

Anyway, I’m pretty sure that in reality dozens of such models were created during the Vietnam War period, and some of them were right some of the time, but, unlike the Elder Wand of the Harry Potter universe, no such omniscient Elder Model existed. This land of legends situation, I would also note, is completely different than where we are with contemporary IFMs: the models, data, methods, and empirical assessments are reasonably open, and there is a high degree of convergence in both the approaches and their effectiveness.

10. I'd identify five major sources of bias in existing event data; some of these affect structural data sets as well, but it is generally useful to be aware of them.

  1. Statutory bias, also discussed under point 7: Due to its funding sources, ICEWS and PITF are prohibited by a post-Vietnam-era law from tracking the behavior of US citizens. Similarly, my understanding is that the EU IFM efforts are limited (either by law or bureaucratic caution) in covering disputes between EU members and internal instability within them. Anecdotally, some NGOs also have been known to back off some monitoring efforts in some regions in deference to funders.
  2. Policy bias: Far and away the most common application of event data in the US policy community has been crisis forecasting, so most of the effort has gone into collecting data on violent (or potentially violent) political conflict. The EU's JRC efforts are more general, for example with foci on areas where the EU may need to provide disaster relief, but are still strongly focused on areas of concern to the EU.
  3. Urban bias: This is inherent in the source materials: for example, during the Boko Haram violence in Nigeria, a market bombing in the capital Abuja generated about 400 stories; one in the regional capital of Maiduguri would typically generate ten or twenty, and one in the marginal areas near Lake Chad would generate one or two. Similarly, terrorist incidents in Western capitals such as Paris or London generate days of attention, whereas events with far higher casualty rates in the Middle East or Africa are typically covered for just a day.
  4. Media fatigue: This is the tendency of news organizations to lose interest in on-going conflicts, covering them in detail when they are new but shifting attention even though the level of conflict continues.
  5. English-language bias: Most of the event data work to date—the EU JRC’s multi-language work being a major exception—has been done in English (and occasionally Spanish and Portuguese) and extending beyond this is one of the major opportunities provided by contemporary computationally-intensive methods, including machine translation, inter-language vector transformations, and the use of parallel corpora for rapid dictionary development; IARPA has a new project called BETTER focused on rapid (and low effort) cross-language information extraction which might also help alleviate this.

11. See for example https://publications.parliament.uk/pa/cm201719/cmselect/cmcumeds/1791/1791.pdf

12. Though this is changing, e.g. see Michael Colaresi https://twitter.com/colaresi/status/842291411298996224 on bi-separation plots, which, alas, links to yet-another-frigging paywalled article, but at least the sentiment is there.

13. See https://www.nytimes.com/2019/02/13/magazine/women-coding-computer-programming.html. Google and Facebook have 1% blacks and 3% Hispanics among their technical employees! Microsoft, to its credit, seems to be more enlightened.

Appendix 1: An extraordinarily brief history of how we got here

This will be mostly the ramblings of an old man dredging up fading memories, but it’s somewhat important,  in these heady days of the apparently sudden success of IFMs, to realize the efforts go way back.  In fact there’s a nice MA thesis to be done here, I suppose in some program in the history of science, on tracking back how the concept of IFMs came about. [A1]

Arguably the concept is firmly established by the time of Leibniz, who famously postulated a "mathematical philosophy" wherein

“[…] if controversies were to arise, there would be no more need of disputation between two philosophers than between two calculators. For it would suffice for them to take their pencils in their hands and to sit down at the abacus, and say to each other (and if they so wish also to a friend called to help): Let us calculate.”

I’m too lazy to thoroughly track things during the subsequent three centuries, but Newtonian determinism expressed through equations was in quite the vogue during much of the period—Laplace, famously—and by the 19th century data-based probabilistic inference would gradually develop, along with an ever increasing amount of demographic and economic data, and by the 1920s, we had a well-established, if logically inconsistent, science of frequentist statistical inference. The joint challenges of the Depression and planning requirements of World War II (and Keynesian economic management more generally) led to the incorporation of increasingly sophisticated economic models into policy making in the 1930s and 1940s, while on the political side, reliable public opinion polling was established after some famous missteps, and by the 1950s used for televised real-time election forecasting.

By the time I was in graduate school, Isaac Asimov's Foundation Trilogy—an extended fictional work whose plot turns on the failures of a forecasting model—was quite in vogue, as was, on a more practical level, the political forecasting work of the founder of numerical meteorology, Lewis Fry Richardson, originally done in the 1930s and 1940s and then popularized in the early 1970s by Anatol Rapoport and others, and by the establishment of the Journal of Conflict Resolution. Richardson in 1939 self-published a monograph titled Generalized Foreign Politics in which he convinced himself [A2] that the unstable conditions in his arms race models, expressed as differential equations and estimated for the periods 1909-1913 and 1933-1938, successfully predicted the two world wars. Also at this point we saw various "systems dynamics" models, most [in]famously the Club of Rome's fabulously inaccurate Limits to Growth model published in 1972, which spawned about ten years of [also very poorly calibrated] similar efforts.
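
For the curious, the textbook form of Richardson's model is a pair of linear differential equations in the two sides' armament levels, with runaway escalation when the product of the reaction coefficients exceeds the product of the fatigue coefficients. A quick numerical check (parameter values purely illustrative, not Richardson's estimates):

    # Richardson arms race: dx/dt = a*y - m*x + g, dy/dt = b*x - n*y + h,
    # where a, b are reaction coefficients, m, n fatigue coefficients, and
    # g, h grievance terms. Unstable when a*b > m*n. Simple Euler stepper.
    def richardson(x, y, a, b, m, n, g, h, dt=0.01, steps=5000):
        for _ in range(steps):
            x, y = (x + dt * (a * y - m * x + g),
                    y + dt * (b * x - n * y + h))
        return x, y

    print(richardson(1.0, 1.0, a=0.9, b=0.9, m=0.5, n=0.5, g=0.1, h=0.1))  # escalates without bound
    print(richardson(1.0, 1.0, a=0.3, b=0.3, m=0.5, n=0.5, g=0.1, h=0.1))  # settles to an equilibrium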

More critically, by the time I was in graduate school, DARPA was funding work on IFMs at a level that kept me employed as a computer programmer rather than teaching discussion sections for introductory international relations classes. These efforts would carry on well into the Reagan administration—at no less a level than the National Security Council, under Richard Beal's leadership of a major event data effort—before finally being abandoned as impractical, particularly on the near-real-time data side.

In terms of the immediate precedents to contemporary IFMs, in the 1990s there were a series of efforts coming primarily out of IGOs and NGOs—specifically Kumar Rupesinghe at the NGO International Alert and the late Juergen Dedring within the United Nations (specifically its Office for Research and the Collection of Information)—as well as the late Ted Robert Gurr in the academic world, Vice President Al Gore and various people associated with the US Institute of Peace in the US government, and others far too numerous to mention (again, there's a modestly interesting M.A. thesis here, and there is a very ample paper trail to support it). But these went nowhere beyond spawning the U.S. State Failure Project, the direct predecessor of PITF, and the SFP's excessively elaborate (expensive, and, ultimately, irreproducible) IFMs initially failed miserably due to a variety of technical flaws.

We then went into an "IFM Winter"—riffing on the "AI Winter" of the late 1980s—in the 2000s, where a large number of small projects with generally limited funding continued to work in a professional environment which calls to mind Douglas Adams's classic opening to Hitchhiker's Guide to the Galaxy:

Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun. Orbiting this at a distance of roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea.

Yeah, that's about right: during the 2000s IFM work was definitely amazingly primitive and far out in the academically unfashionable end of some uncharted backwaters. But this decade was, in fact, a period of gestation and experimentation, so that by 2010 we had seen, for example, the emergence of the ACLED project under Clionadh Raleigh, years of productive experimentation at PITF under the direction of Jay Ulfelder, the massive investment by DARPA in ICEWS [A3], substantial modeling and data collection efforts at PRIO under the directorship of Nils-Petter Gleditsch, and substantial expansion of the UCDP datasets. While models in the 1960s and 1970s were confined to a couple dozen variables—including some truly odd ducks, like levels of US hotel chain ownership in countries as a measure of US influence—PITF by 2010 had assembled a core data set containing more than 2500 variables. Even if it really only needed about a dozen of these to get a suite of models with reasonable performance.

All of which meant that the IFMs which had generally failed to produce credible results in the 1990s became—at least for any group with a reasonable level of expertise—almost trivial to produce by the 2010s.[A5] Bringing us into the present.

Appendix footnotes

A1. A colleague recently reported that a journal editor, eviscerating an historical review article no less, required him (presumably because of issues of space, as we all are aware that with electronic publication, space is absolutely at a premium!) to remove all references to articles published prior to 2000. Because we are all aware that everything of importance—even sex, drugs, and rock-and-roll!—was introduced in the 21st century.

A2. I'm one of, I'm guessing, probably a couple dozen people who have actually gone through Richardson's actual papers at Lancaster University (though these were eventually published, and I'd also defer to Oliver Ashford's 1985 biography as the definitive treatment), and Richardson's parameter estimates which led to the result of instability are, by contemporary standards, a bit dubious; using more straightforward methods actually leads to a conclusion of stability rather than instability. But the thought was correct…

A3. Choucri and Robinson’s Forecasting in International Relations (1974) is a good review of these efforts in political science, which go back into the mid-1960s. As that volume has probably long been culled from most university libraries, Google brings up this APSR review by an obscure assistant professor at Northwestern but, demonstrating as ever the commitment of professional scientific organizations and elite university presses to the Baconian norm of universal access to scientific knowledge, reading it will cost you $25. You can also get a lot from an unpaywalled essay by Choucri still available at MIT. 

A4. The ICEWS program involved roughly the annual expenditures of the entire US NSF program in political science. Even if most of this went to either indirect costs or creating PowerPoint™ slides, with yellow type on a green background being among the favored motifs.

A5. As I have repeated on earlier occasions—and no, this is not an urban legend—at the ICEWS kick-off meeting, where the test data and the unbelievably difficult forecasting metrics, approved personally by no less than His Stable Genius Tony Tether, were first released, the social scientists went back to their hotel rooms and on their laptops had estimated models which beat the metrics before the staff of the defense contractors had finished their second round of drinks at happy hour. Much consternation followed, and the restrictions on allowable models and methods became ever more draconian as the program evolved. The IFM efforts of ICEWS—the original purpose of the program—never gained traction despite the success of nearly identical contemporaneous efforts at PITF—though ICEWS lives on, at least for now, as a platform for the production of very credible near-real-time atomic event data.

Appendix 2: Irreducible sources of error

This is included here for two reasons. First, it sets out a systematic set of reasons why IFMs have an accuracy "speed limit"—apparently an out-of-sample AUC in the range of 0.80 to 0.85 at the two-year time horizon for nation-states—and if you try to get past this, in all likelihood you are just over-fitting the model. Second, it takes far too long to go through all of these reasons in a workshop presentation, but they are important.

  • Specification error: no model of a complex, open system can contain all of the relevant variables: “McChrystal’s hairball” is the now-classic exposition of this. 
  • Measurement error: with very few exceptions, variables will contain some measurement error. And this presupposes there is even agreement on what the "correct" measurement is in an ideal setting.
  • Predictive accuracy is limited by measurement error: for example in the very simplified case of a bivariate regression model, if your measurement reliability is 80%, your accuracy can’t be more than 90%.  This biases parameter estimates as well as the predictions. 
  • Quasi-random structural error: Complex and chaotic deterministic systems behave as if they were random under at least some parameter combinations. Chaotic behavior can occur in equations as simple as x_{t+1} = ax_t^2 + bx_t (see the short simulation following this list).
  • Rational randomness such as that predicted by mixed strategies in zero-sum games. 
  • Arational randomness attributable to free-will: the rule-of-thumb from our rat-running colleagues: “A genetically standardized experimental animal, subjected to carefully controlled stimuli in a laboratory setting, will do whatever it damn pleases.” 
  • Effective policy response: as discussed at several points in the main text, in at least some instances organizations will have taken steps to head off a crisis that would have otherwise occurred, and as IFMs are increasingly incorporated into policy making, this is more likely to occur. It is also the entire point of the exercise.
  • The effects of unpredictable natural phenomena: for example, the 2004 Indian Ocean tsunami dramatically reduced violence in the long-running conflict in Aceh, and on numerous occasions in history important leaders have unexpectedly died (or, as influentially, not died while their effectiveness gradually diminished).
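
The promised illustration of the quasi-random point: with a = -4 and b = 4 the quadratic map above is the classic logistic map 4x(1-x), and two trajectories differing by one part in ten billion are thoroughly decorrelated within about fifty iterations:

    # x_{t+1} = a*x**2 + b*x with a=-4, b=4, i.e. the logistic map 4x(1-x),
    # which is chaotic on [0,1]: tiny initial differences grow exponentially.
    def iterate(x, a=-4.0, b=4.0, steps=50):
        for _ in range(steps):
            x = a * x * x + b * x
        return x

    print(iterate(0.3))            # one trajectory
    print(iterate(0.3 + 1e-10))    # nearby start, unrelated result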

Tetlock (2013) independently has an almost identical list of the irreducible sources of forecasting error.

Please note that while the 0.80 to 0.85 AUC speed limit has shown up relentlessly in existing IFMs, there is no theoretical reason for this number, and with finer geographical granularity and/or shorter time horizons, it could be smaller, larger, or less consistent across behaviors. For a nice discussion of the predictive speed limit issue in a different context, criminal recidivism, see Science 359:6373, 19 Jan 2018, pg. 263; the original research is reported in Science Advances 10.1126/sciadv.aao5580 (2018).
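
A simulation sketch of how measurement error alone can produce such a limit (an invented illustration, not a derivation from actual IFM results): generate rare events from a noise-free latent risk, let the model see only a noisy measurement of that risk, and the achievable AUC drops well below the noise-free ceiling no matter how much data is available:

    # Measurement error as a speed limit. Outcomes are driven by a latent
    # risk score; the forecaster observes only risk + noise. All numbers
    # illustrative.
    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(1)
    n = 200_000
    latent = rng.normal(size=n)                          # true risk
    p = 1.0 / (1.0 + np.exp(-(latent - 3.0)))            # rare-event probabilities
    y = (rng.random(n) < p).astype(int)
    observed = latent + rng.normal(scale=1.0, size=n)    # noisy measurement
    print(roc_auc_score(y, latent))      # the noise-free ceiling
    print(roc_auc_score(y, observed))    # distinctly lower, and more data won't fix it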


Posted in Methodology, Politics | 3 Comments

Yeah, I blog…

A while back I realized I'd hit fifty blog posts, and particularly as recent entries have averaged—with some variance—about 4000 words, that's heading towards 200,000 words, or two short paperbacks, or about the length of one of the later volumes of the Harry Potter opus, or 60%-70% of a volume of A Song of Ice and Fire. So despite my general admonishment to publishers that I am where book projects go to die, maybe at this point I have something to say on the topic of blog writing.

That and I recently received an email—I'm suspicious that it comes from a bot, though I'm having trouble figuring out what the objectives of the bot might be (homework exercise?)—asking for advice on blogging. Oh, and this blog has received a total of 88,000 views, unquestionably vastly exceeding anything I've published in a paywalled journal. [1] And finally, I've recently been reading/listening, for reasons that will almost certainly never see the light of day, [2] to various works on the process of writing: Bradbury (magical), [3] Forster (not aging well unless you are thoroughly versed in the popular literature of a century ago), James Hynes's Great Courses series on writing fiction, as well as various "rules for writing" lists by successful authors.

So, in my own style, seven observations.

1. Write, write, write

Yes, write, write, write: that's one of the two consistent bits of advice every writer gives. [4] The best consistently write anywhere from 500 to 1500 words a day, which I've never managed (I've tried: it just doesn't work for me), but you just have to keep writing. And if something doesn't really flow, keep writing until it does (or drop it and try something else). And expect to throw away your first million words. [5]

But keep your day job: I've never made a dime off this, nor expect to: I suppose I've missed opportunities to earn some beer money by making some deal with Amazon for the occasional links to books, but it doesn't seem worth the trouble/conflicts of interest, and you've probably also noticed the blog isn't littered with advertisements for tactical flashlights and amazing herbal weight-loss potions. [6] Far from making money, for all I know my public display of bad attitude has lost me some funding opportunities, ones which would have driven me (and some poor program manager) crazy anyway.

2. Edit, edit, edit

Yes, in a blog you are freed from the tyranny of Reviewer #2, but with great power comes great responsibility, so edit ruthlessly. This has been easy for me, as Deborah Gerner and I did just that on the papers we wrote jointly for some twenty years, and at least some people noticed. [7] And as the saying goes, variously attributed to Justice Louis Brandeis and writer Robert Graves, “There’s no great writing, only great rewriting.”

In most cases these blog entries are assembled over a period of days from disjointed chunks—in only the rarest of cases will I start from the proverbial blank page/screen and write something from beginning to end—which gradually come together into what I eventually convince myself is a coherent whole, and then it's edit, edit, edit. And meanwhile I'll be writing down new candidate sentences, phrases, and snark on note cards as these occur to me in the shower or making coffee or walking or weeding: some of them work, some don't. For some reason WordPress intimidates me—probably the automatic formatting, I note as I'm doing final editing here—so now I start with a Google Doc—thus ensuring an interesting selection of advertisements subsequently presented to me by the Google omniverse—and only transfer to WordPress in the last few steps. Typically I spend about 8 to 10 hours on an entry, and having carefully proofread it multiple times before hitting "Publish," I invariably find a half-dozen or so additional typos afterwards. I'll usually continue to edit and add material for a couple days after "publication," while the work is still in my head, then move on.

3. Be patient and experiment

And particularly at first: it took some time for me to find the voice where I was most comfortable, which is the 3000-5000 word long form—this one finally settled in at about 3100 words, the previous was 4100 words—rather than the 600-900 words typical of an essay or op-ed, to say nothing of the 140/280 characters of a Tweet. [8] My signature "Seven…" format works more often than not, though not always, and I realized after a while it could be a straitjacket. [9] Then there was the early commenter—I get very occasional comments, since by now people have figured out I'm not going to approve most of them, and I'm not particularly interested in most feedback, a few people excepted [4]—who didn't like how I handled footnotes, but I ignored this and it is now probably the most distinctive aspect of my style.

4. Find a niche

I didn’t have a clear idea of where the blog would go when I started it six years ago beyond the subtitle “Reflections on social science, politics and education.” It’s ended up in that general vicinity, though “Reflections on political methodology, conflict forecasting and politics” is probably more accurate now. I’ve pulled back on the politics over the last year or so since the blogosphere is utterly awash in political rants these days, and the opportunities to provide anything original are limited: For example I recently started and then abandoned an entry on “The New Old Left” which reflected on segments of the Democratic Party returning to classical economic materialist agendas following a generation or more focused on identity but, like, well, duh… [10]  More generally, I’ve got probably half as much in draft that hasn’t gone in as that which has, and some topics start out promising and never complete themselves: you really have to listen to your subject. With a couple exceptions, it’s the technical material that generates the most interest, probably because no one else is saying it.

5. It usually involves a fair amount of effort. But occasionally it doesn’t.

The one entry that essentially wrote itself was the remembrance of Heather Heyer, who was murdered in the white-supremacist violence in Charlottesville on 12 August 2017. The commentary following Will Moore's suicide was a close second, and in both of these cases I felt I was writing things that needed to be said for a community. "Feral…", which after five years invariably still gets a couple views a day, [11] in contrast gestated over the better part of two years, and its followup, originally intended to be written after one year, waited for three.

Successful writers of fiction often speak of times where their characters—which is to say, their subconscious—take hold of a plot and drive it in unexpected but delightful ways. For the non-fiction writer, I think the equivalent is when you capture a short-term zeitgeist and suddenly find relevant material everywhere you look [18], as well as waking up and dashing off to your desk to sketch out some phrases before you forget them. [12]

6. Yeah, I’m repetitive and I’m technical

Repetitive: see Krugman, P., Friedman, T., Collins, G., Pournelle, J., and Hanh, T. N. Or, OMG, the Sutta Pitaka. And yes, there is a not-so-secret 64-character catch-phrase that is in pretty much every single entry irrespective of topic.[13] As in music, I like to play with motifs, and when things are working well, it's nice to resolve back to the opening chord.

Using the blog as technical outlet, notably on issues dealing with event data, has been quite useful, even if that wasn’t in the original plan. Event data, of course, is a comparatively tiny niche—at most a couple hundred people around the world watch it closely—but as I’ve recently been telling myself (and anyone else who will listen), the puzzle with event data is it never takes off but it also never goes away. And the speed with which the technology has changed over the past ten years in particular is monumentally unsuited to the standard outlets of paywalled journals with their dumbing-down during the review process and massive publication delays. [14] Two entries, “Seven observations on the [then] newly released ICEWS data” and “The legal status of event data” have essentially become canonical: I’ve seen them cited in formal research papers, and they fairly reliably get at least one or two views a week, and more as one approaches the APSA and ISA conferences or NSF proposal deadlines. [15]

7. The journey must be the reward

Again, I've never made a dime off this directly, [16] nor do I ever expect to unless somehow enough things accumulate that they could be assembled into a book, and people buy it. [17] But it is an outlet that I enjoy, and I have also become aware, from various comments over the years, that this has made my views known to people, particularly on the technical side in government, whom I wouldn't ever have direct access to: they will mention they read my blog, and a couple times I believe they've deliberately done so within earshot of people who probably wish they didn't. But fundamentally, just as [some] sharks have to keep moving to stay alive and salmon are driven to return upstream, I gotta write—both of my parents were journalists, so maybe, as with the salmon, it's genetic?—and you, dear reader, get the opportunity to read some of it.

Footnotes

1. But speaking of paywalled journals, the major European research funders are stomping down big-time! No embargo period, no "hybrid models": publish research funded by these folks in paywalled venues and you have to return your grant money. Though if health care is any model, this trend will make it across the Atlantic in a mere fifty to seventy-five years.

2. A heartfelt 897-page Updike-inspired novel centered on the angst of an aging computer programmer in a mid-Atlantic university town obsessed with declining funding opportunities and the unjust vicissitudes of old age, sickness, and death.

Uh, no.

African-Americans, long free in the mid-Atlantic colonies due to a successful slave revolt in 1711-1715 coordinated with native Americans—hey, how come every fictional re-working of U.S. history has to have the Confederacy winning the Civil War?—working as paid laborers on the ever-financially-struggling Monticello properties with its hapless politician-owner, now attacked by British forces seeking to reimpose Caribbean slavery (as well as being upset over the unpleasantness in Boston and Philadelphia). Plus some possible bits involving dragons, alternative dimensions most people experience only as dark energy, and of course Nordic—friendly and intelligent—trolls.

Or—totally different story—a Catalonian Jesuit herbalist—yeah, yeah, I’m ripping off Edith Pargeter (who started the relevant series at age 64!), but if there is the village mystery genre (Christie, Sayers (sort of…), Robinson) and the noir genre (Hammett, Chandler, Ellroy), there’s the herbalist monk genre—working in the Santa Maria della Scala in the proud if politically defeated and marginalized Siena in the winter of 1575 who encounters a young and impulsive English earl of a literary bent who may or may not be seeking to negotiate the return of England to Catholicism, thus totally, like totally!!! changing the entire course of European history (oops, no, that’s Dan Brown’s schtick…besides, those sorts of machinations were going on constantly during that era. No dragons or trolls in this one.) but then a shot rings out on the Piazza del Campo, some strolling friars pull off their cloaks to reveal themselves as Swiss Guards, and a cardinal lies mortally wounded?

Nah…I’m the place where book projects go to die…

3. Ah, Ray Bradbury: Growing up in fly-over country before it was flown over, writing 1,000 words a day since the age of twelve, imitating various pulp genres until his own literary voice came in his early 20s. A friend persuades him to travel across the country by train to visit NYC where, after numerous meetings with uninterested publishers, an editor notes that his Martian and circus short stories were, in fact, the grist for two publishable books—which I of course later devoured as a teenager—and he returns home to his wife and child in LA with checks covering a year’s food and rent. Then Bradbury, with only a high-school education, receives a note that Christopher Isherwood would like to talk with him, and then Isherwood says they really should talk to his friend Aldous Huxley. And by 1953, John Huston asks him to write a screenplay for Moby Dick, provided he do this while living in the gloom of Ireland.

4. And—beyond edit, edit, edit—about the only one. For example, Bradbury felt that a massive diet of movies in his youth fueled his imagination; Stephen King says if you can’t give up television, you’re not serious about writing. About half of successful writers apparently never show unfinished drafts to anyone, the other half absolutely depend on feedback from a few trusted readers, typically agents and/or partners.

Come to think of it, two other near-universal bits of advice: don’t listen to critics, and, closely related, don’t take writers’ workshops very seriously (even if you are being paid to teach in them).

5. Which I’d read first from Jerry Pournelle, but it seems to be general folklore: Karen Woodward has a nice gloss on this.

6. Or ads for amazing herbal potions for certain male body functions. I actually drafted a [serious] entry for “The Feral Diet” I’d followed with some success for a while but, alas, like all diet regimes, it only worked for weight loss for a while (weight maintenance has been fine): I ignore my details and just follow Michael Pollan and Gary Taubes.

7. High point was when we were asked by an NSF program director if it would be okay to share one of our [needless to say, funded] proposals with people who wanted an example of what a good proposal looked like.

8. Twitter is weird, eh? I avoided Twitter for quite some time, then hopped—hey, bird motifs, right?—in for about a year and a half, then hopped out again, using it now only a couple times a week. What is interesting is the number of people who are quite effectively producing short-form essays using 10 to 20 linked tweets, which probably not coincidentally translates to the standard op-ed length of around 700 – 800 words, but the mechanism is awkward, and certainly wouldn’t work for a long-form presentation. If Twitter bites the dust due to an unsustainable financial model—please, please, please, if only for the elimination of one user’s tweets in particular—that might open a niche for that essay form, though said niche might already be WordPress.

While we’re on the topic of alternative media, I’ve got the technology to be doing YouTube—works for Jordan Peterson and, by inference, presumably appeals to lobsters—but I suspect that medium won’t last for me, both because of its technological limitations—WordPress may not be stable, but the underlying text (it’s UTF-8 HTML!) is—and because the video form itself is more conversational and hence more transient. Plus I rarely watch YouTube: I can read a lot faster than most people speak.

9. Same with restricting the length, which I tried for a while, and usually putting constraints around a form improves it. But editing for length is a lot of work, as any op-ed columnist will tell you, and this is an informal endeavor. The “beyond the snark” reference section I employed for a while also didn’t last—in-line links work fine, and the ability to use hyperlinks in a blog is wonderful, one of the defining characteristics of the medium.

10. I’ve got a “Beyond Democracy” file of 25,000 words and probably a couple hundred links reflecting on the emergence of a post-democratic plutocracy and how we might cope with it: several unfinished essays have been stashed in this file. Possibly that could someday jell as a book, but, alas, have I mentioned that I am the place where book projects go to die? Are you tired of this motif yet?

11. The other entry which is consistently on the “Viewed” list on the WordPress dashboard—mind you, I only look at this for the two or three days after I post something to get a sense of whether it is getting circulated—is “History’s seven dumbest self-inflicted political disasters.” Whose popularity—this is Schrodt doing his mad and disreputable William McNeill imitation (badly…)—I absolutely cannot figure out: someone is linking it somewhere? Or some bot is just messing with me?

12. Dreaming of a topic for [seemingly] half the night: I hate that. The only thing worse—of course, beyond the standard dreams of being chased through a dank urban or forested landscape by a menacing evil while your legs turn to molasses and you simply can’t run fast enough—is dreaming about programming problems. If your dreams have you obsessing with some bit of writing, get out of bed and write it down: it will usually go away, and usually in the morning your nocturnal insight won’t seem very useful. Except when it is. Same with code.

13. Not this one: that would make it too easy.

14. I recently reviewed a paper—okay, that was my next-to-last review, honest, and a revise-and-resubmit, and really, I’m getting out of the reviewing business, and Reviewer #2 is not me (!!)—which attempted to survey the state of the art in automated event coding, and I’d say got probably two-thirds of the major features wrong. But the unfortunate author had actually done a perfectly competent review of the published literature, the problem being that what’s been published on this topic is the tip of the proverbial iceberg in a rapidly changing field and has a massive lag time. This has long been a problem, but is clearly getting worse.

15. Two others are also fairly useful, if both a bit dated: “Seven conjectures on the state of event data” and [quite old as this field goes] “Seven guidelines for generating data using automated coding”.

16. It’s funny how many people will question why one writes when there is no prospect of financial reward when I’ve never heard someone exclaim to a golfer: “What, you play golf for free?? And you even have to pay places to let you play golf? And spend hours and hours doing it? Why, that’s so stupid: Arnold Palmer, Jack Nicklaus, and Tiger Woods made millions playing golf! If you can’t, just stop trying!”

17. As distinct from Beyond Democracy, the fiction, and a still-to-jell work on contemporary Western Buddhism—like the world needs yet another book by a Boomer on Buddhism?—all of which are intended as books. Someday…maybe…but you know…

18. Like the Economist Espresso’s quote of the day: “A person is a fool to become a writer. His [sic] only compensation is absolute freedom.” Roald Dahl (Charlie and the Chocolate Factory, Matilda, The Fantastic Mr. Fox). Yep.

Posted in Uncategorized

Happy 60th Birthday, DARPA: you’re doomed

Today marks the mid-point of a massive self-congratulatory 60th anniversary celebration by DARPA [1]. So, DARPA, happy birthday! And many happy returns!! YEA!!!

That’s a joke, right? Why yes, how did you guess?

A 60th anniversary, of course, is a very important landmark, but not in a good way: Chinese folklore says that neither fortune nor misfortune persist for more than three generations,[2] and the 14th century historian and political theorist Ibn Khaldun pegged three generations as the time it took a dynasty to go from triumph to decay. Calculate a human generation as 20 years and, gulp, that makes 60.

Vignette #1:

DARPA, perhaps aware of some of the issues I will be raising here, has embarked on some programs with “simplified” proposal processes (e.g. https://www.darpa.mil/news-events/2018-07-20a). In DARPA-speak, “simplified” means a 20 to 30 page program description with at least 7 required file templates, the first being an obligatory PowerPoint™ slide. In industry-speak, this is referred to as “seven friggin’ PDF files and WTF a friggin’ required PowerPoint™ slide??—in 2018 who TF uses friggin’ PowerPoint™???” [3]

Vignette #2:

A few months back, I’d been alerted to an interesting DARPA DSO BAA under the aforementioned program, and concocted an approach involving another Charlottesville-based tech outfit (well, their CTO is in CVille: the company is 100% remote on technical work, across a number of countries) with access to vast amounts of relevant data. The CTO and I had lunch on a Friday—during which I learned the company had developed out of an earlier DARPA-funded project—and he was all ready to move ahead with this.

On Monday the project was dead, vetoed by their CFO: they have plenty of work to do already, and it is simply too expensive to work with DARPA as DARPA involves an entirely different set of contracting and collaboration norms than the rest of the industry. Sad.

Arlington, we have a problem.

But before we go any further, I already know what y’all are thinking: “Hey, Schrodt, so things have finally caught up with your obnoxious little feral strategy, eh? Left academia, no longer have access to an Office of Sponsored Research [5][6] so you can’t apply for DARPA funding any more. Nah, nah, nah! LOSER! LOSER!! LOSER!!!”

Well, yeah, elements of that: per vignette #2, there are definitely DARPA [7] programs I’d like to be participating in, but no longer can, or rather, cannot assemble any conceivable rationale for attempting. Having sketched out this diatribe [8], I was on the verge of abandoning it as mere sour grapes when The Economist [1 September 2018] arrived with a cover story based on almost precisely the same complex social systems argument I’d already outlined for DARPA, albeit about Silicon Valley generally. So maybe I’m on to something. Thus we will continue.

As I was reminded at a recent workshop, DARPA was inspired by the scientific/engineering crisis of Sputnik. [9] DARPA’s challenge in the 21st century, however, is that it continues to presuppose the corporate laboratory structures of the Sputnik era, where business requirements and incentives were [almost] completely reversed from what they are today: the days of the technical supremacy of Bell Labs and Xerox PARC are gone, and they aren’t coming back. [10]

As The Economist points out in the context of the demise of Silicon Valley as an attractive geographical destination, Silicon Valley’s very technological advances—many originally funded by DARPA—have sown the seeds of its geographical destruction. DARPA faces bureaucratic rather than geographical challenges, but is essentially in the same situation, at least in the world of artificial intelligence/machine learning/data science (AI/ML/DS), where DARPA appears to be desperately trying to play catch-up.

A few of the insurmountable social/economic changes DARPA is facing:

  • AI/ML/DS innovations can be implemented almost instantly with essentially no capital investment.[11] As The Economist [25 August 2018] notes, in 1975 only 17% of the value of the S&P 500 companies was in intangibles; by 2015 this was 84%.
  • The bifurcation/concentration of the economy, particularly in technical areas: the rate of start-ups has slowed, and those that exist quickly get snatched up by the monsters. Consider for example the evolution of the Borg-like SAIC/Leidos [12], which first gobbled up hundreds of once-independent defense consulting firms, then split, and now Leidos has absorbed Lockheed’s information systems business. You will be assimilated!
  • As some recent well-publicized instances have demonstrated, working with DARPA—or the defense/intelligence community more generally—will be actively opposed by some not insignificant fraction of the all-too-mobile employees of the technology behemoths. Good luck changing that.

As I’ve documented in quite an assortment of posts in this blog—I’ve been successfully walking this particular walk for more than five years now—these changes have led to an accelerating rise, particularly in the AI/ML/DS field, of the independent remote contractor—either an individual or a small self-managing team—due to at least five factors:

  • Ubiquity of open source software which has zero monetary cost of entry and provides a standard platform across potential clients.
  • Cloud computing resources which can be purchased and cast aside in a microsecond with no more infrastructure than a credit card.
  • StackOverflow and GitHub putting the answers to almost any technical question a few keystrokes away: the relative advantage of having access to local and/or internal company expertise has diminished markedly.
  • A variety of web-based collaborative environments: free audio and video conferencing, shared document environments, and collaboration-oriented communication platforms such as Slack, Dropbox, and the like.
  • Legitimation of the “gig economy” from both the demand and supply side: freelancers are vastly less expensive to hire and are now viewed as entrepreneurial trailblazers rather than as losers who can’t get real jobs. In fact, because of its autonomy, remote work is now considered highly desirable.

The upshot, as explosion.ai’s (spaCy, prodigy) Ines Montani explains in a recent EuroPython talk, is that small companies are now fully capable of doing what only massive companies could do a decade or so ago. Except, of course, dealing with seven friggin’ PDF files including a required friggin’ PowerPoint™ slide to even bid on a project with some indeterminate chance of being funded following a six to nine month delay. More shit sandwiches?: oh, so sorry, just pass the plate as I’ve already had my share.

As those who follow my blog are aware, I spend my days in a pleasant little office in a converted Victorian three blocks from the Charlottesville, Virginia pedestrian mall [13] in the foothills of the Blue Ridge, uninterrupted except by the occasional teleconference. I have nearly complete control of my tasks and my time, and as an introvert whose work requires a high level of concentration, this is heaven. My indirect costs are around 15%. In the five years I’ve supported myself in this fashion, my agreements with clients typically involve a few conversations, a one or two page SOW, and then we get to work.  

DARPA-compatible alternatives to this sort of remote work, of course, would involve transitioning to some open-office-plan hellhole beset with constant interruptions and “supervision” by clueless middle-managers who spend their days calling meetings and writing corporate mission statements because, well, that’s just what clueless middle managers are paid to do.[14] These work environments are horribly soul-sapping and inefficient—with indirect costs far exceeding mine—except for that rather sizable proportion of the employees who are in fact not adding any value to the enterprise but are enjoying an indefinitely extended adolescent experience where, with any luck at all, they can continue terrorizing the introverts who actually are writing quality code, just as they did in junior high school, which is pretty much what open-office-plan hellholes try to replicate. I digress.

So, I suppose, indeed I am irritated because there are opportunities out there I can’t even compete for without radically downgrading my situation, even though I—and the contemporary independent contractor community more generally—could probably do these tasks at lower cost and higher quality than the corporate behemoths who will invariably end up with all that money, this despite the fact that a migration to remote teams with lower costs and higher output is precisely what we are seeing in the commercial sector. Says no less than The Economist.

Okay, okay, so the FAANG are leery about even talking to DARPA, and we’ve already established that the existing contractors aren’t giving DARPA what it is looking for [15], but you’ve still got academic computer science to fall back on, right? Right?

Uh, not really.

Once again, any reliance on academia has DARPA doing the time warp again and heading back to the glory days of the Sputnik crisis when, in fact, academic research was probably a pretty good deal. But now:

  • Tuition—which will be covered directly or indirectly—at all research universities has soared as the public funding readily available in the 1950s has collapsed.
  • Universities no longer have the newest and shiniest toys: those are in the private sector.
  • The best and brightest students zip through their graduate programs in record time, with FOMO private sector opportunities nipping at their tails. The ones who stick around…well, you decide.
  • The best and brightest professors have far more to gain from their startups and consultancies than from filling out seven friggin’ PDF files including one friggin’ required PowerPoint™ slide. Those with no such prospects, and the people building empires for the sake of empire building and/or aspirations to become deans, associate deans, assistant deans, deanlings or deanlets, yeah, you might get some of those. Congrats. Or something.

And these are impediments before we consider the highly dysfunctional publication incentives which have reduced academic computer science to only a single true challenge, the academic Turing Test—probably passed several years ago but the reality of this still hidden—for who will be the first to write a bot which can successfully generate unlimited publishable AI/ML/DS papers.[16] This and the fact that computer science graduate students tend to be like small birds, spending most of their time flitting around in pursuit of novelties in the form of software packages and frameworks with lifespans comparable to that of a dime-store goldfish. And all graduate students, on entering even the most lowly M.S.-level program, are sworn to a dark oath, enforced with the thoroughness of a Mafia pizza parlor protection racket, to never, ever, under any circumstances, comment, document or test their code. [21]

Academia will not save you. No one will save you.

HARUMPHHH!!! So if you bad-attitude remote contractors are so damn smart and so damn efficient, there’s an obvious market failure/arbitrage opportunity here which will self-correct because, as we all know, markets always work perfectly.

Well, maybe, but I’d suggest this is going to be tough, for at least three reasons.

The first issue, for the remote independent contractors as well as the FAANG, is simply “why bother?”: I’m not seeing a whole lot of press about AI/ML/DS unemployment, and if you can get work with a couple of phone calls and one-page SOW, why deal with seven friggin’ PDF forms and a friggin’ required PowerPoint™ slide?

Then there’s the unpleasant fact that anyone attempting to arbitrage the inefficiencies here is wandering into the arena with the likes of SAIC, Lockheed and BBN, massive established players more than happy to destroy you, and they consider seven friggin’ PDFs and all other barriers to entry a feature rather than a bug, as well as deploying legions of Armani-shod lobbyists to make damn sure things stay that way. But mostly, they’ll come after any threats to their lucrative niche faster than a rattlesnake chasing a pocket gopher. I suppose it could be done, but it is not for the faint of heart. Or bank account.

The final issue is that because DARPA [famously, and probably apocryphally] expects its projects to fail 80% of the time, there’s a frog-in-boiling-water aspect where DARPA won’t notice—until it is too late—structural problems which now cause projects to fail where they would have succeeded in the absence of those new conditions. Well, until the Chinese get there first.[17]

There is, in the end, a [delightful?] irony here: one of the four foci within the DARPA Defense Sciences Office, those folks whose idea of “simple” is a 30 page BAA and seven friggin’ PDF files starting with a friggin’ obligatory PowerPoint™ slide, is called “complex social systems,” which in most definitions would include self-modifying systems.[18] And a second of those foci deals with “anticipating surprise.”

Well, buckeroos, you’ve got both of these phenomena going on right there in the sweet little River City of Arlington, VA: a complex self-modifying system that’s dropped a big surprise, and in all likelihood there’s nothing you can do about it.

Okay, maybe a tad too dramatic: at the most basic level, all that is going on here is a case of the well-understood phenomenon of disruptive innovation—please note my clever use of a link to that leftist-hippy-commy rag, the Harvard Business Review—where new technologies enable the development of an alternative to the established/entrenched order which in its initial stages is typically not in fact “better” than the prevailing technology, but attains a foothold by being faster, cheaper and/or easier to use, thus appealing to those who don’t actually need “the best.”

Project proposals provided by remote independent contractors with 15% IDC will—assuming they even try—be inferior to those of the entrenched contractors with 60% IDC, since in addition to employing legions of Armani-shod lobbyists they also employ platoons of PowerPoint™ artistes, echelons of document editors, managers overseeing countless layers of internal reviews, and probably the occasional partridge in a pear tree.[19] You want a great proposal?: wow can these folks ever produce a great proposal!

They just can’t deliver on the final product [20] for the reasons noted above. Leaving us in this situation:

[Image: What they propose]

[Image: What they deliver]
In contrast, the coin of the realm for the independent shops is their open code on GitHub: even if the contracted work will be proprietary, you’ve got to have code out where people can look at it, and that’s why contemporary companies are comfortable hiring people they’ve never met in person—and may never meet—and who will be working eight time zones away: it’s the difference between hiring someone to remodel your kitchen based on the number of glossy architectural magazines they bring to a meeting versus hiring them based on other kitchens they’ve remodeled. All of which is to say that in the contemporary AI/ML/DS environment, assembling an effective team is more Ocean’s 11 or Rogue One, and much less The Office.

So on a marginally optimistic note, I’ll modify my title slightly: DARPA, until you find a structure that rewards people for writing solid code, not PowerPoint™ slides, you’re doomed.

Happy 60th.

Footnotes

1. If you don’t know what DARPA is, stop right here as the cultural nuances permeating the remainder of this diatribe will make absolutely no sense. I’m also obnoxiously refraining from defining acronyms such as DSO, BAA, SOW, PM, FAR, FOMO, FAANG, CFO, CTO, ACLED, ICEWS, F1, AUC, MAE, and IARPA because refraining from defining acronyms is like so totally intrinsic to this world.

2. This is apparently an actual Chinese proverb, though it is typically rendered as “Wealth does not pass three generations,” along with many variants on the same theme. There’s a nice exposition, including an appropriate reference to Ibn Khaldun, to be found, of all places, on this martial arts site.

3. A couple weeks ago the social media in CVille—ya gotta love this place—got into an extended tiff over whether the use of the word “fuck”—or more generally “FUCK!” or “FUCK!!” or “THAT’S TOTALLY FUCKING FUCKED, YOU FUCKING FUCKWIT!!!”—was or was not a form of cultural appropriation. Of course, it’s not entirely clear what “culture” is being appropriated, and thus offended, as the word has been in common use for centuries, but presumably something local as the latter phrase is pretty much representative of contemporary public discourse, such as it is, in our fair city.[4] Okay, not quite. Fuckwit.  So to avoid offense—not from the repetitive use of an obscenity, but the possibility of that indeterminate variant on cultural appropriation—I will continue to refer to the “seven friggin’ PDFs and one friggin’ required PowerPoint™ slide.”

4. Browsing the Politics and Prose bookstore in DC last weekend—this at the new DC Wharf, where the hoi polloi can gaze upon the yachts of the lobbyists for DARPA contractors—I noticed that if you would like to write a book, but really don’t have anything to say, adding FUCK to the title is a popular contemporary marketing approach. Unfortunately, these tomes—mind you, they are typically exceedingly short, so perhaps technically they are not really “tomes”—will probably all be pulped or repurposed as mulch, but should a few escape we can foresee archeologists of the future—probably sentient cockroaches—using telepathic powers to record in [cockroach-DARPA-funded] holographic crystals “Today our excavation reached the well-documented FUCK-titled-tome layer, reliably dated to 2015-2020 CE.” Though they will more likely have to be content with the “Keurig capsule layer”, which far less precisely locates accompanying artifacts only to 1995-2025 CE.

5. Or as Mike Ward eloquently puts it: OSP == Office for the Suppression of Research.

6. As Schrodt puts it, the OSP mascot is the tapeworm.

7. And IARPA: same set of issues, less money.

8. Though inspired in part after listening to some folks at the 2018 summer Society for Political Methodology meetings—unlike the three-quarters of political science Ph.D.s who will not find tenure track positions, political methodologists are eminently employable, albeit not necessarily in academia—literally laughing out loud—and this conference being in Provo, Utah, laughing out loud while stone-cold sober—about Dept of Defense attempts to recruit high-quality collaborators in the AI/ML/DS fields.

9. In this presentation, we were told “I’m sure no one here remembers Sputnik.” Dude, I not only remember Sputnik—vividly—I can even remember when the typical Republican thought the Russians were an insidious and unscrupulous enemy!

10. From the recent obituary of game theorist Martin Shubik:

After earning his doctorate at Princeton, he worked as a consultant for General Electric and for IBM, whose thinking about research scientists he later described to The New York Times: “Well, these are like giant pandas in a zoo. You don’t really quite know what a giant panda is, but you sure as hell know (1) you paid a lot of money for it, and (2) other people want it; therefore it is valuable and therefore it’s got to be well fed.”

11. Capital intensity is a key caveat here: as the price of the shiniest new toys increases, so does the competitiveness of DARPA compared to the commercial sector. So, for example, in areas such as quantum computing, nanotechnologies and most work on sensors, DARPA will do just fine. AI/ML/DS: not so much. So despite my dramatic title—hey, it’s a blog!—DARPA is probably not doomed in endeavors involving bashing metals or molecules. 

12. I wasn’t really sure how to find that Vanity Fair article on SAIC—which got quite the attention when it first came out more than a decade ago—but it popped right up when I entered the search term “SAIC is evil”. Also see this.

The sordid history of the likes of SAIC and Lockheed raises the topic/straw-man of whether DARPA PMs, in comparison to private sector research managers who can contract with a few phone calls and a short SOW, “must” be hemmed in by mountains of FARs and bureaucracy lest they be irresponsible with funds from the public purse. Yet these same managers routinely are expected—all but required thanks to the legions of Armani-shod lobbyists—to dole out billions to outfits like SAIC and Lockheed which have long—like really, really long—rap sheets on totally wasting public moneys. Sad.

13. Six coffee shops and counting.

14. Okay, your typical tech middle manager is also paid to knock back vodka shots in strip clubs while exchanging pointers on how to evade HR’s efforts to reduce sexual harassment, a phenomenon I have explored in greater detail here.

15. See Sharon Weinberger’s superbly researched history of DARPA, Imagineers of War, for further discussion, particularly her analysis of DARPA’s seemingly terminal intellectual and technical drift in the post-Cold-War period.

16. Academic computer science has basically run itself into a publications cul-de-sac—mind you, possibly quite deliberately, as said cul-de-sac guarantees their faculty can spend virtually all of their time working on their start-ups and consultancies—where publication has become defined solely by the ability to get some marginal increase in standardized metrics on standardized data sets.

Vignette: I’m generally avoiding reviewing journal articles now—I have only limited access to paywalled journals, and in any case don’t want to encourage that sort of thing—but a few weeks ago finally agreed to do so (for a proprietary journal I’d never heard of) after being incessantly harangued by an editor, presumably because I was one of about five people in the world who had worked in the past with all of the technologies used in the paper, and I decided to reward the effort that must have been involved to establish this connection. The domain, of course, was forecasting political conflict, and the authors had assembled a conflict time series from the usual suspects—ACLED, Cline Center, or ICEWS—and applied four different ML methods, which produced modestly decent results—as computer scientists, they felt no obligation, of course, to look at the existing literature which extends back a mere four or five decades—with a bit of variation in the usual metrics, probably F1, AUC and MAE. There was a serious discussion of these differences, discussions of the relative level of resources required for each estimator, blah blah blah. So far, a typical ML paper.

Until I got to a graphical display of the results. The conflict time series, of course, was a complex saw-toothed sequence. Every single one of the ML “models”: a constant! THE [fuckwit] IDIOTS HAD SUBMITTED A PAPER WHERE THE PREDICTED TIME SERIES HAD ZERO VARIANCE! And those various estimators didn’t even converge to the mean, hence the differences in the measures of fit!
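For the record, the sanity check that would have caught this runs about five lines of Python—a hypothetical sketch, obviously, since I haven’t seen their code:

```python
import numpy as np

def sanity_check_forecast(y_pred, tol=1e-8):
    """Refuse to report results when the 'model' has collapsed to a constant:
    a predicted series with essentially zero variance fits nothing."""
    if np.std(y_pred) < tol:
        raise ValueError("Predicted series is constant: do not submit the paper.")

sanity_check_forecast(np.full(200, 0.42))  # a constant predictor fails immediately
```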

I politely told the editor, in all sincerity, that this was the stupidest thing I had ever read in my life, and in political science it would have never gone out for review. The somewhat apologetic response allowed that it might not be the finest contribution from the field, as the journal was new (and, I’m sure, expensive: gotta make the percentage of library budgets that go to serials asymptotic to 100%!) and was being submitted for a special issue. Right.

After completing the review, I tracked down the piece (I follow the political science norm of respecting double-blind review processes): it was from one of the top computer science shops in the country, and the final “author” (who I presume had never even glanced at the piece) was the director of a research institute with vast levels of government funding. Such is the degeneracy of contemporary academic computer science. I’m hardly the only person to notice this issue: see this from Science.

17. This, of course, being the dominant issue in political-economy for the first half of the 21st century: the Chinese have created a highly economically successful competitor to liberal market polities, and we have also seen a convergence in market concentration in the new companies dominating the heights of markets in both systems. However, we’ve got 200 years of theorizing—once called “conservative” (and before that “liberal”) in the era before “conservative” became equated with following the constantly changing whims of a deranged maniac—arguing that decentralized economic political-economic systems should provide long-term advantages over authoritarian systems. But that sure the heck isn’t clear at the moment.

18. Hegel, of course, had similar ideas 200 years ago, but wasn’t very good with PowerPoint™: sad.

19. Do these elaborate proposal preparation shops figure into the high indirect costs of the established contractors? Nah…of course not, because we know proposals are done by legions of proposal fairies who subsist purely on dewdrops and sunlight, costing nothing. Or if they did, those costs would be reimbursed by the legions of proposal leprechauns and their endless supplies of gold. None of this ever figures into indirect cost rates, right?

20. As distinct from providing 200+-slide PowerPoint™ decks for day-long monthly program reviews: they’ll be great on that as well!

21. Turns out astrophysics has a wonderful name for the undocumented code people write figuring no one will ever look at it only to find it’s still in use twenty years later: “dinosource.”

Posted in Higher Education, Methodology

What if a few grad programs were run for the benefit of the graduate students?

I’ve got a note in my calendar around the beginning of August—I was presumably in a really bad mood at [at least] some point over the past year—to retweet a link to my blog post discussing my fondness for math camps—not!—but in the lazy-hazy-crazy days of summer, I’m realizing this would be rather like sending Donald Trump to meet with the leaders of U.S. allies: gratuitously cruel and largely meaningless. Instead, and more productively, an article in Science brought to my attention a recent report [1] by the U.S. National Academies of Sciences, Engineering, and Medicine (NASEM)—these people, please note, are a really big deal. The title of the article—“Student-centered, modernized graduate STEM education”—provides the gist but here’s a bit more detail from the summary of the report provided in the Science article:

[the report] lays out a vision of an ideal modern graduate education in any STEM field and a comprehensive plan to achieve that vision. The report emphasizes core competencies that all students should acquire, a rebalancing of incentives to better reward faculty teaching and mentoring of students, increased empowerment of graduate students, and the need for the system to better monitor and adapt to changing conditions over time.  … [in most institutions] graduate students are still too often seen as being primarily sources of inexpensive skilled labor for teaching undergraduates and for performing research. …  [and while] most students now pursue nonacademic careers, many institutions train them, basically, in the same way that they have for 100 years, to become academic researchers

Wow: reconfigure graduate programs not only for the 21st century but to benefit the students rather than the institutions. What…a…concept!

At this point my readership now splits, those who have never been graduate students (a fairly small minority, I’m guessing) saying “What?!? Do you mean graduate programs aren’t run for the benefit of their students???” while everyone who has done time in graduate school is rolling their eyes and cynically saying “Yeah, right…” With the remainder rolling on the ground in uncontrollable hysterical laughter.[2]

But purely for the sake of argument, and because these are the lazy-hazy-crazy days of summer, and PolMeth is this week and I got my [application-focused!] paper finished on Friday (!!), let’s just play this out for a bit, at least as it applies to political methodology, the NASEM report being focused on STEM, and political methodology is most decidedly STEM. And in particular, given the continued abysmal—and worsening [3]—record for placement into tenure-track jobs in political science, let’s speculate for a bit what a teaching-centered graduate level program for methodologists, a.k.a. data scientists, intending to work outside of academia might look like. For once, I will return to my old framework of seven primary points:

1. It will basically look like a political methodology program

I wrote extensively on this topic about a year ago, taking as my starting point that experience in analyzing the heterogeneous and thoroughly sucky sorts of data quantitative political scientists routinely confront is absolutely ideal training for private sector “data science.” The only new observation I’d add, having sat through demonstrations of several absolutely horrible data “dashboards” in recent months, is formal training in UX—user interface/experience—in addition to the data visualization component. So while allowing some specialization, we’d basically want a program evenly split between the four skill domains of a data scientist:

  • computer programming and data wrangling
  • statistics
  • machine learning
  • data visualization and UX

2. Sophisticated problem-based approaches taught by instructors fully committed to teaching

One of the reasons I decided to leave academia was my increasing exposure to really good teaching methodologies combined with a realization that I had neither the time, energy, nor inclination to use these. “Sage on the stage” doesn’t cut it anymore, particularly in STEM.

Indeed, I’m too decrepit to do this sort of thing—leave me alone and just let me code (and, well, blog: I see from WordPress this is published blog post #50!)—but there are plenty of people who can enthusiastically do it and do it very well. The problem, as the NASEM report notes in some detail, is that in most graduate programs there are few if any rewards for doing so. But that’s an institutional issue, not an issue of the total lack of humans capable of doing the task, nor the absence of a reasonably decent body of research and best-practices—if periodically susceptible, like most everything social, to fads—on how to do it.

3. Real world problems solved using remote teaming

Toy problems and standardized data sets are fine for [some] instruction and [some] incremental journal publications, but if you want training applicable to the private sector, you need to be working with raw data that is [mostly] complete crap, digital offal requiring hours of tedious prep work before you can start applying glitzy new methods to it. Because that, buckeroos, is what data science in the private sector involves itself with, and that’s what pays the bills. Complete crap is, however, fairly difficult to simulate, so much better to find some real problems where you’ve got access to the raw data: associations with companies—the sorts of arrangements that are routine in engineering programs—will presumably help here, and as I’ve noted before, “data science” is really a form of engineering, not science. 
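To make “digital offal” concrete, here’s a minimal sketch—hypothetical file and column names—of the sort of prep work that consumes those hours before any glitzy method gets near the data:

```python
import pandas as pd

# Hypothetical raw extract: junk headers, duplicated records, mixed date formats.
df = pd.read_csv("raw_extract.csv", dtype=str)            # trust nothing: read everything as strings
df.columns = df.columns.str.strip().str.lower()           # headers with stray whitespace and random case
df = df.drop_duplicates()                                 # exact-duplicate rows from the ingest
df["date"] = pd.to_datetime(df["date"], errors="coerce")  # unparseable dates become NaT
df["count"] = pd.to_numeric(df["count"], errors="coerce") # "N/A", "--", "" become NaN
df = df.dropna(subset=["date", "country"])                # records unusable without these fields
```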

My relatively new suggestion is for these programs to establish links so that problem-solving can be done in teams working remotely. Attractive as the graduate student bullpen experience may be, it isn’t available once you leave a graduate program, and increasingly, it will not be duplicated in many of the best jobs that are available, as these are now done using temporary geographically decentralized teams. So get students accustomed to working with individuals they’ve never met in person who are a thousand or eight thousand or twelve thousand miles away and have funny accents and the video conferencing doesn’t always work but who nonetheless can be really effective partners. In the absence of some dramatic change in the economics and culture of data science, the future is going to look like the “fully-distributed team” approach of parse.ly, not the corporate headquarters gigantism of FAANG.

4. One or two courses on basic business skills

I’ve written a number of blog entries on the basics of self-employment—see here and here  and here—and for more information, read everything Paul Graham has ever written, and more prosaically, my neighbor and tech recruiter Ron Duplain always has a lot of smart stuff to say, but I’ll briefly reiterate a couple of core points here.

[Update 31 July: Also see the very useful EuroPython presentation from Ines Montani of explosion.ai, the great folks that brought you spaCy and prodigy. [9]]

Outside of MBA programs—which of course go to the opposite extreme—academic programs tend to treat anything related to business—beyond, of course, reconfiguring their curricula to satisfy the funding agendas of right-wing billionaires—as suspect at best and more generally utterly worthy of contempt. Practical knowledge of business methods also varies widely within academia: while the stereotype of the academic coddled by a dissertation-to-retirement bureaucracy handling their every need is undoubtedly true as the median case, I’ve known more than a few academics who are, effectively, running companies—they generally call them “labs”—of sometimes quite significant size.

You can pick up relevant business training—well, sort of—from selectively reading books and magazine articles but, as with computer programming, I suspect there are advantages to doing this systematically [and some of my friends who are accountants would definitely prefer if more people learned business methods more systematically]. And my pet peeve, of course, is getting people away from the expectations of the pervasive “start-up porn”: if you are reasonably sane, your objective should be not to create a “unicorn” but rather a stable and sustainable business (or set of business relationships) where you are compensated at roughly the level of your marginal economic contribution to the enterprise.[4]

That said, the business angle in data analytics is at present a rapidly moving target as the transition to the predominance of remote work—or if you prefer, the “gig economy”—plays out. In the past couple of weeks, there were articles on this transition in both The Economist’s “The World If…” feature and Science magazine’s “Science Careers” [6 July 2018][5]. But as The Economist makes clear, we’re not there yet, and things could play out in a number of different ways.[6] Still, it is likely that most people in the software development and data analytics fields should probably at least plan for the contingency they will not be spending their careers as coddled corporate drones and instead will find themselves in one of those “you only eat what you—or you and your ten-person foraging party of equals—kill” environments. Where some of us thrive. Grrrrrrrr. There are probably some great market niches for programs that can figure out what needs to be covered here and how to effectively teach it.

5. Publication only in open-access, contemporaneous venues

Not paywalled journals. Particularly not paywalled journals with three to five year publication lags. As I note in one of several snarky asides in my PolMeth XXXV paper:

Paywalled journals are virtually inaccessible outside universities so by publishing in these venues you might as well be burying your intellectual efforts beneath a glowing pile of nuclear waste somewhere in Antarctica. [italics in original]

Ideally, if a few of these student-centered programs get going, some university-sponsored open access servers could be established to get around the current proliferation of bogus open access sites: this is certainly going to happen sooner or later, so let’s try “sooner.” Bonus points: such papers can only be written using materials available from open access sources, since the minute you lose your university computer account, that’s the world you will live in.

It goes without saying that students in these programs should establish a track record of both individual and collective code on GitHub, GitHub (and StackOverflow) having already solved the open access collective action problem in the software domain.[7]

6. Yes, you can still use these students as GTAs and GRAs provided you compensate them fairly

Okay, I was in academia long enough to understand the basic business model of generating large amounts of tuition credit hours—typically about half—in massive introductory classes staffed largely by graduate students. I was also in academia long enough to know that graduate training is not required for students to be able to competently handle that material: you just need smart people (the material, remember, is introductory) and, ideally, some training and supervision/feedback on teaching methods. To the extent that student-centered graduate programs have at least some faculty strongly committed to teaching rather than to increasing the revenues of predatory publishers, you may find MA-level students are actually better GTAs than research-oriented PhD students.

As far as providing GRAs, my guess is that generating basic research—open access, please—out of such programs will also occur naturally, and again, because the programs have a focus on applications, these students may prove better (or at least, less distracted) than those focused on the desperate—and in political science, for three-quarters, inevitably futile—quest for a tenure-track position. You might even be able to get them to document their code!

In either role, however, please provide those students with full tuition, a living wage and decent benefits, eh? The first law of parasitism being, of course, “don’t kill the host.” If that doesn’t scare you, perhaps the law of karma will.

7. Open, transparent, unambiguous, and externally audited outcomes assessments

Face it, consumers have more reliable information on the contents of a $1.48 can of cat food than they have on the outcomes of $100,000 business and law school programs, and the information on professional programs is usually far better than the information on almost all graduate programs in the social sciences. In a student-centered program, that has to change, lest we find, well, programs oriented towards training for jobs that only a quarter of their graduates have any chance of getting.

In addition to figuring out standards and establishing record-keeping norms, making such information available is going to require quite the sea change in attitudes, and thus far deans, associate deans, assistant deans, deanlets, and deanlings have successfully resisted open accountability by using their cartel powers.[8] In an ideal world, however, one would think that market mechanisms would favor a set of programs with transparent and reliable accountability.

Well, a guy can dream, eh?

See y’all—well, some subset of y’all—in Provo.

Footnotes

1. Paywalled, of course. Because elite not-for-profit organizations sustained almost entirely by a combination of tax monies and grants from sources who are themselves tax-exempt couldn’t possibly be expected to make their work accessible, particularly since the marginal cost of doing so is roughly zero.

2. What’s that old joke from the experimental sciences?: if you’re embarking on some procedure with potentially painful consequences, better to use graduate students rather than laboratory rats because people are less likely to be emotionally attached to graduate students.

3. The record for tenure track placement has gotten even worse, down to 26.3%, which the APSA notes “is the lowest reported figure since systematic observation began in the 2009-2010 academic year.” 

4. Or if you want to try for the unicorn startup—which is to say, you are a white male from one of a half-dozen elite universities—you at least understand what you are getting into, along with the probabilities of success—which make the odds of a tenure-track job in political science look golden in comparison—and the actual consequences, in particular the tax consequences, of failure. If you are not a white male from one of a half-dozen elite universities, don’t even think about it.

5. Science would do well to hire a few remote workers to get their web page functioning again, as I’m finding it all but inoperable at the moment. Science is probably spending a bit too much of their efforts breathlessly documenting a project which, using a mere 1,000 co-authors, has detected a single 4-billion-year-old neutrino.

6. And for what it’s worth, this is a place where Brett Kavanaugh could be writing a lot of important opinions. Like maybe decisions which result in throwing out the vast cruft of gratuitous licensing requirements that have accumulated—disproportionately in GOP-controlled states—solely for the benefit of generally bogus occupational schools.

7. And recently received a mere $7.5 billion from Microsoft for their troubles: damn hippies and open source, never’ll amount to anything!

8. Though speaking of cartels—and graduate higher education puts OPEC, though not the American Medical Association, to shame on this dimension—the whole point of a cartel is to restrict supply. So a properly functioning cartel should not find itself in a position of over-producing by a factor of three (2015-2016 APSA placements) or four (2016-2017 placements). Oh…principal-agent problems…yeah, that…never mind…

9. Watch the presentation, but for a quick summary, her main point is that the increasingly popular notion that a successful company has to be large, loss-making, and massively funded is bullshit: if you actually know what you are doing, and are producing something people want to buy, you can be self-financing and profitable pretty much from the get-go. “Winner-take-all” markets are only a small part of the available opportunities—though you wouldn’t know that from the emphasis on network effects and FOMO in start-up porn, now amplified by the suckers [10] who pursue the opportunities in data science tournaments rather than the discipline of real markets—and there are plenty of possibilities out there for small, complementary teams who create well-designed, right-sized software for markets they understand. Thanks to Andy Halterman for the pointer.

10. Okay, “suckers” is probably too strong a word: more likely these are mostly people—okay, bros—who already have the luxury of an elite background and an ample safety net provided by daddy’s and mommy’s upper 1% income and social networks, so they can afford to blow off a couple years doing tournaments just for the experience. But compare, e.g., to Steve Wozniak and Steven Jobs—and to a large extent, even with their top-1% backgrounds, Bill Gates and Paul Allen—who created things people actually wanted to buy, not just burning through billions to manipulate markets (Uber, and increasingly it appears, Tesla).

Posted in Higher Education, Methodology

Witnessing a paradigm shift?

The philosopher of science Thomas Kuhn is famous—beyond an apparent penchant for throwing ashtrays [1]—for his vastly over-generalized concept of “paradigm shifts” in scientific understanding, where a set of ideas once thought unreasonable becomes the norm, exchanging this status with ideas on the same topic once almost universally accepted. [2] This typically involves a generational change—Max Planck famously observed that scientific progress occurs one funeral at a time—but can sometimes occur more quickly. And I think I’m watching one develop in the field of predictive models of conflict behavior.

The context here [3] was a recent workshop I attended in Europe on that topic. The details don’t matter but suffice it to say this involved an even mix of the usual suspects in quantitative conflict data and modeling—I’m realizing there are perhaps fifty of us in the world—and an assortment of NGOs and IGOs, mostly consumers of the information. [4]  Held amid the monumental-brutalist architecture housing the pan-European bureaucracy, presumably the model for the imperial capital in The Hunger Games, leading one to sympathize, at least to a degree, with European populist movements. And by the way, in two days of discussions no one mentioned Donald Orange-mop even once: we’re past that.

The promised paradigm change is on the issue of whether technical models for forecasting conflict are even possible—and as I’ve argued vociferously in the past, academic political science completely missed the boat on this—and it looks as though we’ve suddenly gone from “that’s impossible!” to “okay, where’s the model, and how can we improve it?” This new assessment being entirely due to the popularization over the past year of machine learning. The change, even taking into account that the Political Instability Task Force has been doing just this sort of thing, and doing it well, for at least fifteen years, has been stunningly rapid.

Not, of course, without more than a few bumps along the way. Per the persistent hype around “deep learning,” there’s a strong assumption that “artificial intelligence” is now best done with neural networks—and the more complex the better—whereas there’s consistent evidence both from this workshop and a number of earlier efforts I’m familiar with that because of the heterogeneity of the cases and the tiny number of positives, random forests are substantially better. There’s also an assumption that you can’t figure out which variables are important in a machine learning model: again, wrong, as this is routine in random forests and can be done to a degree even in neural nets, though it’s rather computationally intensive. One presenter—who had clearly consumed a bit too much of the Tensorflow Kool-Aid—noted these systems “learn on their own”: alas, that’s not true for this problem [6] and in fact we need lots of training cases, and in conflict forecasting models the aforementioned heterogeneity and rare positives still hugely complicate estimation.
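On the variable-importance point: in the standard open-source tooling this is literally one attribute on a fitted random forest. A minimal sketch on synthetic data—scikit-learn, with the rare-positives problem baked in just to make the point:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for conflict data: heterogeneous cases, very few positives.
X, y = make_classification(n_samples=5000, n_features=20, n_informative=5,
                           weights=[0.97, 0.03], random_state=42)

forest = RandomForestClassifier(n_estimators=500, class_weight="balanced",
                                random_state=42).fit(X, y)

# "You can't tell which variables matter in machine learning": sure you can.
for rank, idx in enumerate(np.argsort(forest.feature_importances_)[::-1][:5], 1):
    print(f"{rank}. feature {idx}: importance = {forest.feature_importances_[idx]:.3f}")
```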

So these models are not easy, but they are now considered possible, and there is an actual emerging paradigm: In the course of an hour I saw presentations by a PhD student in a joint program at Universities of Stockholm and Iceland developing a resource-focused conflict forecasting model and a data scientist from the World Bank and FAO working on famine forecasting [7] both implementing essentially the same very complex protocols for training, calibration, and cross-validation of various machine learning models. [8][15]
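Schematically—and this is a bare-bones sketch on synthetic data, not either presenter’s actual pipeline—that shared protocol looks something like the following; a real conflict model would split on time rather than randomly, but the train/calibrate/cross-validate skeleton is the same:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=7)

# Cross-validation stratified so the rare positives appear in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=7)
base = RandomForestClassifier(n_estimators=300, random_state=7)
print("AUC by fold:", cross_val_score(base, X, y, cv=cv, scoring="roc_auc"))

# Calibration: map raw forest scores onto honest probabilities
model = CalibratedClassifierCV(base, method="isotonic", cv=cv).fit(X, y)
p_positive = model.predict_proba(X)[:, 1]  # calibrated P(conflict onset)
```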

Well, we live in interesting times.

There’s a fairly standard rule-of-thumb in economic history stating it takes between one and two human generations—20 to 40 years—to effectively incorporate a major new technology into the production structure of organizations. The—yes, paradigmatic—cases are the steam engine, electricity, and computers. [9] I’ve sensed for quite some time that we’re in this situation, perhaps half-way through the process, with respect to technical forecasting models and foreign policy decision-making. [10] As Tetlock and others have copiously demonstrated, the accuracy of human assessments in this field is very low, and as Kahneman and others have copiously demonstrated, decision-making on high-risk, low-probability issues is subject to systematic biases. Until quite recently, however, data [11] and computational constraints meant there were no better alternatives. But there are now, so the issue is how to properly use this information. 

And not every new technology takes a generation before it is adopted: to take some examples most readers will be familiar with, word-processing, MP3 music files, flat-screen displays, and cell phones displaced their earlier rivals almost in a historical eye-blink, albeit except for word processing this was largely in a personal rather than organizational context. In the long-ago research phase of ICEWS—a full ten years ago now, wow…—I had a clever slide (well, I thought it was clever) showing a robot saying “We bomb Mindanao in six hours” and a medal-bedecked general responding “Yes, master” to illustrate what technical forecasting models are not designed to do. But with accuracy 20% to 30% better than human forecasts, one would think these approaches should have some impact on the process. It is going to take time and effort to figure out how, particularly since human egos and status are involved, and the models will make mistakes. And present a new set of challenges, just as electrical power presented a different set of risks and opportunities than the steam and water power it replaced. But their eventual incorporation into policy-making seems inevitable.

Finally, this might have implications for the future demand for event data, as models customized for very specific organizational needs finally provide a “killer app” using event data as a critical input. As it happens, no one has yet come up with something that does the job of event data—recording the day-to-day interactions of political actors as reported in the open press—without it looking pretty much like plain old event data: both the CAMEO and PLOVER [12] event coding systems still have the basic structure of the 60-year-old WEIS, because WEIS incorporates most things in the news of interest to analysts (and their quantitative models). While the forecasting models I’m currently seeing primarily use annual (and state-level) structural data, as soon as one drops to the sub-annual level (and, increasingly, sub-state, as geocoding of event data improves), event data are really the only game in town. [13]
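
For readers who have never looked inside one of these datasets, a record in the WEIS/CAMEO lineage is just a dated, directed, coded interaction; the field names and sample values in this sketch are hypothetical illustrations, not taken from any specific dataset:

```python
# Hypothetical event record in the WEIS/CAMEO lineage; field names and
# sample values are illustrative, not from any specific dataset.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Event:
    date: str                    # date of the report, ISO format
    source: str                  # actor code for the initiator
    target: str                  # actor code for the recipient
    event_code: str              # coded interaction, CAMEO-style
    lat: Optional[float] = None  # optional geocoding, increasingly sub-state
    lon: Optional[float] = None

# e.g. a coded negotiation event between two governments
evt = Event("2018-06-15", "USAGOV", "PRKGOV", "046", 39.03, 125.75)
print(evt)
```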

Footnotes

1. Recently back in the news…well, sort of…thanks to a thoroughly unflattering book by documentary film-maker Errol Morris, whose encounters with Kuhn when Morris was a graduate student left a traumatic impression of Kuhn being a horse’s ass of truly mythic proportions, though some have suggested parts of the book may themselves border on the mythic…nonetheless, be civil to your grad students lest they become award-winning film-makers and/or MacArthur Award recipients long after you and any of your friends are around to defend your reputation. Well, and also because being nice to your grad students is simply the right thing to do.

2. And thus the hitherto obscure word “paradigm” entered popular parlance: a number of years ago, at the height of the dot-com bubble, social philosopher David Barry proposed simply making up a company name, posting this on the web, and seeing how much money would pour in. The name he proposed was “Gerbildigm”, combining “gerbil” and “paradigm.” Mind you, that’s scarcely different than what actual companies were doing in the late 1990s to generate funding. Nowadays, in contrast, they simply say they are exploring applications of deep learning.

3. And by the way, this isn’t the snark-fest promised in the previous blog entry; that’s still to come, though events are so completely depressing at the moment—okay, “Christian” conservatives, you won the right not to bake damn wedding cakes, but at the price of suborning tearing infants out of the arms of their mothers: you really think that tradeoff is a good deal? Will your god? You’ve got an exemption from Matthew 25:35-40 now, eh? You’re completely confident about this? You sure?—I’m having difficulty gearing up for a snark-fest even though it is half-written. Though stuff I have half-written would fill a not-inconsequentially sized bookshelf.

4. It is also notable that the gender ratio at this very technical workshop was basically 50/50, and that included the individuals developing the data and models, not just the consumers. In the U.S., the ratio would have been 80/20 or even 90/10. Is the USA, then, excluding some very talented potential contributors to this field? [5] And is this related to the work of Jayhawk economist Donna Ginther, highlighted on multiple occasions by The Economist over the past few months, showing that in the academic discipline of economics, gender discrimination appears to be considered a feature rather than a bug? That attitude cascaded over into the academic field of political methodology, though thanks to the efforts of people like Janet Box-Steffensmeier, Sara Mitchell, and Caroline Tolbert, and institutions like VIM, it is not as bad as it once was. But compared to my experiences in Europe, it could still improve.

5. I recently stumbled onto historian Marie Hicks’s study titled Programmed Inequality: How Britain Discarded Women Technologists and Lost Its Edge in Computing. Brogrammers take note: gender discrimination doesn’t necessarily have a happy ending.

6. Self-learning is, famously, possible for games like poker, chess, and go, which have the further advantage that the average person can understand the application, thus providing ample fodder for breathless headlines. These in turn feed fears that our new Go-and-Texas-Hold’em neural network overlords will, like Daleks and Cylons, shortly threaten us lethally, even if they still can’t manage to control machines sufficiently well to align the doors to shut properly on a certain not-so-mass-produced electric vehicle produced by a company owned by one of the more notable alarmists concerned about the dangers of machine intelligence. Plus there’s the little issue of who controls the power cord. I digress.

7.  Amusingly, for the World Bank work, the analyst then has to run comparable regression models because that’s apparently the only thing the economists there understand. At the moment.

8. Nor was this the standard protocol for producing a regression model, which, gentle reader, I would remind you has the following steps (as Adam Smith pointed out in 1776, for maximal efficiency assemble a large team of co-authors, with specialists doing each task!):

  1. Develop some novel but vaguely plausible “theory”
  2. Assemble a set of 25 or so variables from easily available data sets
  3. Run transformations and subsets of these, ideally using automated scripts to save thought and labor, until one or more combinations emerge where the p-values on your pet variables are ≤0.05. Justify any superfluous variables required to achieve this via collinearity—say, parakeets-per-capita—as “controls.” Bonus points for using some new variant of regression for which the data do not remotely satisfy the assumptions and which mangles the coefficients beyond any hope of credible interpretation. Avoid, at all costs, out-of-sample assessments of any form.
  4. Report this in a standardized social science format 35 ± 5 pages in length (with a 100-page web appendix) starting with an update of the literature review from your dissertation[s], copiously citing your friends and any likely reviewers, and interpreting the coefficients as though they were generated using OLS estimation. Make sure the “Discussion” and “Conclusions” sections essentially duplicate each other and the statistical tables.
  5. Publish in a proprietary journal which will appear in print after a lag of at least three years, firewalled and thus inaccessible to the policy community, though no one will ever look at it anyway. Previously you will have presented the problem, methodology, and results in approximately 500 seconds (you’re on a five-paper panel, of course) at a major conference, where your key slide shows 4 variants of the final 16-variable model with the coefficients to 6 decimal places and several p-values reported as “0.000.” The five people in the audience would be unable to read the resulting 3-point type, except that they are browsing the conference program instead of listening; the discussant asks why you didn’t include four additional controls.
  6. PROFIT!

I jest. I wish.

9. In fact quite a few people have suggested that computers still aren’t being used to their full capacity in corporations because they would render many middle managers irrelevant, and these individuals, unlike Yorkshire handloom weavers, are in a position to resist their own displacement: The Economist had a nice essay to this effect a couple weeks ago.

10. The concept of a systematic foreign policy is, of course, at present quaintly anachronistic in the U.S., where foreign policy, such as it is, is made on the basis of wild whims and fantasies gleaned from a steady if highly selective diet of cable TV, combined with a severe case of dictator-envy and the at least arguable proposition that poutine constitutes a threat to national security. But ever the optimist, I can imagine the U.S. returning to a more civilized approach at some point in the future, just as Rome recovered from both Nero and Caligula. Also as noted, this workshop was in Europe, which has suddenly been incentivized to get serious about foreign policy.

11. This is an important caveat: the data are every bit as important as the methods, and for many remote geographical areas under high conflict risk, we probably still don’t have all the data we need, even though we have a lot more than we once did. But data are hard, and data can be very boring; certainly they will not generate the headlines that a glitzy new game-playing or kitten-identifying machine learning application can. And at the moment this field is dependent on a large number of generally underfunded small projects, the long-term Scandinavian commitments to PRIO and the Uppsala UCDP being the exceptions. In the U.S., the continued funding of the ICEWS event data is very tenuous, and the NSF RIDIR event data funding runs out in February 2018…just saying…

12. Speaking of PLOVER, at yet another little workshop, I was asked about the painfully slow progress towards implementing PLOVER, and it occurred to me that it’s currently trying to cross a technological “valley of death” [14] where PLOVER, properly implemented, would be clearly superior to CAMEO, but CAMEO already exists, and there is abundant CAMEO data (and software for coding it) available for free, and existing models already do a reasonably good job of accommodating the problems of CAMEO. “Free and already available” is a serious advantage if your fundamental interest is the model, not the data: This is precisely why WEIS, despite being proposed as a first-approximation to what would certainly be far better approaches, was used for about 25 years and CAMEO, which wasn’t even intended as a general-purpose coding scheme, is heading towards the two-decade mark, despite well-known issues with both.

13. Though the other thing to watch here is the emerging availability of low-cost, frequently updated remote sensing data. The annualized NASA night-light data are already increasingly used to provide sub-state information with high geographical precision, and new private sector data, as well as new versions of night-lights, are likely to be available at far greater frequency.

14. Googling this phrase to get a clean citation, I see it has been used to mean about twenty different things, but the one I’m employing here is a common variant.

15. And while I’m on the topic of unsolicited advice to grad students: yet another vital professional skill they don’t teach you in graduate school is flying to Europe and being completely alert the day after you arrive. My formula:

  1. Sleep as much as you can on the overnight flight (sleeping on planes, ideally without alcohol, is another general skill)
  2. Take at most a one-hour nap before sunset, and spend most of the rest of the time outside walking
  3. Live on the East Coast
  4. Don’t change planes (or at least terminals) at Heathrow