Seven Guidelines for Generating Data using Automated Coding [1]

Background:

I’m at yet another workshop on standards and lessons-learned for conflict data collections—I believe the formal title of this one is “So, punk, think you can manage a conflict data set??”—and in the near future may be writing a short journal article on the topic addressing the issues for automated systems, so rather than confine my prepared remarks to a small audience of specialists, I’ll develop it in a blog. This is almost certainly going to get modified over the next few days and weeks, and as the title implies, I will be extending it to another 7-point-post [7] on guidelines for using automated data. So if you are planning to assign this, say in a class, check back. [2]

The List

1. Write [and document] everything on the assumption you will be using—and re-using—it far longer than you expect. Doubly so for coding frameworks. Save everything in ASCII or Unicode formats, not binary.

There is a rule of thumb in computer programming that any code you return to after six months might as well have been written by someone else, so document accordingly. There is a large, if not entirely convergent, literature on what constitutes “adequate” documentation—too much can be as bad as too little, as the details get lost in excessive documentation, and it isn’t read—so aim for the happy medium. Meanwhile, share not just your primary documentation, but also those one- and two-page “cheat sheets” and codebook summaries that your project uses all the time.

Little utility programs that one originally writes as a one-off kludge to solve some pressing problem end up being central to the project, sometimes so central that they are unnoticed until someone tries to duplicate your work.[3] Take time to both document these and occasionally “refactor”—clean up the code without changing what it does—and combine them so that multiple steps become a single step.

The stability of coding frameworks, particularly coding frameworks which are considered a “first approximation,” is under-appreciated: Charles McClelland figured the WEIS framework would be rewritten and improved after five or so years; it was in use virtually unchanged forty years later. We wrote CAMEO for a specific project on mediation and it was adopted by ICEWS as a general purpose coding framework.

I recently learned a lesson the hard way on the importance of saving in text formats: I was contacted by a group in Spain concerning some data on the first Palestinian intifada that had been collected in 1988-1991 by a long disbanded Palestinian NGO and which Deborah Gerner and I had used in a 1995 article. I found the files easily enough—with inexpensive high-density media, it is easy to make multiple backups, which I’d done—but the main file, which had originally been in an obscure MS-DOS database, had been saved in an early version of Excel for the Mac. Fortunately, we’d saved enough extracts in tab-delimited files—no problems at all in reading those—that I think most of the data can be recovered, but I should have had the sense to save the entire file that way. You can pretty safely assume that any binary format will be unreadable after ten years [11]: plan accordingly.
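As a minimal sketch of the "save it as text" habit, here is one way to dump a table of records to tab-delimited text using only the Python standard library. The function name `to_tsv` and the field names in the usage note are illustrative assumptions, not anything from the original intifada files.

```python
import csv
import io


def to_tsv(rows, fieldnames):
    """Serialize a list of dict records as tab-delimited text.

    Plain text with a header row stays readable in any editor,
    whatever happens to the program that originally produced the data.
    """
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Something like `to_tsv(records, ["date", "event"])` returns a string that can be written to a `.txt` file alongside whatever binary format the project uses day to day.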

2. Version control, version control, version control. On the programs, data and documentation.

As the Ionian rationalist Heraclitus said 2,600 years ago, the only thing permanent is change. [4] Automated data sets in particular are meant to be recoded as the dictionaries and coding systems improve, so unlike a survey, the data never become canonical: they are always subject to change. For the last few years, TABARI has had the ability to automatically prefix any data set it generates with a record of the date, dictionaries, and comments, a feature we will be incorporating into our new systems as well.
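The idea of prefixing every generated data set with a record of how it was made can be sketched in a few lines. This is not TABARI's actual header format, which I am not reproducing here; the function name, the `#` comment convention, and the use of a SHA-256 checksum to identify each dictionary are all assumptions for illustration.

```python
import hashlib
from datetime import datetime, timezone


def provenance_header(coder_version, dictionaries, comment, now=None):
    """Return comment lines recording when and how a data set was generated.

    `dictionaries` maps a dictionary file name to its raw bytes, so a short
    checksum can pin down exactly which version of each dictionary was used.
    """
    now = now or datetime.now(timezone.utc)
    lines = [
        f"# generated: {now.isoformat(timespec='seconds')}",
        f"# coder: {coder_version}",
    ]
    for name, content in sorted(dictionaries.items()):
        digest = hashlib.sha256(content).hexdigest()[:12]
        lines.append(f"# dictionary: {name} sha256={digest}")
    lines.append(f"# comment: {comment}")
    return "\n".join(lines) + "\n"
```

Writing this string at the top of every output file means that a data set found on a backup years later still says which dictionaries produced it.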

The flexibility of automated coding also means that it is quite likely you will change your coding framework, particularly in the early phases: that’s a feature of the automated approach, not a bug. Just document it. The data sources are going to change over time as well, particularly right now: document those as well.

At the purely pragmatic level, at various times we have lost track of unintentional “forks” in the TABARI source code, the TABARI validation suite, and multiple versions of the CAMEO manual, and putting them back together was not fun. All projects with any reasonable level of complexity need version control; the software required to do this is open source and very mature, if occasionally counter-intuitive until you get the hang of it, so use it.

3. Reliably human validated records—both real cases and unit test cases—are an extremely valuable resource.

TABARI has an extended set of artificial “unit test” cases—about 700—which are run after every change to the program, however trivial, and if the new code fails on even one of these, it is fixed, promptly, before the code is used or modified further. This isn’t our invention: it is a standard practice in software development, and has saved us from many bugs which might have otherwise gone unnoticed.
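For readers who have not seen this practice, a rough sketch of a regression harness follows. The `toy_coder` function is a deliberately trivial stand-in for a real coder, the event labels are placeholders rather than CAMEO codes, and the two cases are invented; only the pattern (stored input, stored expected output, rerun everything after every change) reflects what is described above.

```python
# Placeholder event labels for the toy coder; these are NOT CAMEO codes.
VERBS = {"met": "MEET", "attacked": "ATTACK"}


def toy_coder(text):
    """Hypothetical stand-in coder: returns (source, event, target) or None."""
    words = text.rstrip(".").split()
    for i, word in enumerate(words):
        if word in VERBS and 0 < i < len(words) - 1:
            return (words[i - 1], VERBS[word], words[i + 1])
    return None


# Each stored case: (case id, input sentence, expected coding).
CASES = [
    ("t001", "Israel met Egypt.", ("Israel", "MEET", "Egypt")),
    ("t002", "Rain fell overnight.", None),
]


def run_regression(code_sentence, cases):
    """Run every stored case and return those whose output has changed."""
    failures = []
    for case_id, text, expected in cases:
        actual = code_sentence(text)
        if actual != expected:
            failures.append((case_id, expected, actual))
    return failures
```

An empty list from `run_regression(toy_coder, CASES)` means the change is safe to keep; anything else gets fixed before further work.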

What TABARI—and our automated event data projects more generally—do not have is a very large set of genuine news articles where a “correct” coding has been determined and consequently can be used to test both the software and the dictionaries. We’ve certainly done those tests, but did not have the foresight to keep the inputs. And these are very expensive to generate: easy cases are cheap, but the program will get almost all of them anyway. The same for cases which the program will never get. But the difficult cases are frequently those which are also ambiguous to human coders—which then must be resolved in long discussions—or those which occur rarely but reveal systemic problems in the dictionaries. We really should have retained these over the years, and in the future we might be able to generate a set comparable to those we have for unit testing. Or perhaps, just perhaps, such sets which were generated with public funding might be made available?

4. Data are noisy: discuss and document the sources of noise rather than pretending that the data are perfect.

I addressed this in a recent post, and plan to return to it later, but I repeat: event data are noisy. And, by the way, human coded data are not perfect either: one of the presentations at this conference was discussing a method called MSE that is used for determining total casualty figures in conflict zones and it is based on the assumption of random errors in [human compiled] data.

Ideally, in fact, we need multiple measures of accuracy. In the case of event data, for example, I can think of at least ten [8] measures, and no system I know—including our efforts—has been evaluated on anything close to all of these. Because of these multiple indicators, single measures of “accuracy” are meaningless: for example the widely-reported measure in King and Lowe (2003) is quite quirky, and has been misinterpreted in multiple ways over the years.
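To make the "multiple measures" point concrete, here is a hedged sketch that scores a coder against human-validated records on several of the dimensions in footnote 8 at once. The record layout (one entry per sentence, `None` when no event is present, dicts with `source`, `target` and `event` keys otherwise) is an assumption for illustration, and the actor and event codes in any test data are made up.

```python
def evaluate(gold, coded, fields=("source", "target", "event")):
    """Compare coded records with validated ones, sentence by sentence.

    Returns false positives, false negatives, and a separate accuracy for
    each field, computed only over sentences where both sides coded an event.
    """
    if len(gold) != len(coded):
        raise ValueError("gold and coded must cover the same sentences")
    false_pos = false_neg = compared = 0
    correct = dict.fromkeys(fields, 0)
    for g, c in zip(gold, coded):
        if g is None and c is not None:
            false_pos += 1          # event coded where none exists
        elif g is not None and c is None:
            false_neg += 1          # real event missed
        elif g is not None and c is not None:
            compared += 1
            for field in fields:
                correct[field] += g[field] == c[field]
    result = {
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "compared": compared,
    }
    for field in fields:
        result[field + "_accuracy"] = correct[field] / compared if compared else None
    return result
```

Reporting the whole dictionary, rather than collapsing it into one number, is the point: a system can be very good on source actors and mediocre on targets, and a single "accuracy" figure hides that.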

By the way, there seems to be some serious rot—for want of a better term—in the claims of inter-coder reliability in human-coded projects. After reading the results found by Mikhaylov, Laver and Benoit (2012) and Ruggeri, Gizelis and Dorussen (2011) [12], I’ve started noticing multiple instances where claims of 80% reliability cannot be replicated, and true replicability is frequently as low as half that level. Furthermore, once I started noticing this, I’m seeing the problem pop up in multiple fields—it is certainly not confined to political science—and I’m beginning to wonder whether it needs to be addressed more seriously. It indicates sloppiness at best, and can be considered a mild form of fraud at worst.

More generally, I do not care what the intercoder reliability was on your first 200 data points: I want to know what it is on the last 200 data points. [5] And if the project involves multiple institutions, and the teams of coders have changed over time (particularly for multi-year—or multi-decade—projects) I want the inter-coder reliability between the original coding team and the latest coding at another institution.

We almost never see these figures. My guess is that if we had them, the reliability of coding for mature automated systems would already be substantially higher than human coding reliability, and it is going to improve further, whereas human coding is probably not going to improve further except to the extent it incorporates machine-assisted methods. I could be wrong, but based on the evidence from the few attempts at replication, those claims of 80% should be viewed with a high level of skepticism.
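The figures asked for above are not hard to compute once two coders (or two teams) have coded the same items. The sketch below uses raw percent agreement and Cohen's kappa, which is one common chance-corrected statistic; the projects cited may well have used other measures, and the `tail_reliability` helper and its default window of 200 are my own illustration of the "last 200 data points" idea.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of items on which two coders assigned the same code."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty codings of the same items")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement between two coders, corrected for chance agreement."""
    n = len(a)
    observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical code throughout
    return (observed - expected) / (1 - expected)


def tail_reliability(a, b, window=200):
    """Kappa on the most recent `window` items rather than the first ones."""
    return cohens_kappa(a[-window:], b[-window:])
```

Running `tail_reliability` between the original team's codes and the current team's codes on the same recent items gives the number that almost never gets reported.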

5. Automate as many steps of your data generating process as possible.

At one point in the 2000s, I was systematically updating the KEDS Levant series every three months for the Swiss Peace Foundation—prior to this it tended to get updated whenever we had a conference paper to write—and I was noticing the process seemed fairly complex (in particular, sufficiently complex that it was not possible to delegate several of the tasks, though we had delegated some of it). So I actually wrote down all of the individual steps: There were about sixty.

With other demands—and most certainly, the fact that I was only doing this every three months—I never really automated this further, though it could have been considerably simplified even using the tools I had at the time, and most certainly could have been scripted as we moved into a Unix environment. GDELT, in contrast, gets updated every day around 2 a.m. using fully automated methods.

Scripts are your friends: they are little assistants who are completely reliable, do the same thing every time, and never show up late with a hangover or have a midterm they need to study for. Scripts also force you to systematize and document things which otherwise fall into that undocumented “Oh, I’ll just do it for you” category that will come back to haunt you five years later. They aren’t always appropriate, but I’m guessing most social scientists use these less frequently than they should.

6. You will be lucky if two-thirds of the students you hire to work on the human-coding aspects of the project actually get it. For programmers, make that one-third.

Unless your system works entirely with unsupervised machine-learning methods—and thus far, no open coding system has achieved this, though there are a few undocumented and implausible claims that such systems exist—you are going to require the assistance of carbon-based life forms. Good luck.

As with other aspects of the coding projects, design for robustness and failure. Dictionary development and coding of validation sets is not as mind-numbing as repetitive coding, but it still requires sustained attention to oftentimes seemingly endless little details, and most humans are not good at this. In our experience, we see at least an 80/20 rule operating—and more likely this is a 90/10 rule—with 80% of the best dictionary development due to 20% of the coders. [6] Every once in a while we get an extraordinary coder—at the height of our early work on KEDS, there was a coder working on the PANDA project who could describe bugs in the program with such precision that I could sometimes almost identify the line of code that had to be involved—and then the project jumps forward. But most people aren’t at this level, and things move slowly and incrementally.

It would be great if we had a nice diagnostic test for identifying the really good people, but we’ve never come close to it. In the KEDS project, we learned to involve the coders themselves in the hiring process, which improved our yield but didn’t make it perfect. Consequently we had a fairly long “shake-down” period of coders working on supervised test cases before they did actual development, and figured we would get a 60% to 70% yield, if that.[13] Though once we had someone trained, they would usually work for us the remainder of their time at Kansas, and sometimes longer. Using experienced coders to supervise the less experienced also helps, though sometimes the less experienced coders are good at noticing the inconsistencies in your protocols.

Programmers?—I used to think I was uniquely bad at hiring programmers, or that good programmers didn’t want to work for political science projects, until I started reading the general management literature on this, and realized it is a problem that everyone has, and poor (and/or poorly managed) programming teams have caused the collapse of multi-billion-dollar projects. Our issues pale in comparison. In those cases when I get a good programmer, the project advances disproportionately; at other times, we just muddle through. But any project director who thinks “We’ll just hire a programmer” is living in a fantasy world.

7. Link to as many reliable standards and existing resources as possible, and of course, open source.

Finally, use open source materials, and contribute to the open source communities. The situation here is hugely different than when we started work 25 years ago, and our project is still catching up on using all of the readily-available material on the web: for example it took us a lot longer than it should have to make full use of WordNet, though in CountryInfo.txt, we have made good use of resources such as rulers.org and the CIA World Factbook. [9] Still, there is a lot more “dictionary fodder” we should be taking advantage of, both for the basis of dictionaries and updating them.

Some of these resources are easy to incorporate, others more difficult. Fortunately, any large-scale compendium available on the web will be embedded in some sort of standard HTML page which, after a bit of routine if customized programming to extract the fields, is effectively structured data. And despite my continual screeds on the obstinacy of academic quantitative conflict analysis—rapidly approaching the status of a methodological suicide cult [10]—in using their funny little COW country codes instead of ISO-3166, that’s simply a table lookup problem and there’s even an R package, countrycode, which solves it. Besides, COW codes are positively sane compared to FIPS codes.
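The "table lookup problem" really is that small. The sketch below converts COW alphabetic country abbreviations to ISO 3166-1 alpha-3 codes using a five-row excerpt; the function name is hypothetical, the table is only a fragment for illustration, and a real project would load a complete, maintained concordance (which is what the countrycode package provides for R users).

```python
# Illustrative excerpt only: COW abbreviation -> ISO 3166-1 alpha-3.
COW_TO_ISO3 = {
    "USA": "USA",
    "CAN": "CAN",
    "UKG": "GBR",
    "FRN": "FRA",
    "GMY": "DEU",
}


def cow_to_iso3(codes, table=COW_TO_ISO3):
    """Translate COW abbreviations to ISO alpha-3 codes.

    Returns the converted list (None where no mapping exists) together with
    the unmapped codes, so gaps are reported rather than silently dropped.
    """
    converted, unknown = [], []
    for code in codes:
        iso = table.get(code)
        converted.append(iso)
        if iso is None:
            unknown.append(code)
    return converted, unknown
```

Returning the unmapped codes separately matters in practice: historical states and entities that exist in one system but not the other are where these conversions quietly go wrong.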

And of course, it goes without saying, borrow from, and contribute to, the open source projects. There is some weak evidence that for really large projects—those above 1-million lines of code—closed projects may still have an edge, but the existing systems are nowhere remotely close to this level, nor will they ever be: the required processing simply isn’t that complicated, particularly once coding is broken out into discrete components, for example for geocoding, parsing, and feature extraction. Below that level, study after study has shown not just that open source is cheaper—if only “free as in puppy”—but better. What’s not to like? [14]

Footnotes

1. After the last posting, I am told there was a tweet to the effect “Schrodt should write everything in ‘sevens.’” I now take that as a challenge!

2. No, I’m not just trying to drive traffic to the site. Hmmm, but this caveat is silly, right?: even if you do use this in a class, you’ll just provide a link, not print it off. But you might re-read it, so you can counter the individual who says “But Schrodt says that for government contracting work, best practice involves [1] vacuuming up everything you can in the open source world without attribution; [2] doing “hide and hoard” for anything original, using proprietary software; [3] “ghosting” open-source alternatives with proprietary and opaque tests with ludicrously implausible results whose details change every time they are challenged; [4] hiding everything under “unclassified but sensitive” and NDAs; [5] never forgetting the Prime Directive of the Two-Year-Old: “What’s mine is mine, and if I want it, what’s yours is mine as well.”; [6] open source is for chumps; and [7] consuming as much taxpayer money as possible on this exercise, since Paul Krugman assures us this austerity thing is way over-rated, and if you need to spend money, let’s spend it here.” The greatest chump of all is the taxpayer: lambs to the slaughter, sheep to be sheared.

3. Evolution works the same way: by current accounts, only 10% of the DNA in your body is human; the rest is that of a complex biome of micro-organisms without which you would die.

4. The Buddha, with rather more substantial impact, said much the same thing at much the same time. Ah, the mystery of the Axial Age…what was going around then?…

5. Okay, in fact knowing these figures for the average coded unit would be the best indicator.

6. By convention, we’ve always called them “coders” even though they are actually figuring out patterns for the dictionaries.

7. Which would be a good name for a bluegrass band.

8. Here goes…

  1. Accuracy of the source actor code
  2. Accuracy of the source agent code
  3. Accuracy of the target actor code: note that this will likely be very different from the accuracy of the source, as the object of a complex verb phrase is more difficult to correctly identify than the subject of a sentence.
  4. Accuracy of the target agent code
  5. Accuracy of the event code
  6. Accuracy of the event quad code
  7. Absolute deviation of the Goldstein score on the event code
  8. False positives: event is coded when no event is actually present in the sentence
  9. False negatives: no event is coded despite one or more events in the sentence
  10. Global false negatives: an event occurs which is not coded in any of the multiple reports of the event

9. GDELT, in contrast, made very effective use of GeoNames and related resources to add a geospatial component to the data.

10. Remember that awkward conversation with your parents before you went off to college? “Dear, we know you are going to be out on your own, and we want you to experiment with new things, but please, honey, don’t join any suicide cults…” Yes, even a suicide cult devoted to the glorification of garbage can models run on obsolete data sets and hypotheses no one has taken seriously since before the era of Ronald Reagan, disco music and bell-bottom jeans. Suicide cults: JUST SAY NO. Unless you want tenure.

11. In fact, if you want to retrieve the data badly enough, there are commercial services that can do this—for a price—at least for formats as common as the variations of MS-Office. Though not necessarily for a database written by three guys in a garage in East Jerusalem in 1986. Still, it is a whole lot easier just to open a text file.

12. Paywalled.

13. Usually they just quit, or even more typically, just stop showing up: coding is like that. As a consequence, unlike Mitt Romney and Donald Trump, I’ve never developed a knack or taste for firing people. How sad.

14. We can also learn from the systematic studies of how the successful communities function: open source has been around sufficiently long that there are some fairly consistent results here, and we don’t need to re-invent the wheel.


Seven remarks on GDELT

[Okay, okay, I'm clearly in the "he just doesn't get it" camp on the "post brief/post often" aspect of blogging—and Twitter is beyond the pale for me—so sorry, yet again you're just going to have to use the vertical scroll bar.]

GDELT continues to get attention, and there is an article forthcoming on it in New Scientist [this will be linked when available]. But as people get beyond the level of cool visualizations, some concerns are arising as well, particularly on the issue of the high level of false positives and noise. These concerns are completely legitimate, but before they are used as excuses to ignore GDELT and just keep throwing tax-payer dollars down proprietary rat-holes, here are some points to consider, which I should have posted weeks ago.

GDELT is a disruptive innovation

Harvard Business School’s Clayton M. Christensen has, in an assortment of articles and books, explored the issue of “disruptive innovation”:

“Generally, disruptive innovations were technologically straightforward, consisting of off-the-shelf components put together in a product architecture that was often simpler than prior approaches. They offered less of what customers in established markets wanted and so could rarely be initially employed there. They offered a different package of attributes valued only in emerging markets remote from, and unimportant to, the mainstream.”[9] [link to source]

Observe that this is a nearly perfect description of GDELT. The components consist largely of existing software: web-scraping, the 12-year-old [NSF-funded] open-source TABARI coder and its open-source dictionaries, servers and databases, GeoNames and other geospatial resources, R and various other open-source packages for analysis and visualization. This is not to diminish the role that Kalev Leetaru had in properly assembling all of this, nor his insights in writing ancillary programs to significantly extend the capabilities of the existing systems, but in the end, this was the project of a single graduate student, not a multi-million-dollar investment. And massively complex, multi-million-dollar systems are precisely what GDELT could supplant. It will, of course, be rejected in the stratospheric precincts of the established users: it isn’t perfect [or costly] enough, and the massive sunk costs of the proprietary alternatives must be justified. But it opens a huge new “market” to users who could not break through the “hide and hoard” barriers of the existing near-real-time datasets, and is simple enough to be adapted as well as adopted. And let’s not forget the cost of acquisition: zero. [1]

Political forecasting has a different set of signal/noise issues than many engineering problems

People seem to be registering that properly using event data involves dealing with a signal-to-noise issue, but I sense that they don’t fully appreciate how the characteristics of those problems in political forecasting are different from those of many, if not most, engineering problems.

Just to choose an example at random, let’s take the problem of discerning signal from noise in a sonar system that is trying to determine whether there is a submarine in the vicinity. This has remarkable technological challenges, and can only be solved with a great deal of expertise but, compared to a political forecasting problem, one aspect is very simple: the submarine is either there, or it is not.

Contrast this to a very real [qualitative] forecasting issue from a few months back: on 2 October 2012 Israeli Prime Minister Netanyahu was reported to be pursuing increased sanctions against Iran, in contrast to constant rumors over the previous six months [10] that Israel would attack Iran’s nuclear facilities.

Does this mean:

  • Netanyahu changed his mind and decided sanctions are an effective approach [face value]
  • Netanyahu concluded Obama would be re-elected [international considerations]
  • The Israeli military finally persuaded Netanyahu that an attack was a bad idea [domestic considerations]
  • Israel was going to attack Iran in the near future [deception strategy]

Furthermore, was there even a single answer to the question of Israeli plans at, say, a three-month lead time? [2] Any analyst could think of several dozen low-probability contingencies that could modify that probability. The “submarine” is not either “there” or “not there”, but rather exists in a probabilistic haze of possibilities, at least some of which are very low probability black swans.

Yes, messy it is, but it’s been this way for a very long time: The debates between Alcibiades and Nicias on the wisdom of invading Sicily—very much a forecasting problem—vary little from the debates [or absence of debate] 2,400 years later between Rumsfeld/Wolfowitz and Mearsheimer/Walt on the wisdom of invading Iraq, and Big Data isn’t going to change that. As the [presumably apocryphal] exchange at the Congress of Vienna went:

  • Aide: Your excellency, the Russian Ambassador has just died!
  • Prince von Metternich: Fascinating…now, what were his intentions?

A full discussion of this could fill, and has filled, many books, not all of them by Nassim Nicholas Taleb, but the point here is that in a sonar problem, if you reduced the noise to zero—for example if you found some [fantasy] method by which water became as transparent as air—you would reduce your uncertainty to zero. In the political forecasting problem, by contrast, the uncertainty at realistic forecasting horizons never goes to zero, and in fact in both PITF and ICEWS, seems to stabilize pretty consistently around 20% for a wide variety of problems and methods.

But this, in turn, means that reducing the measurement error in the event data to arbitrarily small levels is not going to reduce the uncertainty of the forecast to arbitrarily small levels. To the contrary, as we’ve known for at least a decade, event indicators can be hugely simplified—typically with Goldstein scaling or quad counts—with little or no discernible loss in the predictive accuracy of the models.
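For anyone who has not seen how drastic this simplification is, here is a sketch of quad counts: every event code is collapsed to one of four categories and counted per period. The grouping of CAMEO root codes used here (01-05 verbal cooperation, 06-08 material cooperation, 09-13 verbal conflict, 14-20 material conflict) is my assumption of the commonly used scheme and should be checked against the CAMEO manual; the function names and the `(period, code)` input layout are hypothetical.

```python
from collections import Counter


def quad_class(cameo_code):
    """Collapse a CAMEO event code to one of four quad categories.

    Assumes the first two digits are the CAMEO root code and that roots
    group as 01-05, 06-08, 09-13 and 14-20 (an assumption to verify).
    """
    if len(cameo_code) < 2 or not cameo_code[:2].isdigit():
        raise ValueError(f"not a CAMEO-style code: {cameo_code!r}")
    root = int(cameo_code[:2])
    if 1 <= root <= 5:
        return "verbal_cooperation"
    if 6 <= root <= 8:
        return "material_cooperation"
    if 9 <= root <= 13:
        return "verbal_conflict"
    if 14 <= root <= 20:
        return "material_conflict"
    raise ValueError(f"root code out of range: {cameo_code!r}")


def quad_counts(events):
    """Count events per (period, quad category) from (period, code) pairs."""
    counts = Counter()
    for period, code in events:
        counts[(period, quad_class(code))] += 1
    return counts
```

Several hundred distinct event codes become four monthly counts per dyad, and the forecasting models generally do not notice the difference, which says a good deal about where the remaining uncertainty actually lives.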

Furthermore, since both the genesis and virtually all of the applications of event data have been to the issue of forecasting, this has been built into the analytical approaches: once the data are “good enough”, it makes much more sense to invest in improving the models, not in an infinite pursuit of greater precision in the measurement, as that will not contribute to the eventual objective.[9]

Remember the origins of the dictionaries

The event data coding of GDELT used off-the-shelf technology (the geolocation coding was new and original), and in particular it used very general coding dictionaries.[3] The verbs dictionary was the [NSF-funded] Kansas CAMEO dictionary, which had largely been developed to code events in the Middle East (with subsidiary development for the Balkans and West Africa, but never beyond that). CAMEO itself was originally designed as a specialized coding system to study mediation, and much of the detailed [NSF-funded] dictionary development was done with this objective, not for the study of all political behavior. The fact that a code exists in CAMEO does not mean that it has been thoroughly instantiated in the dictionaries. Finally, the actor dictionaries are from the general CountryInfo file—originally developed for the filtering programs used by the [NSF-funded] Militarized Interstate Disputes project—supplemented with an assortment of NGO and MNC lists obtained from the Web, and a new WordNet-based [NSF-funded] agents dictionary.

Despite this re-use, the overall package works pretty well, but is certainly not all that could be done…bringing us to…

GDELT is a beta, not the limit of the technology

GDELT introduced several new features—beyond sheer scale—into event data coding. The obvious one is geolocation, but it also treated common nouns (“agents”) differently than in the past, did full-story coding, was the first major dataset based on a CountryInfo dictionary, used more sophisticated pre-processing than we had done, used a date-shifting feature of TABARI that has not been extensively tested, and used geospecific duplicate filtering. These should all be considered experiments, not definitive answers: I think most of these worked but, for example, I remain skeptical about full-story coding, and as several people have noticed, there are some very odd agent-based codes in there, as TABARI assembled these on the fly based on some fairly simple rules.

Beyond this, there are some additional enhancements developed in the course of the ICEWS project which are conceptually available in public sources that could be incorporated and which would certainly reduce the false positive problem (which was also a very big issue at some phases of ICEWS): these include filtering on certain actor/action combinations (NGOs and journalists rarely engage in material violence [4] and this is a common source of coding error); thorough filtering of historical, entertainment, and sports stories; and special treatment of certain “poison words”—mostly negatives—that are easily misinterpreted. Existing NSF-funded work at Penn State has produced and almost-but-not-quite implemented a complete reorganization of the verbs dictionaries based on WordNet synonym sets.
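Two of these filters are simple enough to sketch. This is not the ICEWS implementation, which I am not describing here: the agent labels, the list of "poison words", the record layout, and the choice of CAMEO roots 18, 19 and 20 as "material violence" are all assumptions made for illustration.

```python
# Assumed for illustration: CAMEO roots treated as material violence.
MATERIAL_VIOLENCE_ROOTS = {"18", "19", "20"}
# Hypothetical agent labels for actors who rarely commit such acts.
UNLIKELY_PERPETRATORS = {"NGO", "JOURNALIST"}
# A tiny, made-up sample of words that often reverse a sentence's meaning.
POISON_WORDS = {"not", "no", "never", "denied", "without"}


def is_implausible(event):
    """True when an unlikely perpetrator is coded as the source of violence."""
    return (
        event["source_agent"] in UNLIKELY_PERPETRATORS
        and event["event_code"][:2] in MATERIAL_VIOLENCE_ROOTS
    )


def has_poison_word(sentence):
    """True when the sentence contains a word that is easily misinterpreted."""
    words = {w.strip(".,;:!?\"'").lower() for w in sentence.split()}
    return not words.isdisjoint(POISON_WORDS)


def screen(events):
    """Split coded events into those kept and those flagged for review."""
    kept, flagged = [], []
    for event in events:
        if is_implausible(event) or has_poison_word(event["sentence"]):
            flagged.append(event)
        else:
            kept.append(event)
    return kept, flagged
```

Flagging rather than deleting is a deliberate choice in this sketch: the flagged pile is exactly where the dictionary problems show up, so it is worth keeping for review.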

All of which is to say GDELT 2.0 (and 3.0…) will almost certainly have fewer false positives than GDELT 1.0. Because we’ve got more development coming down the pike…

What’s ahead

Skating just ahead of the Coburn Amendment [5], Mike Ward, Jay Ulfelder and I have received funding from NSF for a project titled “Multiple Attribute Data Coded On Web”—MADCOW [6]—which is assembling a series of tools to generate several web-based data sets, including a near-real-time event data set which will second-source GDELT’s, probably using slightly different technology.

MADCOW is funding the development of a new, Python-based coder which will replace TABARI and which will feature

  • an extendible parser based on the Python nltk package
  • a far richer, JSON-formatted dictionary structure
  • hooks for the incorporation of additional packages for feature extraction (for example, the topics and size of demonstrations)
  • partially automated verb dictionary development, as well as named-entity-recognition systems in the MADCOW system generally
  • a design intended from the beginning for distributed processing of very large datasets

This will, of course, be open-source, and unlike TABARI, it will be hosted on GitHub: we will put out an announcement when we are ready for contributors beyond our initial core of developers. We also expect the Python code to be far easier to understand, maintain and extend than TABARI’s C++ codebase.[7] Our current expectation is that this will be available for at least experimental use by the end of the summer.

MADCOW is also going to be hosting platforms for various aspects of crowd-sourced data validation, and we are hoping this can be used to refine the dictionaries. Need financial, criminal, and natural disaster categories in CAMEO?: we’re adding those as well.

RTFM

I know, people don’t read manuals, or much of anything, any more, but indulge me for a moment and at least consider looking at the following materials before treating GDELT (and automated coding methods generally) as though they were artifacts dredged out of a cave somewhere accompanied only by a few pottery shards with cryptic inscriptions in early Etruscan.

  • the 200-page TABARI manual, with two chapters plus an appendix on general aspects of machine coding and dictionary development. It is not just a guide to menu and project file options (though it has that as well)
  • the 200-page CAMEO manual, which likewise has extended discussions of why CAMEO has been constructed as it has
  • a book-length manuscript on event data coding, Analyzing International Event Data: it’s from around 2000 and the later chapters are a bit dated, but I updated the first three chapters last year
  • a history of the event data project, which will give you some sense of how all of this material came into being

and if you still won’t read, here’s a two-hour video (with thanks to Indiana University).

Though if you are inclined to read (or, in a project, have someone else read), there’s a lot of material here.

In the end, open source will win

Returning to the original conundrum, where do you put your chips?:

  • GDELT, with open-source software, dictionaries and costs [11], an emerging technical community, but like all disruptive innovations, a bit rough around the edges and we still don’t fully understand how to effectively use it;
  • The multi-million dollar proprietary systems—assuming you can get access to them—with none of the above except, let us be honest, that the analytical community is also only gradually learning how to use the data, and in many cases, trying to learn social science modeling as they go along.

We know how this story ends: the open source community will win. The basic technology works—and in fact it may already work as well as it needs to for forecasting, if not for monitoring—and it merely needs to be further refined and applied in a larger number of domains. But—Wile E. Coyote ten feet beyond the end of the cliff—you can sustain proprietary alternatives for a long, long time provided you’ve got unimaginable gobs of money, and the U.S. government is under no constraints whatsoever in that domain, right? [8] Just don’t look down.

Footnotes

1. I had a brief discussion with someone associated with one of the government-supported projects who had a glimmer of hope on the end of GDELT. “So, how long do you have funds to sustain those updates?” My response: “We’ve got some NSF funding, and the marginal cost of updating the data is zero. Take a positive number, subtract N times zero from it, and the value of N where that goes negative is when we will be forced to shut down due to costs.”

Okay, I wasn’t quite that clever. And I get a monthly bill from Linode for $19.95 [!] to host some cloud resources we’re using. Drat…have to cut back on my Starbucks habit to keep this going!

2. I’d say it was a combination of the second and third factors.

3. In particular, GDELT makes no use of any ICEWS developments: it is entirely open-source components, largely accumulated through various NSF-funded projects. ICEWS has available very extensive actor dictionaries which it would be nice to get unrestricted access to, though at the point when I was last associated with the project, it had not made any significant changes in the verbs dictionary.

4. Yeah, we will miss a few events involving Greenpeace as an NGO and Hunter S. Thompson as a journalist.

5. And in any case, with partial funding from Methods, Measurement and Statistics, which thus far is not in the sights of Coburn and Flake, though probably soon to be, as it funds people who do politically relevant analysis using tools more sophisticated than Excel.

6. Blame Ward, not me, for the acronym.

7. Pointers…love’em, but when they go five levels deep, the code is getting a bit abstruse. Though TABARI coded the 200-million records of GDELT without crashing.

8. Joke.

9. Jay Ulfelder’s issue of using event data for monitoring, on the other hand, is closer to the submarine problem, with the difference that as long as we are depending on open sources (and in much of the world, that’s all we’ve got: intelligence resources are finite, despite what you see in the movies), the sources contain irreducible levels of error. Though we can certainly do better than we are currently doing on the false positives in GDELT.

That said, having spent much of the past month human coding—yes, I do human coding as well—atrocity reports (killings of five or more noncombatants in a single incident) rather than writing blog posts, the distinctive aspect is that these only rarely generate a single report: instead, they usually generate a cluster of reports, and then a cluster of reactions to the report. A “bolt out of the blue” dyadic interaction of material conflict which does not generate any sort of followup, usually within a 24-hour period, is almost certainly incorrectly coded. Additional enhancements to the dictionaries—which at the CAMEO stage were developed to code mediation (we had coded conflict in the earlier dictionaries for WEIS, but WEIS had only a single violent conflict code)—would probably also help.
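The "bolt out of the blue" heuristic above can be sketched as a simple filter. This is only a rough illustration, not anything TABARI or GDELT actually does: the `Event` record, its field names, and the function `flag_isolated_conflict` are all hypothetical, the material-conflict label of 4 is assumed to follow GDELT's QuadClass convention, and "followup" is loosened to mean any other report on the same dyad within 24 hours in either direction, so that the last report in a cluster is not flagged.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MATERIAL_CONFLICT = 4  # assumed label, following GDELT's QuadClass convention


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; field names are illustrative only."""
    when: datetime    # timestamp of the report
    source: str       # actor initiating the event
    target: str       # actor receiving the event
    quad_class: int   # 1-4; 4 is material conflict in this sketch


def flag_isolated_conflict(events, window=timedelta(hours=24)):
    """Return material-conflict events with no other report on the
    same dyad (in either direction) within `window` of the event."""
    # Group events by unordered dyad so A->B and B->A count together.
    by_dyad = {}
    for ev in events:
        by_dyad.setdefault(frozenset((ev.source, ev.target)), []).append(ev)

    suspects = []
    for ev in events:
        if ev.quad_class != MATERIAL_CONFLICT:
            continue
        neighbours = by_dyad[frozenset((ev.source, ev.target))]
        corroborated = any(
            other is not ev and abs(other.when - ev.when) <= window
            for other in neighbours
        )
        if not corroborated:
            suspects.append(ev)
    return suspects


# Invented toy records: the AAA/BBB conflict event has a reaction three
# hours later, while the CCC/DDD event stands alone and gets flagged.
t0 = datetime(2013, 4, 1, 12, 0)
toy = [
    Event(t0, "AAA", "BBB", 4),
    Event(t0 + timedelta(hours=3), "BBB", "AAA", 1),
    Event(t0 + timedelta(days=5), "CCC", "DDD", 4),
]
for ev in flag_isolated_conflict(toy):
    print(ev.source, ev.target, ev.when.isoformat())
```

Flagged events are candidates for review rather than automatic deletion: some real incidents will be reported only once.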

10. One had the sense many of the pundits were so certain this would happen that they were listening on their cell phones for the sounds of Israeli bombers starting their engines.

11. Open source solutions are, famously, “free as in puppy.” But the marginal costs are very close to zero.


GDELT: Global Data on Events, Location and Tone, 1979-2012

I’ve gotten a couple of queries as to why I haven’t said anything about Kalev Leetaru’s new GDELT—Global Data on Events, Location and Tone, 1979-2012—data set here. Short answer is ISA (with a paper on GDELT), completing my taxes (done), and getting the PITF atrocities data up to date (we’re close).

But…this thing is generating a great deal of interest, and people seem quite happy with the quality as well as the quantity of the data, to say nothing of the accessibility and price. All consistent with the results of a variety of on-going experiments we’ve been doing at Penn State since the late summer.

[Okay, let's show a little more enthusiasm...in fact this response is stunning! The data were initially posted on 26 March 2013, and in three weeks we've already got multiple tutorials, plenty of R code, some amazing visualizations, and coverage in the semi-popular press. Does this not support my contention that the traditional political science journal—which takes about three years from initial development of a paper to refereed publication—is unlamentedly doomed?]

So, as a short term solution, I’m posting links on the GDELT data site: these will be updated as I receive them, unless this goes truly viral and I can’t keep up. At present, I would recommend in particular the tutorials by John Beieler and David Masad, the animation of Jay Yonamine’s this-only-looks-like-Wikileaks Afghanistan sequence in The Guardian DataBlog, and Rolf Fredheim’s cool Russia visualizations.

[Addendum 16-April-2013]

I am informed by those-who-Tweet that in addition to the blog presence, there are also numerous conversations about the data going on in the Twittersphere. Mostly revolving around the theme of “GDELT is hard.” 33 years of hierarchically coded categorical data with 200M observations: yes, GDELT is hard. Persist.


Defunding NSF Political Science, Monday morning edition

After initial silence, we seem to be getting a fair amount of response on this—see John Sides (with continuing coverage) and Henry Farrell over at Monkey Cage—including, impressively, support at The Economist.

Some of this analysis seems spot-on; other responses—I trust you folks who were enthusiastically endorsing hurling obscenities at Coburn are aware that these all end up in his briefing papers the next morning, but assume, presumably, that he will be absolutely petrified at the rhetorical intensity of your opposition—make me want to just say, well, if this is the best the community can manage, the program deserves to get cut.

Hey, send the geezer out to pasture, eh?—guys just wanna have fun, and to hell with the consequences.

A couple reactions nonetheless:

1. Those who think that NSF is going to play little Jedi mind tricks with the national security and economics exception are living in a fantasy world. While I’ve not talked with anyone at NSF since this hit, my guess is that they are in serious damage control mode and their single greatest fear is that this is going to spread to other programs, either within the social sciences or to other topics, climate change undoubtedly being the prime target.

I’ve had a lot of experience with NSF over the years, and have a great deal of respect for the institution. Many of the comments are treating it as though it was Officer Krupke in West Side Story, or Inspector Jacques Clouseau of Pink Panther fame. It’s not: it is a large, mature, and sophisticated bureaucracy—“bureaucracy”: aren’t political scientists also supposed to know something about those?—with a complex relationship with Congress, and a collective sense of responsibility for basic research in the sciences generally. They aren’t going to regard this as a funny little game.

Which, in turn, means that yes, this is going to have consequences.

2. Splitting off the national security and economic research communities is significant. I’m not suggesting that I could personally turn this thing around—though I conceivably could do something more effective than on-line petitions or profanity-laced diatribes—but I would happily lobby for the policy importance of the NSF-funded work I’ve been doing. The [hypothetical] response now?: “Yes Phil, we agree entirely, and that’s why we put in the national security exception. Thank you for your service. Oh, and loved your critique of garbage can models in Seven Deadly Sins…”  Same with anyone doing domestic, comparative or international political economy; same with anyone doing comparative political violence, and possibly comparative democratization. [3] What you’ve got left is, in fact, pretty much U.S. electoral, institutional, and [yes] Congressional studies, and whatever aspects of comparative politics can’t be plausibly—emphasis on plausibly—argued to have security or economic consequences. [4]

If Coburn’s target is legislative studies—we still don’t know—and I were looking at this from the outside (which, now, I pretty much am), I’d say that was a really slick move.

3. I quit the APSA about three years ago, after receiving an extended lecture from Michael Brintnall on the links between open access journals and the end of civilization, and the Perestroikan/White Citizen’s Council successful blackballing of Walter Mebane, so again, I’m on the outside here as well. Still, for years (and thousands of dollars paid in inflated dues), I listened to the APSA extol its superb access to Congress through the legislative fellows program, and this in turn justified their need to be in Washington rather than, say, Tucson. [1] IMHO, if APSA were doing its job, this issue would not have made it anywhere close to a floor vote—similar efforts have been pushed back several times over the past thirty years—and the response would not have been a petulant press release that may well be making things worse. Jennifer Victor outlines a lobbying strategy—based on, hey, research by political scientists [!]—that in all likelihood would be highly effective, but requires coordination and initiative. I can only hope that something is going on in the background, but I’ve seen little evidence of this. [2]

Footnotes

1. Not trashing Tucson: that is where the much more reasonably priced Middle East Studies Association and International Studies Association, professional memberships I’ve maintained, are located.

2. Same for MPSA, though it was long ago relegated to shilling for the Palmer House and maximizing the number of second-year grad students it could stuff onto panels, at least in IR. Though at least it is in Bloomington, Indiana, not Dupont Circle.

3. IR and much of comparative being further distracted from the debate by the fact that we are all desperately trying to finish our ISA papers. Unless procrastinating by writing blog entries.

4. Methods, measurement and statistics?: there’s a separate program for that. Those who see this move as the end of the Society for Political Methodology: isn’t going to work out that way, and besides, about half of the SPM oligarchy now seem to have joint appointments in professional schools anyway. Same for “law and society.”


Defunding NSF Political Science, continued

John Sides at MonkeyCage has provided the revised text of the Coburn amendment:

(a) None of the funds made available by this Act may be used to carry out the functions of the Political Science Program in the Division of Social and Economic Sciences of the Directorate for Social, Behavioral, and Economic Sciences of the National Science Foundation, except for research projects that the Director of the National Science Foundation certifies as promoting national security or the economic interests of the United States.

Oh…national security exception…hmmm, with the Political Instability Task Force, DARPA ICEWS [6] and the IARPA ACE and OSI programs to fall back on as justification, that pretty much covers every NSF grant I’ve ever had [1] and certainly anything I could imagine applying for in the foreseeable future.

So there’s not a story here after all…move along, move along.

Just kidding.

Interesting development, and one that probably should worry a lot of people, and certainly not just in political science. [2] Given that this has also dropped that cynical cancer research provision [3], one could imagine this not only passing at some later date, but going viral to other parts of NSF, certainly within the Social, Behavioral and Economic Sciences directorate, where Political Science would by no means be the most vulnerable program on this criterion. [4] And in an era of fiscal restraint, extend this to the various $100M+ natural science programs which many people within those sciences think are a waste of resources—LIGO is only one of many—and a really big Pandora’s box has been opened.

Yet I still remain mystified by Coburn’s obsession with this. While I certainly do not agree with the gentleman on everything, or even most things, he is by most accounts very intelligent and by no means a hypocrite: in one of his earliest forays against “waste”, he started by highlighting some variation on “bridge to nowhere” in his own state and while there were probably a few cheap shots at legitimate research that was poorly characterized [8], research was by no means the only target, and many of the projects he targeted did look a tad dubious. We’re not dealing with a reality-challenged Michele Bachmann here.

So what’s going on? Here’s some speculation based on one of the earlier instances of these attacks (they come every five years or so): it’s not Coburn—who was a physician—but (duh…) one or more of his staffers who had a bad experience with graduate work in political science. In all likelihood, someone who excelled at the “slow journalism” approach of undergraduate polisci, then got into grad school (or didn’t, or didn’t get into the programs where they imagined they belonged, possibly due to low quantitative GRE scores) and found it was an entirely different kettle of fish due to the methodological requirements, and this person became very unhappy. Add in the possibility that this individual was a chip-on-the-shoulder twenty-something conservative who thought the whole of academia was against them [5], and we’ve got the sort of deep-seated “I’ll show those bastards…” resentment that could fuel this sort of thing.

And then there is Senator Jeff Flake, who has an M.A. in political science from BYU. Come on, somebody must know the story here.

Update 2030 EDT

Oh crap, it passed! Not the first time I wish I hadn’t been quite so accurate in my predictions. Without even a recorded vote…wow, a lot of hill rats out there with low quant GREs, it seems. Maybe the House will want to trade this out for something more popular—birthday wishes to Ahmadinejad and Kim Jong-un, maybe?: seems about our status at the moment. Not good news.

Footnotes [7]

[1] Funding for OPOSSEM and the Political Methodology summer meetings might have been a stretch, but given the dependence of all of those quantitative national security projects on state-of-the-art statistical methodology, not much of a stretch.

[2] Insert obligatory Niemöller quote here.

[3] Maybe they are reading my blog! Yeah, right…

[4] And yes, you ANES folks should be afraid, very afraid. On the other hand, by its own Maoist self-criticism exercise, the Republicans so royally messed up their polling in 2012 that Barack Obama was easily re-elected under what should have been a challenging economic environment, and from the GOP perspective, the re-election of Obama leads to horrible things like the nominations of Chuck Hagel and John Brennan to important national security positions, so maybe we need the ANES as well.

[5] Look, Republicans, if one of your own rising stars criticizes the GOP as having branded itself as “the stupid party”, can you really blame academics for failing to flock to the banner? You gotta give us something to work with, people! And Jindal’s Louisiana is a regular Athenian Academy compared to what we’re dealing with in Brownbackistan. Besides, contrary to stereotypes, political science departments tend to be among the least liberal of the social sciences and humanities, according to most of the surveys [I think that] I’ve seen.

[6] Direct links to ICEWS have disappeared from the web…which is what happens when a DARPA program is successful. Just Google “ICEWS” for the secondary links, though those Wired stories have lots of the facts wrong.

[7] Sorry, haven’t figured out how to do these automatically so they link; project for a later date.

[8] Studies of the sexual behaviors of primates were disproportionately targeted, as I recall.


Defunding the NSF Political Science Program

THEEEY’RE BAAACCCKKKK!

Yes, that fun-loving Senate caucus out to save the world from the horrors of systematic study of politics—clearly unnecessary, as the splendid outcomes of the invasion of Iraq, ten years ago this week, plainly demonstrate that we already know everything there possibly is to know about politics—is back at work again, with yet another mischievous set of amendments prohibiting the National Science Foundation from funding political science research. The latest iteration ingeniously moving the money into cancer research, though as someone with more than a passing familiarity with cancer, I find this a particularly cynical exploitation of other people’s tragedies. Though hardly unexpected: it’s the American way.

At issue here is around $7- to $10-million per year. To put this into perspective, the astronomers and physicists have managed to cadge about $365-million in NSF funding (which would fund the NSF political science program until about 2050) for this

http://www.ligo.caltech.edu/

which I also doubt is going to do much to cure cancer. Somehow methinks money is not really at issue here.

Yet the response of the political science community has been astonishingly lame. Back in the days when I was an environmental activist, Rule #1 was that petitions were a complete waste of time (other than making your membership feel good), and Rule #2 was that letters generated by simply following a template were almost as great a waste of time. Yet that seems to be the gist of all of the suggestions.

Where are the APSA and MPSA on this?—so busy defending their precious journals against the threat of open access that they didn’t see this coming? Or in the case of APSA, under the control of the Perestroikans who are delighted this is happening? Doesn’t APSA occupy some of the most expensive real estate on the planet precisely to guarantee access to Congress? Or have the defenders simply gotten worn down and sooner or later Coburn, Flake et al will prevail?

The irony, of course, is that projects like the Political Instability Task Force and Worldwide Integrated Conflict Early Warning System have made very substantial use of data, methods and software developed under NSF funding for basic research over the past forty years, and are now directly feeding into policy decisions.

In fact, one could easily imagine that some day projects like PITF and W-ICEWS might help prevent a mistake like the invasion of Iraq. Estimates for the cost of that vary wildly, but the median figure of $1.5-trillion would fund the NSF political science program for around 150,000 years.
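For anyone who wants to check the back-of-the-envelope figures, here is a minimal sketch. The dollar amounts are the ones quoted in this post; using the top of the $7- to $10-million range as the annual program cost is my assumption about which figure was used, and the function name `years_funded` is just for illustration.

```python
# Figures as quoted in the post. Using $10M/year (the top of the
# $7M-$10M range) is an assumption about which number was meant.
PROGRAM_COST_HIGH = 10_000_000
PROGRAM_COST_LOW = 7_000_000
LIGO_FUNDING = 365_000_000
IRAQ_WAR_COST = 1_500_000_000_000
START_YEAR = 2013  # year of the post


def years_funded(total, per_year=PROGRAM_COST_HIGH):
    """How many years of the program a lump sum would cover."""
    return total / per_year


ligo_years = years_funded(LIGO_FUNDING)
iraq_years = years_funded(IRAQ_WAR_COST)

print(f"LIGO money lasts {ligo_years:.1f} years, to about {START_YEAR + ligo_years:.0f}")
print(f"Iraq war money lasts {iraq_years:,.0f} years")

# At the low end of the range the same sums stretch further.
print(f"At $7M/year: LIGO {years_funded(LIGO_FUNDING, PROGRAM_COST_LOW):.0f} years, "
      f"Iraq {years_funded(IRAQ_WAR_COST, PROGRAM_COST_LOW):,.0f} years")
```

Both the "about 2050" and the "150,000 years" figures come out of the $10-million end of the range; at $7-million per year the comparisons only get more lopsided.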

[Apologies for the lack of recent posts but I've been frantically trying to catch up on coding a data set for the aforementioned PITF and dealing with bureaucratic snafus at a large football-and-drinking school. More on that later. Then that little detail of ISA.]

Update: 19 March 2013

Sarah Binder over at Monkey Cage is providing some play-by-play on this in the Senate (and, based on the comments, the issue is still on-going). If I’m understanding the situation correctly, Reid allowed [more or less] the GOP nine amendments, and Coburn, with only four of those golden tickets, chose to allocate one of them to defunding political science! $10-million in a roughly $1-trillion budget bill. Come on, folks, even by the standards of a party where the term “wacko bird” is now considered a compliment, there has got to be more going on. Inquiring minds want to know!


The Job Talk Debate

This is what blogs are about, eh?

The issue of the value of the job talk, a blogosphere phenom [1] initiated by Dan Nexon, has now escalated to the usually rarefied policy precincts of Dan Drezner [2], so rather than work on the [long] list of things I was going to do as soon as I got into the office, I will briefly [for me] chime in.

Specifically, on the side of the pro-job-talk faction.

The political science academic job market has many flaws, but the job talk is not central to these, nor is it completely artificial.

1. Organizationally, it is the one time most of the faculty see the candidate at the same time and have a common base of comparison. Furthermore, since usually the candidate will present on a topic fairly close to the reasons he or she has been interviewed in the first place, it is relevant information.

2. As several people have noted, the key part of the talk is not the presentation—though I have, in fact, seen people screw up at the presentation level, albeit usually with a research design that has some stunningly obvious flaw—but the questions, and these can be very revealing. The frequency with which the candidate lost the position—either outright or, more commonly, someone else simply did a better job: remember, you may not have done anything wrong, rather someone else may have done better [3]—due to the questions is probably two or three times higher than the next most frequent reason.[4]

The University of Kansas has—or had: things may have changed—an absolutely devastating approach to candidate questions. A certain category of candidates typically come to interview at Kansas expecting to find cows grazing in front of the [small] library and a barely literate faculty—they did not, apparently, bother to check where the faculty had obtained their degrees. So the candidates would tend to be caught off-guard by the level of the questions and—I will use the technical term here—start to bullshit an answer.

At many institutions, the response to such activity is that a few designated pit bulls, either senior faculty who do this sort of thing when they aren’t engaged in alternative amusements such as pulling wings off flies, tossing kittens into wood chippers and the like, or the junior faculty desperate to stomp someone into the ground in hopes this will somehow qualify them for tenure, go after the candidate, questioning their training, their overall cognitive ability, the civil status of the candidate’s parents at the point the candidate was conceived, and similar issues.[9] These rants typically go on for some time, giving the candidate a breather, who then lets the awkward topic die out with some comment like “Well, yes, I suppose that’s another theoretical perspective/nuanced social construction…” and goes on to something else. Whew.

That’s not what we would do at Kansas. When the candidate started bullshitting, we’d quietly nod and ask them to elaborate, and see just how far they’d go before they’d realize they were making a complete fool of themselves.

Life can be cruel.

3. The job talk is not an artificial exercise. Someone who cannot put together a carefully rehearsed one-off presentation for 45 minutes in their area of specialization is very unlikely to be able to pull off six to nine hours a week of newly-prepared lectures for fourteen consecutive weeks. And teaching pays the bills.

Forward through time and there are a surprising number of occasions where someone is presenting something that looks a lot like a job talk. To take an extreme example—and of course I’m very proud of this, and I’m not pretending this is typical—I was once asked to give a talk to a very high level audience—a couple of two-stars and one three-star, as I recall, plus a scientific advisory committee [5]—at the Command and General Staff College at Ft. Leavenworth, and was told it would be twenty minutes. The speaker following me had a flight delay and I ended up talking for more than an hour, essentially doing a review of quantitative approaches to political analysis. Extreme example, but it can happen.

4. In my experience, the quality of contemporary job talks borders on the intimidating, the sort of thing that leaves senior faculty saying “I’m glad I’m not on the market.” Most of those I saw by candidates at both Kansas and Penn State have been extraordinarily polished and professional, and the standards have risen substantially over the past two decades. At Penn State, no student goes out for an interview without at least two full-scale practice talks, and if it takes more, we do more. Based on the quality of most talks we see, I assume that other programs do the same, and if you are in a program that doesn’t, you are at a serious disadvantage.

The system is not perfect: obviously some people are simply better presenters than others [6], native (or fluent) speakers of English have a clear edge, and I’ve known some cases where individuals are highly qualified but were sufficiently introverted that they didn’t make a good impression [7]. But I don’t see an obvious alternative, and I don’t think the existing system is that dysfunctional.

Back to dealing with that enthusiastic R&R from APSR. [8]

Notes:

1. Pronounced “feeeee-nom”

2. Who also has the other relevant blogy links, which I won’t copy at the moment.

3. We call this “life.”

4. #2: Making it clear to the assistant professors and graduate students that you are way too qualified for the position and the department is lucky you even came out for the interview. Such folks usually end up as baristas, though their stories are long repeated in departmental lore.

5. You get a quirky little medallion for this, the same sort they used to give Native American leaders at the end of treaty talks on the annexation of, say, South Dakota.

6. I’m guessing debate experience helps. Improvisational theater experience—formal or just street theater—probably really helps. Seriously.

7. Class background also can be a factor. Albeit in my favorite instance of this, the individual—a former pipefitter—went on to a highly successful nonacademic political science career and more than once has been courted, in hopes he would hire their students, by institutions which would not give him the time of day while he was on the market.

8. A joke, a joke!!!

9. Those wolf-like creatures chasing the dwarfs in Hobbit: The Movie?—those are the assistant professors. The orcs?—those are the vengeful fulls. You?—probably Thorin after about the third time he’s whacked by Azog’s mace.
