Advice to involuntarily remote workers from someone with [almost] seven years of remote experience

As I’ve alluded to at various points—see here, here, and here—I have been working remotely since leaving academic life almost seven years ago. I had, in fact, been planning an entry on how I believe remote work is going to have substantial—and generally quite positive—social and economic effects but now, out of a most unexpected corner, comes the entry of millions of people, almost all involuntarily, into remote work. So something less abstract seems in order.

Before—or as an alternative to—going through my suggestions, avail yourselves of the increasingly large literature on this, most of it fairly consistent: for example this and this and certainly this and more generally everything you can find of interest under the “Resources” tab here. And get on the https://weworkremotely.com/ email list for ever more links. Ignorance is no excuse: this approach has been developing rapidly over the past decade.

The points below are listed roughly in the order of priority, though of course I expect you will read the whole thing since you’ve got plenty of time and no one is looking over your shoulder at what you are reading, right? You hope: see points 3 and 4.

1. Loneliness and isolation are likely to be your biggest problem

One recent article I read—link lost, alas—said that in 2020, we’ve essentially solved all of the problems of remote work except one: loneliness and isolation. This is invariably rated as the most important downside by remote workers—see here and here—even those who otherwise thoroughly embrace the approach. Be very, very aware of it.

It is not inevitable—well, no more so than the solitude/loneliness you encounter, and your degree of comfort with it, in other parts of your life—but for those who are suddenly and involuntarily remote, I’m guessing this will quickly become a serious public mental health problem. Newspapers are already full of articles on “Telecommuting really sucks!” Like after about three days.

As with almost every point in this essay, the approach for dealing with loneliness will vary dramatically with the individual. The INTJ types of the data science world will, often as not, find the transition fairly easy, and largely positive, though it is still a transition. [1] The ESFPs without a good work-life balance will wonder what befell them.

For a start, however, take the following observation: If you are familiar with traditional rural communities where homes are widely spread apart and mechanized agriculture is largely a solitary pursuit, you will also be familiar with the little cafes—they probably have espresso machines now—where every morning there are clusters of [usually] men in overalls and caps sharing at least coffee and sometimes breakfast, and plenty of conversation and old jokes, before they head back to a day of work alone on the farm. And beyond that there are little rural bars in the middle of nowhere that are packed with cars on the weekends, and there are little churches one knows literally from cradle to grave [2], and there are active parent-teacher associations, and in the old days, various fraternal organizations: all the institutions Bob Putnam described in Bowling Alone that decayed with the suburbanization of post-industrial society. These situations were not ideal and can be too-easily romanticized—like on fabulously successful public radio programs—but are an evolved response to what could otherwise have been a much more lonely life. Not one of these is a co-working space.

2. Togetherness may well be your second biggest problem

When you reach my age and watch people retire, a very common issue is the couple who were very happy and well-adjusted when they spent most of the daylight hours in a workplace with other people, and go batshit crazy when they are together 24/7. Some find useful ways around this, typically through community volunteer work, but others divorce, and still others continue in lives of quiet desperation and/or addiction. [3]

If you are sharing space with another person, whether in a committed relationship or even just out of convenience, you are suddenly in this world. Possibly with children in the mix as well. I’ve no personal advice on this, as both my independently-employed wife and I have our own offices, but on the positive side—as much of David Brooks’s writing in recent years, such as his auspiciously timed current article in The Atlantic, has noted—during most of human existence, we’ve worked day after day in the presence of the same group of people, and clearly have evolved the social, cultural, and cognitive tools to cope with this.[4] Even if several generations of Frederick Taylor- and Alfred Sloan-inspired managers have done everything in their power to adapt humans, or some shadow thereof, to the conditions of Lancashire cotton mills in the 1820s, even—or particularly—if the workspace is an open office configuration of a unicorn tech company in the 2020s.

3. Schedule your in-work downtime: you need it

In my previous entry, I mentioned the issue of deep work and the fact that it is tiring and consequently in limited supply.[5] Let me generalize this: as you transition from a working environment where there are constant interruptions to one with no interruptions [6], you need to systematically, not randomly, provide the downtime for yourself. 

People who have always worked in a busy office environment miss this: they figure “wow, I’ve got 100% control of my time!” and think that means they will be working optimally for that 100%. For a while, yes, you might, particularly if there is some really neat project you’ve been waiting a long time to find time for.  (Though conversely, you might be dazzled and confused by the new situation from Day 1 and watch your productivity plummet.) But this burst of productivity won’t last indefinitely. And at that point, you need a plan. [7]

Once again, there are probably as many ways to deal with this as there are personalities, but you need to take into consideration at least the following:

  • What are your optimal times of day for doing your best work? Protect these. [8]
  • How long—be realistic—can you sustain productive work on various tasks you need to do? (this will vary a great deal depending on the task)
  • What type of break from work is most effective and can be done on a regular basis? 

It took me a while to realize the importance of this, and in the absence of systematic breaks, I’d fall into these unpredictable but sometimes extended periods of procrastination, made even worse as now I’m surrounded by technologies insidiously designed to distract me, when I really should have just gone for a walk. So now I just go for a walk, or two or three walks during the day. My doctor, meanwhile, is thrilled with this behavior. 

That’s me: there are plenty of other alternatives, but just make sure they refresh you and the time you spend on them is more or less predictable: Predictability, as in “getting back to work,” is an advantage of walking or running, or a class at a gym or yoga studio, or going to a coffee shop to make a purchase (watch the accumulating calories. And your A1C results). Predictability is most decidedly not a characteristic of YouTube or browsing social media.

4. Be very suspicious of any software or hardware your employer wants in your home

I’m already seeing articles—typically in “Business” sections which presumably the hoi polloi are not expected to read—from managers confidently asserting “I’m okay with our people working remotely, because our software records every keystroke they enter and every web page they visit! [maniacal laughter]” These articles are not [all] from Chinese sources. Mr. Orwell, 1984 is calling, and not the one where UVA made the Final Four.

If you are in a corporate environment, I would suggest being very suspicious of any unconventional software your employer wants you to install on your own computer[s]—I’d be inclined to refuse this if such autonomy is possible—and any corporate-configured hardware you bring home. Not insanely paranoid: Faraday cages are probably overkill, though a soundproof box with a lid may not be. Same with masking tape over the camera when it is not in use.[10] And don’t think about what your loveable boss might install: think about that creepy guy in tech support.[11]

Enough said. Though I’m guessing we will start seeing stories about unpleasant experiences along these lines in the near future.

5. Use video conferencing. And the mute option.

I’m a big fan of video conferencing, and most definitely was not a fan of audio-only teleconferences. However, there are effective and ineffective ways to do this. A fairly strong consensus on best practices seems to have developed in the remote-work world, and at the top of the list:

  • Unless there are bandwidth issues, video is on for everyone for the entire meeting
  • Everyone is connecting from their office computer: meetings where half the group is sitting in a conference room (and thus not really remote) are a disaster
  • Stay on mute unless you are talking [12]. And be sure to turn mute back on after you stop: many embarrassing stories stem from failures to do this. [13]

I’ve been doing fine—well, no one has complained—with the built-in mic [14] and camera on my computers (an iMac and a MacBook Air), though many people recommend buying a separate camera and mic to get better quality. I use over-ear Bluetooth headphones; others are content with wired or Bluetooth earbuds.

The one thing that took me quite some time to get right was video lighting levels: contemporary cameras can make do with remarkably little light, but the results do not necessarily look pleasant. I generally just use natural light in my office, which has high windows, and it took quite a bit of experimenting, and purchasing an additional desk light I use almost exclusively when I’m doing video, to get things so I don’t appear to be auditioning for a B-grade monster movie.

Sharing desktops and presentations remotely introduces another level of complexity—and for screen-sharing, still more opportunities for embarrassing experiences—and frankly I’d stick with tried-and-true software for doing this—the likes of Zoom and Hangout—not something the boss’s cousin Jason’s start-up just released. [15] Alas, this involves installing software that accesses your mic and camera: we must be cautious. If you are a large company (or government agency, for godsakes), pay the subscriptions for the fully-functional versions of the damn software! [16]

6. Dedicated space if you can find it

After a brief and unintentional experiment with working from home, I’ve always had a separate office, four in total, two of which I was very happy with (including the one where I am currently writing this, where I’ve now been almost five years), one which was too socially isolated, even for me, and one in a co-working situation which did not work out (but fortunately I was renting that by the month).[17]

But I’m the exception here: surveys indicate that by far most remote workers do so from home—though usually from dedicated space, the suburban above-garage “bonus room” and/or “guest room” being favorites—and, presumably, working from home will be the short-term solution for most people who are involuntarily remote. [18]

Which, like the loneliness/togetherness issue, is going to take a lot of individual adaptation, and the primary thing I advise is reading the blogs and other materials from experienced remote workers to get ideas. But working from the dining room table and/or the couch will get very tiresome very quickly, on many different dimensions, as we are already seeing in assorted first-person accounts/diatribes.

Literally as I was composing this, and quite independently, one of the folks in our CTO group posted to our Slack channel what his company, in addition to cancelling all travel until 1-Aug-2020, is providing for its newly remote workers:

All employees who need to work remotely are authorized to spend $1,000 outfitting their home for remote work. For example, if you do not currently have a comfortable headset with a microphone, or a chair and desk that you can sit in, you should get one. We trust you to use this budget judiciously.

The point on chairs is critical: your dining room chair will kill your butt, and your couch will kill your lower back. 

The temporary—and worse, unpredictably temporary—nature of these involuntary transitions to remote work is quite problematic: most regular small office spaces (if you can find them at a fair price) require a lease of at least a year, though you might be able to find something monthly. And a lot of spaces that could be easily adapted in pre-internet days—many a successful novel has been written in a converted little garden shed in the back corner of a property—run into issues with the need for internet access—though as we’ve all noticed from seeing our neighbor’s printer as an option for our connections, wireless has quite an extended reach now [19]—and may require more electrical outlets than can prudently be run from an extension cord. [20]

7. Now is a very good time to assess your work-life balance

One of the best articles I’ve read recently—alas, I’ve misplaced the link—on the advantages of remote work emphasized that no, the people you work with may not be the best group of people to socialize with, and if your company is trying to persuade you that they are, and is trying to merge the domains of work and play, you are probably being exploited. This is not to say you can’t have friends at work, but if these are your only friends—they have been chosen for you by HR, eh?—you are in a vulnerable situation. And don’t forget who HR works for: not you.

Wrapping us neatly back to the opening key: you need a community—“communities”, really, and broadly defined—that goes beyond the workplace, and the re-development of such communities may be one of the major effects of remote work. These take time—for mature adults, easily years to get to a point where there is a deep level of understanding, history, trust, and interdependence—and usually involve an assortment of missteps and experimentation to find what really interests you and binds you with other people, but, well, every journey starts with a single step, right? Again, just read Putnam, David Brooks and Arthur Brooks on this.[21] Or talk to your [great?-]grandparents about how things worked in the good-old-days.

 

So, I know a whole lot of you didn’t want this, but you may, like so many long-time remote workers, come to enjoy its many advantages, such as the possibility of living in areas with a low cost of living, minimal (or zero) commutes, and competing for employment in a national or international market. Meanwhile, stay safe, don’t believe most of the crap circulating on social media, check on your neighbors, particularly if they are older, and live long and prosper.

Footnotes

1. If these terms are unfamiliar, you are not an INTJ. If folks are correct in arguing that in many organizations, introverts provide most of the value while extraverts take most of the credit, covid-19 may unexpectedly provide one of those “you don’t know who is swimming naked until the tide goes out” moments.

2. Except when they are Protestant and split—10% on profoundly esoteric issues of theology and 90% on soap-opera-grade issues of personality—upon passing Dunbar’s number.

3. Suicide increases dramatically for men in this condition; I will not speculate on the occurrence of homicide and abuse, though I suspect it can also be quite serious.

4. Brooks also makes the interesting observation that self-selected “tribes”—which of course we Boomers figured we invented, just like sex and wild music, as hippies in the 1960s—are historically common based on DNA analyses of ancient burials. 

5. For the past six weeks I’ve been working intensely on a complex set of programming problems—first fruits of this are here—and periodically frustrated that I usually just get in four or five hours of really good work per day. Darn: over the hill. Then checked my logs for a similar project eight years ago during a period of largely unstructured time while on a research Fulbright in Norway: same numbers.

6. This sort of autonomy, of course, doesn’t apply to every job, but it does apply to many that are shifting to the involuntary-remote mode.

7. There’s a great deal of cautionary lore in the academic world on how during sabbaticals—ah, sabbaticals, now a distant memory for the majority of those in academic positions—months could be frittered away before you realized that you hadn’t transitioned to unstructured time, and by then the sabbatical would be almost over. Most decidedly not an urban legend!

8. Based on a discussion last week in our CTO group [9]—very much like those rural cafes except we’re not wearing caps and overalls and there is a mix of genders—the “optimal time” for deep work varies wildly between people, but the key is knowing when it is, and if you can control when meetings are scheduled, do this during your down time, not your creative time.

9. I’m locally an honorary CTO based on my past experience with project management. We meet monthly, not daily, and I learn a great deal from these folks, who are mostly real CTOs working for companies with revenues in the $1M to $100M range. Few of which you’ve heard of, but these are abundant in Charlottesville. Bits of their wisdom now go into this blog.

10. Audio: if Alexa or Siri are already in your home, that horse has left the barn. A stampede of horses.

11. Look, I am fully aware that remote security issues are real: I’ve worked remotely on multiple projects where our most probable security threats were at the nation-state level—and nation-states that are rather adept at this sort of thing—and countering that is a pain, and my PMs could tell you that I was not always the most patient or happiest of campers about the situation, though after a while it becomes routine. But we did this—successfully as far as I know—with well-known, standard open tools on the client side (and generally the server side as well), and current industry best-practices, not recommendations dating to high school. This is a totally different situation than being asked to install unknown software acquired by IT after a pitch by some fast-talking sleazeball over drinks at a trade show in Vegas: you don’t want that stuff in your home.

12. I have endless stories of attempted audio connections going badly, though my favorite remains someone attempting to give a presentation by audio while parked next to a railroad and then one of those multi-mile-long trains came by. Experienced readers of this blog will be shocked, shocked to learn this occurred in the context of a DARPA project.

13. Though with video, we are no longer treated to the once-common experience of someone forgetting to mute and soon transmitting the unmistakable sounds of a restroom.

14. microphone

15. Had a really bad experience on those lines a few months back…though it was with an academic institution and they were probably trying to save money. But I do not completely dismiss the possibility of cousin Jason’s startup.

16. Oh, if I only had a lifetime collection of out-takes of bad remote presentation experiences, mostly with government agencies and institutions with billions of dollars in their budgets. A decade—well, it seemed like a decade—of the infamous Clippy. Suggestions for software updates that refused to go away. Advertising popping up for kitchen gadgets. Though at least it wasn’t for sex toys. Multi-million-dollar bespoke networking installations that crashed despite the heroic efforts of on-site tech support and we were reduced to straining to hear and speak to a cell phone placed forlornly on the floor in the middle of a conference room.

17.  My costs, for 200 sq ft (20 sq m), have consistently been around $5000/year, which The Economist reports to be the average that corporations spend per employee on space. Though guessing most of those employees don’t have 200 sq ft. Nor a door, windows, or natural light. 

18. It will be curious to see what involuntary remote work does for co-working spaces: if these have sufficient room that one can maintain a reasonable distance, they would not necessarily be a covid-19 hazard, and may be the only short-term alternative to working from the kitchen table. But they do involve mingling with strangers. Assuming one is okay with the distractions of co-working spaces in the first place: I’m not. All that said, there are probably a whole lot of people happy now that they never had the opportunity to buy into WeWork’s IPO.

19. Reliable internet is an absolute must, particularly for video conferencing but even in day-to-day work if you are constantly consulting the web. The internet in my office has been gradually and erratically deteriorating, presumably due in part to unmet bandwidth issues (thanks, CenturyLink…) and it can be really annoying.

20. I have this dream of the vast acreage of now-defunct shopping centers—a major one here just gave up the ghost last week—being redeveloped as walkable/bikeable mixed-use centers with offices (not co-working spaces) in a wide variety of sizes oriented to individuals and small companies doing remote work: just having people around and informal gathering spaces—remember those rural cafes—goes a long way to solving the isolation issue. But that’s not going to happen in the next couple of months.

21. And give her credit, Hillary Clinton.


Seven reflections on work—mostly programming—in 2020

Reading time: Uh, I dunno: how fast do you read? [0]

Well, it’s been a while since any entries here, eh? Spent much of the spring of 2019 trying to get a couple of projects going that didn’t work out, then most of the fall working intensely on one that did, and also made three trips to Europe this year: folks, that’s where the cutting edge on instability forecasting has moved. And on serious consideration of improving event data collection: please check out https://emw.ku.edu.tr/aespen-2020/: Marseille in early summer, 4 to 8 page papers, and an exclusive focus on political event data!

All distractions, and what has finally inspired me to write is an excellent blog entry—found via a retweet by Jay Ulfelder then re-retweeted by Erin Simpson, this being how I consume the Twittersphere—by Ethan Rosenthal on remote work in data science:

https://www.ethanrosenthal.com/2020/01/08/freelance-ds-consulting/

While Rosenthal’s experience—largely working with private sector start-ups, many retail—would seem quite distant from the sort of projects Jay and I have usually worked on (sometimes even as collaborators), Jay noted how closely most of the practical advice paralleled his own experience [1] and I found exactly the same, including in just the past couple of months:

  • Desperately trying to persuade someone that they shouldn’t hire me
  • Doing a data audit for a proposed project to make sure machine-learning methods had a reasonable chance of producing something useful
  • Pipelines, pipelines, pipelines
  • The importance and difficulties of brutally honest estimates

and much more. Much of this is consistent with my own advice over almost seven years of remote contracting—see here, here, and here—but again, another view from a very different domain, and with a few key differences (e.g. Rosenthal works in Manhattan: New York, not Kansas).

And having decided to comment on a couple of core points from Rosenthal, I realized there were some other observations since the spring of 2019—yes, it has been that long—and came up with the requisite seven, and meanwhile my primary project is currently on hold due to issues beyond my control involving a rapacious publisher in an oligopoly position—things never change—so here we go…

Deep work is a limited and costly resource

Rosenthal has one of the best discussions of the nuances of contractor pricing that I’ve seen. Some of this covers the same ground I’ve written on earlier, specifically that people on salary in large organizations—and academia is probably the worst offender, as academics rarely deal with competitive pricing or any sort of accountability [2], but people whose life experience has been in government and corporations can be just as clueless—have no idea whatsoever of what their time actually costs and how much is totally wasted. Rosenthal echoes the point I’ve made several times that unless you carefully and completely honestly log your time—I’ve done so, for decades, at half-hour increments, though I still have difficulties with the “honestly” part, even for myself—you have no idea how much time you are actually working productively. People who claim to do intellectually demanding “work” for 80 hours a week are just engaging in an exercise of narcissistic self-deception, and if you estimate level-of-effort for a project in that mindset, you are doomed.

Where Rosenthal’s discussion is exceptional—though consistent with a lot of buzz in the remote-work world of late—is distinguishing between “deep” and “shallow” work and arguing that while “deep” work should be billed at a high rate—the sort of rate that causes academics in particular to gasp in disbelief—you can’t do it for 40 hours a week (to say nothing of the mythical 80-hours-per-week), and you are probably lucky to sustain even 30 hours a week beyond occasional bursts.[3] So, ethically, you should only be charging your top rate when you are using those deep skills, and either not charge, or charge at a lower rate, when you are doing shallow work. My experience exactly. 

Deep work can be really exhausting! [6] Not always: in some instances, when one has a task that fits perfectly into the very narrow niche where the well-documented and much-loved “flow” experience occurs, it is exhilarating and time flies: you almost feel like you should be paying the client (no, I don’t…). But, bunkos, in real work on real problems with real data, most of your time is not spent in a “flow” state, and some of it can be incredibly tedious, while still requiring a high skill set that can’t really be delegated: after all, that’s why the client hired you. In those instances, you simply run out of energy and have to stop for a while. [7]

Rosenthal also argues for the rationality of pricing by the project, not by the hour, particularly when working on software that will eventually be transferred to the client. The interests of client and contractor are completely aligned here: the client knows the cost in advance, the contractor bears the risk of underestimating efforts, but also has greater knowledge about the likely level of effort required, and the contractor has incentives to invest in developments that make the task as efficient as possible, which will then eventually get transferred (or can be) to the client. There’s no downside! 

Yet it’s remarkably hard to get most clients—typically because of their own bureaucratic restrictions—to agree to this, as most organizations still have a 19th-century industrial mindset in which output should be closely correlated with time spent working. [8] Also, for some totally irrational reason—I suppose a variant on Tversky and Kahneman’s well-researched psychological “loss-aversion”—project managers seem to be far more concerned that the contractor will get the job done too quickly, thus “cheating” them on a mutually-agreed-upon amount, while ignoring the fact that otherwise they’ve given the contractor zero incentive to be efficient. [9] Go figure.

Remote work is cool and catching on

I’ve worked remotely the entire time I’ve been an independent contractor, so it’s fascinating to watch this totally catching on now: The most common thing I now hear when talking with CTOs/VPs-of-Engineering in the Charlottesville area is either that their companies are already 100% remote, or they are heading in that direction, at least for jobs involving high-level programming and data analytics. The primary motivator is the impossibility of finding very idiosyncratic required skill sets locally, this being generally true except in three or four extraordinarily expensive urban areas, and often not even there.

But it is by no means just Charlottesville or just computing, as two recent surveys illustrate:

https://usefyi.com/remote-work-report/

https://buffer.com/state-of-remote-work-2019

While there are certainly tasks which don’t lend themselves to remote work, I’ll be curious to see how this finally settles out since we’re clearly early in the learning curve. [10]

Three observations regarding those surveys:

  1. The level of satisfaction—noting, of course, that both are surveying people doing remote work, not the entire workforce—is absolutely stunning, in excess of 90%: it’s hard to think of any recent workplace innovation that has had such a positive reception. Certainly not open office plans!
  2. I was surprised at the extent to which people work from home [10a], as I’ve argued vociferously for the importance of working in an office away from home. At least three things appear to account for this difference: First, flexibility in childcare is a major factor for many remote workers that is not relevant to me. Second, I’m doing remote work that pays quite well, and the monthly cost of my cozy little office is covered in my first three or four hours of deep work, which would not be true for, say, many editing or support jobs. Third, from the photos, a lot of people are in large suburban houses, probably with a so-called “bonus room” that can be configured as customized workspace, whereas my residence is in an older urban neighborhood of relatively small mid-20th-century houses.
  3. People are appreciating that remote work can be done in areas with relatively low real estate prices and short commuting times: my 1-mile “commute” is about 20 minutes on foot and 5 minutes on a Vespa, with further savings in our family owning just one car. If remote work continues to expand, this may have discernible redistributive effects: as The Economist notes on a regular basis, the high professional salaries in urban areas are largely absorbed by [literal] rents, and since remote work is generally priced nationally, and sometimes globally, there is nothing like getting Silicon Valley or Northern Virginia wages while living outside those areas. [11] This is apparently leading to a revival of quite a few once-declining secondary urban centers, and in some instances even rural areas, where the money goes a really long way.

All this said, your typical 19th-century-oriented [typically male] manager does not feel comfortable with remote work! They want to be able to see butts on seats! And at least 9-to-5! This is frequently justified with some long-reimagined story where they assisted a confused programmer with a suggestion [12], saving Earth from a collision with an asteroid or somesuch, ignoring that 99% of the time said programmer’s productivity was devastated by their interruptions. But managerial attitudes remain “If it was good enough for Scrooge and Marley, it’s good enough for me.” Still a lot of cultural adaptation to be done here.

The joy of withered technologies 

From David Epstein. 2019. Range: Why generalists triumph in a specialized world. NY: Riverhead Books, pp. 193-194, 197. [12a]

By “withered technology”, [Nintendo developer Gunpei] Yokoi meant tech that was old enough to be extremely well understood and easily available so it didn’t require a specialist’s knowledge. The heart of his “lateral thinking with withered technology” philosophy was putting cheap, simple technology to use in ways no one else considered. If he could not think more deeply about new technologies, he decided, he would think more broadly about old ones. He intentionally retreated from the cutting edge, and set to monozukuri [“thing making”].

When the Game Boy was released, Yokoi’s colleague came to him “with a grim expression on his face,” Yokoi recalled, and reported that a competitor’s handheld had hit the market. Yokoi asked him if it had a color screen. The man said it did. “Then we’re fine,” Yokoi replied.

I encountered this over the past few months when developing customized coding software for the aforementioned data collection project. While I certainly know how to write coding software using browser-based interfaces—see CIVET, as well as a series of unpublished customized modules I created for coding the Political Instability Task Force Worldwide Atrocities Dataset—I decided to try a far simpler, terminal-based interface for the new project, using the Python variant of the old C-language curses library, which I’d learned back in 2000 when writing TABARI’s coding interface.

The result: a coding program that is much faster to use, and probably physically safer, because my fingers never leave the keyboard, and most commands are one or two keystrokes, not complex mouse [13] movements requiring at least my lower arm and probably my elbow as well. Thus continuing to avoid—fingers crossed, but not too tightly—the dreaded onset of carpal tunnel syndrome which has afflicted so many in this profession.

And critically, the code is far easier to maintain and modify, as I’m working directly with a single library that has been stable for the better part of three decades, rather than the multiple and ever-changing layers of code in modern browsers and servers and the complex model-template-view architectural pattern of Django, as well as three different languages (Python, php and javascript). Really, I just want to get the coding done as efficiently as possible, and as the system matured, the required time to code a month of data dropped almost in half. Like Yokoi, frankly I don’t give a damn what it looks like.
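
For the curious, the pattern is easy to see in a toy example. Below is a minimal sketch of a keystroke-driven loop using Python's standard-library curses module; it is an illustration of the style, not the actual coding program, and the records and one-letter commands are invented.

    # Minimal sketch of a single-keystroke, terminal-only coding loop using the
    # standard-library curses module. Illustrative only: the records and the
    # one-letter "codes" are invented, not from any actual annotation system.
    import curses

    RECORDS = ["Protesters rallied in the capital on Friday.",
               "Troops shelled the border town overnight."]
    COMMANDS = {"a": "ACCEPT", "r": "REJECT", "s": "SKIP"}  # one keystroke each

    def main(stdscr):
        coded = []
        for record in RECORDS:
            stdscr.clear()
            stdscr.addstr(0, 0, record)
            stdscr.addstr(2, 0, "[a]ccept  [r]eject  [s]kip  [q]uit")
            while True:
                key = chr(stdscr.getch())      # blocks for a single keypress
                if key == "q":
                    return coded
                if key in COMMANDS:
                    coded.append((record, COMMANDS[key]))
                    break
        return coded

    if __name__ == "__main__":
        print(curses.wrapper(main))            # wrapper restores the terminal on exit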

Just sayin’…and we can generalize this to…

The joy of mature software suites: there is no “software crisis”

We have a local Slack channel frequented mostly by remote workers (some not local) and in the midst of the proliferation of “so, how about them 2010s?” articles at the end of 2019, someone posted a series of predictions made on the Y-Combinator Slack-equivalent back in 2010.

Needless to say, most of these were wrong—they did get the ubiquity of internet-enabled personal information devices correct, and some predictions are for technologies still in development but which will likely happen fairly soon—making the predictable errors one expects from this group: naive techno-optimism and expectation of imminent and world-changing “paradigm shifts,” and consistently underestimating the stability of entrenched institutions, whether government, corporate—the demise, replacement, or radical transformation of Microsoft, Facebook, Google, and/or Twitter was a persistent theme—technological or social.[14] But something caught my attention:

…in the far future, software will be ancient, largely bug free, and not be changed much over the centuries. Information management software will evolve to a high degree of utility and then remain static because why change bug free software that works perfectly.  … What we think of programming will evolve into using incredible high level scripting languages and frameworks. Programs will be very short.

This hasn’t taken anything close to centuries, because in statistics (and the rapidly developing machine-learning field), whether in R or the extensive Python packages for data analytics and visualization, that’s really where we are already: these needs are highly standardized, so the relevant code—or something close enough [15]—is already out there with plenty of practical use examples on the web, and the scripts for very complex analyses are, indeed, just a couple dozen lines.
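
To make that concrete, here is a hedged illustration of a complete train/validate/report workflow in well under two dozen lines, using generic example data bundled with scikit-learn rather than anything from the projects discussed here:

    # A complete, if generic, analysis script: load data, cross-validate a model,
    # fit it, and report held-out performance, all with stock mature libraries.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import classification_report
    from sklearn.model_selection import cross_val_score, train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42)

    model = GradientBoostingClassifier(random_state=42)
    print("5-fold CV accuracy:", cross_val_score(model, X_train, y_train, cv=5).mean())

    model.fit(X_train, y_train)
    print(classification_report(y_test, model.predict(X_test)))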

What is remarkable here—and I think we will look back at the 2010s as the turning point—is that we’ve now evolved (and it was very much an organic evolution, not a grand design) a highly decentralized and robust system for producing stable, inexpensive, high quality software that involves the original ideas generally coming from academia and small companies, then huge investments by large corporations (or well-funded start-ups) to bring the technology to maturity (including eventually establishing either formal or de facto standards), all the while experiencing sophisticated quality control [17] and pragmatic documentation (read: Stack Overflow). This is most evident at the far end of the analytical pipeline—the aforementioned data analytics and visualization—but, for example, I think we see it very much at work in the evolution of multiple competing frameworks for javascript: this is a good thing, not a bad thing, if sometimes massively annoying at the time. The differences between now and even the 1990s are absolutely stunning.

So why the endless complaints about the “software crisis?” Two things, I’m guessing. First, in data analytics we still have, and will always have, a “first mile” and “last mile” problem: at the beginning, data needs to be munged in highly idiosyncratic ways in order to be used with these systems, and that process is often very tedious. At the end stages of analysis, the results need to be intelligently presented and interpreted, which also requires a high level of skills often in short supply. And then there’s the age-old problem that most non-technical managers hate skilled programmers, because skilled programmers don’t respond predictably to the traditional Management Triad—anger, greed, and delusion—and at the end of the [working] day, far too many IT managers really just want to employ people, irrespective of technical competence, whom they will feel comfortable doing vodka shots with in strip clubs. That has only barely changed since the 1990s. Or 1970s.

Whoever has the most labelled cases wins

Fascinating Economist article (alas, possibly paywalled depending on your time and location):

https://www.economist.com/technology-quarterly/2020/01/02/chinas-success-at-ai-has-relied-on-good-data

arguing that the core advantage China has in terms of AI/ML is actually labelled cases, which China has built a huge infrastructure for generating in near-real-time and at low cost, rather than the algorithms they are using:

Many of the algorithms used contain little that is not available to any computer-science graduate student on Earth. Without China’s data-labelling infrastructure, which is without peer, they would be nowhere.

Also see this article Andy Haltermann alerted me to: https://arxiv.org/pdf/1805.05377.pdf

Labelled cases—and withered technologies—become highly relevant when we look at the current situation for the automated production of event data. All of the major projects in the 2010s—BBN’s work on ICEWS, UT/Dallas’s near-real-time RIDIR Phoenix, UIUC Cline Center Historical Phoenix—use the parser/dictionary approach first developed in the 1990s by the KEDS and VRA-IDEA projects, then followed through to the TABARI/CAMEO work of the early 2000s. But seen from the perspective of 2020, Lockheed’s successful efforts on the original DARPA ICEWS (2008-2011) went with a rapidly-deployable “withered technology”—TABARI/CAMEO—and initially focused simply on improving the news coverage and actors dictionaries—both technically simple tasks—leaving the core program and its algorithms intact, even to the point where, at DARPA’s insistence, the original Lockheed JABARI duplicated some bugs in TABARI, only later making some incremental improvements: monozukuri + kaizen. Only after the still-mysterious defense contractor skullduggery at the end of the research phase of ICEWS—changing the rules so that BBN, presumably intended as the winner in the DARPA competition all along, could now replace Lockheed—was there a return to the approach of focusing on highly specialized coding algorithms.

But that was then, and despite what I’ve written earlier, probably the Chinese approach—more or less off-the-shelf machine learning algorithms [18], then investing in generating masses of training data (readily available as grist for the event data problem, of course)—is most appropriate. We’ll see.
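
As a sketch of what “off-the-shelf” means here: the entire modeling side can be a stock scikit-learn text pipeline, and everything hard is in assembling the labelled cases. The sentences and loosely CAMEO/PLOVER-flavored labels below are invented for illustration, not taken from ICEWS or Phoenix.

    # Off-the-shelf text classification: TF-IDF features plus logistic regression.
    # The four labelled sentences are invented stand-ins; in practice the training
    # pairs would number in the hundreds of thousands.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    texts = ["Rebels attacked an army checkpoint overnight",
             "The two governments signed a trade agreement",
             "Police dispersed protesters with tear gas",
             "The president praised the new ceasefire"]
    labels = ["ASSAULT", "AGREE", "COERCE", "APPROVE"]   # illustrative event categories

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                        LogisticRegression(max_iter=1000))
    clf.fit(texts, labels)
    print(clf.predict(["Troops attacked a rebel position near the border"]))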

David Epstein’s Range: Why generalists triumph in a specialized world is worth a read

Range is sort of an anti-Malcolm-Gladwell—all the more interesting given that Gladwell, much to his credit, favorably blurbs it—in debunking a series of myths about what it takes to become an expert. The first of two major take-aways—the book is quite wide-ranging—is that many of the popular myths are based on expertise gained in “kind” problems where accumulated past experience is a very good guide to how to get a positive outcome in the future: golf, chess, and technical mastery of musical instruments being notoriously kind cases.[19] In “wicked” problems, concentrated experience per se isn’t much help, and, per the title of the book, generalists with a broad range of experience and experimentation in many different types and levels of problems excel instead.

The other myth Epstein thoroughly debunks is the “10,000-hours to expertise” rule extolled by Gladwell. For starters, this is largely an urban legend with little systematic evidence to back it.  And in the “well, duh…” category, the actual amount of time required to achieve expertise depends on the domain—starting with that kind vs. wicked distinction—and the style of the experience/training (Epstein discusses interesting work on effects of mixing hard and easy problems when training), and on the individual: some people absorb useful information more quickly than others.

So where is programming (and data analytics) from this perspective? Curiously, it has aspects at both ends. Within a fixed environment, it is largely “kind”: the same input will always produce the same output [20]. But the overall environment, particularly for data analytics in recent years, is decidedly wicked: while major programming languages change surprisingly slowly, libraries and frameworks change rapidly and somewhat unpredictably, and this is now occurring in analytics (or at least predictive analytics) as well, with machine-learning supplanting—sometimes inappropriately—classical statistical modeling (which by the 1990s had largely degenerated to almost complete reliance on variants of linear and logistic regression [16]), and rapid changes can also occur in machine-learning, as the ascendancy of deep learning neural networks has shown.

As for what this means for programmers, well…

The mysteries of 1000-hour neuroplasticity

I’ll finish on a bit of a tangent: Goleman and Davidson’s Altered Traits: Science Reveals How Meditation Changes Your Mind, Brain, and Body (one-hour talk at Google here).

Goleman and Davidson are specifically interested in meditation methods that have been deliberately refined over millennia to alter how the brain works in a permanent fashion: altered traits as distinct from temporarily altered states in their terminology, and these changes now can be consistently measured with complex equipment rather than self-reporting. But I’m guessing this generalizes to other sustained “deep” cognitive tasks.

What I find intriguing about this research is what I’d call a “missing middle”: There is now a great deal of meditation research on subjects with very short-term experience—typically either secular mindfulness or mantra practice—involving a few tens of hours of instruction, if that, followed by a few weeks or at most months of practice of varying levels of consistency. Davidson, meanwhile, has gained fame for his studies, in collaboration with the Dalai Lama, on individuals, typically Tibetan monastics, with absolutely massive amounts of meditation experience, frequently in excess of 50,000 lifetime hours, including one or more five-year retreats, and intensive study and training.[21]

My puzzle: I think there is a fair amount of anecdotal evidence that the neuroplasticity leading to “altered traits” probably starts kicking in around a level of 1,000 to 2,000 lifetime hours of “deep” work, and this probably occurs in a lot of domains, including programming. But trying to assess this is complicated by at least the following issues:

  • reliably keeping track of deep practice over a long period of time—a year or two at least, probably more like five years, since we’re looking at time spent in deep work, not PowerPoint-driven meetings or program/performance reviews [22]—and standardizing measures of its quality, per Epstein’s observations in Range
  • standardizing a definition of “expertise”: We all know plenty of people who have managed for decades to keep professional jobs apparently involving expertise mostly by just showing up and not screwing up too badly too conspicuously too often
  • figuring out (and measuring for post-1000-to-2000-hour subjects) baselines and adjusting for the likely very large individual variance even among true experts
  • doing these measures with fabulously expensive equipment the validity of which can be, well, controversial. At least in dead salmon.

So, looking at what I just wrote, maybe 1000 to 2000 hour neuroplasticity, if it exists, will remain forever terra incognita, though it might be possible in at least a few domains where performance is more standardized: London taxi drivers again.[reprise 21] But I wonder if this addresses an issue one finds frequently in fields involving sustained mental activity, where a surprisingly high percentage of elaborately-trained and very well compensated people drop out after five to ten years: Is this a point where folks experiencing neuroplasticity—and learning how to efficiently use their modified brains, abandoning inefficient habits from their period of learning and relying more on a now-effective “intuition,” setting aside the proverbial trap to focus on the fish—find tasks increasingly easy, while those who have not experienced this change are still tediously stumbling along, despite putting in equivalent numbers of hours? Just a thought. So to speak.

Happy New Year. Happy 2020s.

Footnotes

0. And about all those damn “footnotes”…

1. And transcends the advice found in much of the start-up porn, which over-emphasizes the returns on networking and utilizing every possible social media angle. Rosenthal does note his networks have been great for locating free-lance jobs, but these were networks of people he’d actually worked with, not social media networks. 

2. By far the worst experience I’ve had with a nominally full-time—I momentarily thought I’d use the word “professional,” but…no…—programmer I was supposedly collaborating with—alas, with no authority over them—was in an academic institution where the individual took three months to provide me with ten lines of code which, in the end, were in a framework I decided wouldn’t work for the task, so even this code was discarded and I ended up doing all of the coding for an extended project myself. The individual meanwhile having used that paid time period to practice for a classical music competition, where they apparently did quite well. They were subsequently “let go”, though only when this could be done in the context of a later grant not coming through.

As it happens, I recently ran into the now-CTO of that institution and, with no prompting from me, they mentioned the institution had a major problem with a large number of programmers on payroll, for years, who were essentially doing nothing, and quite resented any prospects of being expected to do anything. So it was in the institutional culture: wow! Wonder how many similar cases there are like this? And possibly not only in academia.

3. Note this is one of the key reasons programming death marches don’t work, as Brooks initially observed and Yourdon later elaborated in more detail. [4] In programming, the time you “save” by not taking a break, or just calling it quits for the day, can easily, easily end up genuinely costing you ten or more times the effort down the road. [5]

4. I gather if I were a truly committed blogger, apparently there are ways I could monetize these links to Amazon and, I dunno, buy a private island off the coast of New Zealand or somesuch. But for now they are just links…

5. As with most engineering tasks but unlike, say, retail. Or, surprisingly, law and finance if their 80-hour work weeks are real. Medicine?: they bury their mistakes.

6. I’m pretty sure Kahneman and Tversky did a series of experiments showing the same thing. Also pretty sure, but too tired to confirm, that Kahneman discusses these in Thinking, Fast and Slow. (Google talk here.)

7. I suppose nowadays taking stimulating drugs would be another response. Beyond some caffeine in the morning (only), not something I do: my generation used/uses drugs recreationally, not professionally. But that may just be me: OK, boomer.

8. Far and away the best managed project I’ve been involved with not only paid by the sub-project, but paid in advance! This was a subcontract on a government project and I was subsequently told on another government subcontract that, no, this is impossible, it never happens. Until I gave up arguing the point, I was in the position of discussions with Chico Marx in Duck Soup: “Well, who you gonna believe, me or your own eyes?” Granted, I think I was involved in some sort of “Skunkworks” operation—it was never entirely clear and the work was entirely remote beyond a couple meetings in conference rooms in utterly anonymous office parks—but still, that pre-paying project went on for about two years with several subcontracts. 

9. The “cost-plus” contracts once [?] common in the defense industry are, of course, this moral hazard on steroids.

10. On a learning curve but definitely learning: one of the fascinating things I’ve seen is how quickly people have settled on two fundamental rules for remote meetings:

  • Everyone is on a remote connection from their office, even if some of the people are at a central location: meetings with some people in a conference room and the rest coming in via video are a disaster
  • Video is on for everyone: audio-only is a recipe for being distracted

These two simple rules go most of the way to explaining why remote meetings work with contemporary technology (Zoom, Hangout) but didn’t with only conference room video or audio-only technology: OMG “speaker phones,” another spawn of Satan or from your alternative netherworld of choice.

10a. So the typical remote worker uses a home office, and tech companies are moving to 100% remote, and yet in downtown Charlottesville there are currently hundreds of thousands of square feet of office space under construction that will be marketed to tech companies: am I missing something here?

11. On the flip side, there is also nothing like getting Mumbai wages while living in San Francisco or Boston.

12. Uh, dude, that’s what Slack and StackOverflow are used for now…

12a. Actual page references are evidence that I bought the physical book at the National Book Festival after listening to Epstein talk about it, rather than just getting a Kindle version.

13. I’ve actually used a trackball for decades, but same difference. Keyboard also works nicely in sardine-class on long flights.

14. One prediction particularly caught my attention: “A company makes a practice of hiring experienced older workers that other companies won’t touch at sub-standard pay rates and the strategy works so well they are celebrated in a Fortune article.” Translation: by 2020, pigs will fly.

15. E.g. I was surprised but exceedingly pleased to find a Python module that was mostly concerned with converting tabular data to data frames, but oh-by-the-way automatically converted qualitative data to dummy variables for regression analysis. [16]

16. Yes, I recently did a regression on some data. ANOVA actually: it was appropriate.

17. For all my endless complaints about academic computer science, their single-minded focus on systematically comparing the performance of algorithms is a very valuable contribution to the ecosystem here. Just don’t expect them to write maintainable and documented code: that’s not what computer scientists or their graduate students do.

18. Algorithms from the 2020s, of course, and probably casting a wide net on these, as well as experimenting with how to best pre-process the training data—it’s not like parsing is useless—but general solutions, not highly specialized ones.

19. Firefighting, curiously, is another of his examples of a “kind” environment for learning.

20. If it doesn’t, you’ve forgotten to initialize something and/or are accessing/corrupting memory outside the intended range of your program. The latter is generally all but impossible in most contemporary languages, but certainly not in C! And C is alive and well! Of course, getting different results each time a program is run is itself a useful debugging diagnostic for such languages.

21. Another example of brain re-wiring following intensive focused study involves research on London taxi drivers: Google “brain london taxi drivers” for lots of popular articles, videos etc.

22. If Goleman and Davidson’s conclusions—essentially from a meta-analysis—can be generalized, periods of sustained deep cognitive work, which in meditation occurs in the context of retreats, may be particularly important for neuroplasticity. Such periods of sustained concentration are certainly common in other domains involving intense cognitive effort; the problem would remain reliably tracking these over a period of years. And we’re still stuck with the distinction that neuroplasticity is the objective of most intensive meditation practices, whereas it is an unsystematic side effect of professional cognitive work.


Seven current challenges in event data

This is the promised follow-up to last week’s opus, “Stuff I Tell People About Event Data,” herein referenced as SITPAED. It is motivated by four concerns:

  • As I have noted on multiple occasions, the odd thing about event data is that it never really takes off, but neither does it ever really go away
  • As noted in SITPAED, we presently seem to be languishing with a couple “good enough” approaches—ICEWS on the data side and PETRARCH-2 on the open-source coder side—and not pushing forward, nor is there any apparent interest in doing so
  • To further refine the temporal and spatial coverage of instability forecasting models (IFMs)—where there are substantial current developments—we need to deal with near-real-time news input. This may not look exactly like event data, but it is hard to imagine it won’t look fairly similar, and confront most of the same issues of near-real-time automation, duplicate resolution, source quality and so forth
  • Major technological changes have occurred in recent years but, at least in the open source domain, coding software lags well behind these, and as far as I know, coder development has stopped even in the proprietary domain

I will grant that in current US political circumstances—things are much more positive in Europe—”good enough” may be the best we can hope for, but just as the “IFM winter” of the 2000s saw the maturation of projects which would fuel the current proliferation of IFMs, perhaps this is the point to redouble efforts precisely because so little is going on.

Hey, a guy can dream.

Two years ago I provided something of a road-map for next steps in terms of some open conjectures, and additional reflections can be found here and here. This essay is going to be more directed, with an explicit research agenda, along the lines of the proposal for a $5M research program at the conclusion of this entry from four years ago. [1] These involve quite a variety of levels of effort—some could be done as part of a dissertation, or even an ambitious M.A. thesis, others would require a team with substantial funding—but I think all are quite practical. I’ll start with seven in detail, then briefly discuss seven more.

1. Produce a fully-functional, well-tested, open-source coder based on universal dependency parsing

As I noted in SITPAED, PETRARCH-2 (PETR-2)—the most recent open source coder in active use, deployed recently to produce three major data sets—was in fact only intended as a prototype. As I also noted in SITPAED, universal dependency parsing provides most of the information required for event data coding in an easily processed form, and as a bonus is by design multi-lingual, so for example, in the proof-of-concept mudflat coder, Python code sufficient for most of the functionality required for event coding is about 10% the length of comparable earlier code processing a constituency parse or just doing an internal sparse parse. So, one would think, we’ve got a nice opportunity here, eh?

Yes, one would think, and for a while it appeared this would be provided by the open-source “UniversalPetrarch” (UP) coder developed over the past four years under NSF funding. Alas, it now looks like UP won’t go beyond the prototype/proof-of-concept stage due to an assortment of  “made sense at the time”—and frankly, quite a few “what the hell were they thinking???”—decisions, and, critically, severe understaffing. [2] With funding exhausted, the project winding down, and UP’s sole beleaguered programmer mercifully reassigned to less Sisyphean tasks, the project has 31 open—that is, unresolved—issues on GitHub, nine of these designated “critical.”

UP works for a couple of proofs-of-concept—the coder as debugged in English will, with appropriate if very finely tuned dictionaries, also code in Arabic, no small feat—but as far as I can follow the code, the program essentially extracts from the dependency parse the information found in a constituency parse, an approach consistent with UP using older PETR-1 and PETR-2 dictionaries and being based on the PETR-2 source code. It sort of works, and is of course the classical Pólya method of converting a new problem to something you’ve already solved, [9] but it seems to be going backwards. Furthermore the PETR-1/-2 constituency-parse-based dictionaries [10] are all that UP has to work with: no dictionaries based on dependency parses were developed in the project. Because obviously the problem of writing a new event coder was going to be trivial to solve.

Thus putting us essentially back to square one, except that NSF presumably now feels under no obligation to pour additional money down what appears to be a hopeless rathole. [11] So it’s more like square zero.

Well, there’s an opportunity here, eh? And soon: there is no guarantee either the ICEWS or UT/D-Phoenix near-real-time data sets will continue!!
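
To make the dependency-parse point above concrete, here is a toy, hypothetical sketch (emphatically not the mudflat coder, and nothing like a full system) of pulling subject-verb-object triples out of a spaCy dependency parse using invented dictionaries; a real coder also needs compound verbs, passives, noun-phrase actors, and a great deal more:

import spacy  # pip install spacy; python -m spacy download en_core_web_sm

ACTORS = {"israel": "ISR", "hamas": "HMS"}   # toy actor dictionary
VERBS = {"attack": "190", "meet": "040"}     # toy verb-to-event-code dictionary

nlp = spacy.load("en_core_web_sm")

def code_sentence(text):
    """Return (source, event code, target) triples found in one sentence."""
    events = []
    for token in nlp(text):
        if token.pos_ != "VERB" or token.lemma_ not in VERBS:
            continue
        subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
        objects = [c for c in token.children if c.dep_ in ("dobj", "obj")]
        if subjects and objects:
            src = ACTORS.get(subjects[0].text.lower())
            tgt = ACTORS.get(objects[0].text.lower())
            if src and tgt:
                events.append((src, VERBS[token.lemma_], tgt))
    return events

print(code_sentence("Israel attacked Hamas on Tuesday."))   # should yield [('ISR', '190', 'HMS')]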

2. Learn dictionaries and/or classifiers from the millions of existing, if crappy, text-event pairs

But the solution to that opportunity might look completely different from any existing coder, being based on machine-learning classifiers—for example some sort of largely unsupervised indicator extraction based on the texts alone, without an intervening ontology (I’ve seen several experiments along these lines, as well as doing a couple myself)—rather than dictionaries. Or maybe it will still be based on dictionaries. Or maybe it will be a hybrid, for example doing actor assignment from dictionaries—there are an assortment of large open-access actor dictionaries available, both from the PETRARCH coders and ICEWS, and these should be relatively easy to update—and event assignment (or, for PLOVER, event, mode, and context assignment) from classifiers. Let a thousand—actually, I’d be happy with one or ideally at least two—flowers bloom.

But unless someone has a lot of time [12]—no…—or a whole lot of money—also no…—this new approach will require largely automated extraction of phrases or training cases from existing data: the old style of human development won’t scale to contemporary requirements.

On the very positive side, compared to when these efforts started three decades ago, we now have millions of coded cases, particularly for projects such as TERRIER and Cline-Phoenix (or for anyone with access to the LDC Gigaword corpus and the various open-source coding programs) which have both the source texts and corresponding events. [13]  Existing coding, however, is very noisy—if it wasn’t, there would be no need for a new coder—so the challenge is extracting meaningful information (dictionaries, training cases, or both) for a new system, either in a fully-automated or largely automated fashion. I don’t have any suggestions for how to do this—or I would have done it already—but I think the problem is sufficiently well defined as to be solvable.
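
By way of illustration, one minimal sketch of the classifier route, assuming you have already extracted a file of sentence-and-event-code pairs from one of the corpora above; the scikit-learn pieces are standard, but the file and its column names are entirely hypothetical:

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical file of (sentence, event category) pairs harvested from an
# existing corpus; the labels are noisy, which is precisely the problem.
df = pd.read_csv("sentence_event_pairs.csv")
train, test = train_test_split(df, test_size=0.2, random_state=42,
                               stratify=df["event_root"])

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=5, sublinear_tf=True),
    LogisticRegression(max_iter=1000, class_weight="balanced"),
)
model.fit(train["text"], train["event_root"])

# With labels this noisy, the held-out score mostly measures how faithfully you
# have reproduced the old coder's habits, not ground truth: that is where gold
# standard records would come in.
print(classification_report(test["event_root"], model.predict(test["text"])))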

3. ABC: Anything but CAMEO

As I pointed out in detail in SITPAED, and as is further elaborated in the PLOVER manual and various earlier entries in this blog, despite being used by all current event data sets, CAMEO was never intended as a general-purpose event ontology! I have a bias towards replacing it with PLOVER—presumably with some additional refinements—and in particular I think PLOVER’s proposed event-mode-context format is a huge improvement, from the coding, interpretation, and analytical perspectives alike, over the hierarchical format embedded in earlier schemes, starting with WEIS but maintained, for example, in BCOW as well as CAMEO.

But, alas, zero progress on this, despite the great deal of enthusiasm following the original meeting at NSF where we brought together people from a number of academic and government research projects. Recent initiatives on automated coding have, if anything, gone further away, focusing exclusively on coding limited sets of dependent variables, notably protests. Just getting the dependent variable is not enough: you need the precursors.

Note, by the way, that precursors do not need to be triggers: they can be short-term structural changes that can only be detected via event data because they are unavailable in the traditional structural indicators, which are reported only on an annual basis and/or at the national level. For at least some IFMs, it has been demonstrated that at the nation-year level, event measures can be substituted for structural measures and provide roughly the same level of forecasting accuracy (sometimes a bit more, sometimes a bit less, always more or less in the ballpark). While this has meant there is little gained from adding events to models with nation-year resolution, at the monthly and sub-state geographical levels, events (or something very similar to events) are almost certainly going to be the only indicators available.

4. Native coders vs machine translation

At various points in the past couple of years, I’ve conjectured that the likelihood that native-language event coders—a very small niche application—would progress more rapidly than machine translation (MT)—an extremely large and potentially very lucrative application—is pretty close to zero. But that is only a conjecture, and both fields are changing rapidly. Multi-language capability is certainly possible with universal dependency parsing—that is much of the point of the approach—and in combination with largely automated dictionary development (or, skipping the dictionaries all together, classifiers), it is possible that specialized programs would be better than simply coding translated text, particularly for highly-resourced languages like Spanish, Portuguese, French, Arabic, and Chinese, and possibly in specialized niches such as protests, terrorism, and/or drug-related violence.

Again, I’m much more pessimistic about the future of language-specific event coders than I was five years ago, before the dramatic advances in the quality of MT using deep-learning methods, but this is an empirical question. [14]

5. Assessing the marginal contribution of additional news sources

As I noted in SITPAED, over the course of the past 50 years, event data coding has gone from depending on a small number of news sources—not uncommonly, a single source such as the New York Times or Reuters [15]—to using hundreds or even thousands of sources, this transition occurring during the period from roughly 2005 to 2015 when essentially every news source on the planet established a readily-scraped web presence, often at least partially in English and if not, accessible, at least to those with sufficient resources, using MT. Implicit to this model, as with so many things in data science, was the assumption that “bigger is better.”

There are, however, two serious problems with this. The first—always present—was the possibility that all of the event signal relevant to the common applications of event data—currently mostly IFMs and related academic research—is already captured by a few—I’m guessing the number is about a dozen—major news sources, specifically the half-dozen or so major international sources (Reuters, Agence France Presse, BBC Monitoring, Associated Press and probably Xinhua) and another small number of regional sources or aggregators (for example, All Africa). The rest is, at best, redundant—anything useful will have been picked up by the international sources [16]—and at worst noise. Unfortunately, as processing pipelines become more computationally intensive (notably with external rather than internal parsing, and with geolocation) those additional sources consume a huge amount of resources, in some cases to supercomputer levels, and limit the possible sponsors of near-real-time data.

That’s the best scenario: the worst is that with the “inversion”—more information on the web is fake than real—these other sources, unless constantly and carefully vetted, are introducing systematic noise and bias.

Fortunately it would be very easy to study this with ICEWS (which includes the news source for each coded event, though not the URL) by taking a few existing applications—ideally, something where replication code is already available—and seeing how much the results change by eliminating various news sources (starting with the extremely long tail of sources which generate coded events very infrequently). It is also possible that there are some information-theoretic measures that could do this in the abstract, independent of any specific application. Okay, it’s not that it might be possible, there are definitely measures available, but I’ve no idea whether they will produce results meaningful in the context of common applications of event data.
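
A rough sketch of that ablation, assuming the ICEWS tab-delimited files from Dataverse; the file name is a placeholder and the column names used here (“Event Date”, “Publisher” for the news outlet) are my recollection of the release format, so check them:

import pandas as pd

# Drop the long tail of rarely-productive sources and see how much an
# aggregate monthly series changes. File and column names are assumptions.
events = pd.read_csv("icews_events.tab", sep="\t", low_memory=False)
events["month"] = pd.to_datetime(events["Event Date"]).dt.to_period("M")

by_source = events["Publisher"].value_counts()
major = by_source[by_source >= 1000].index      # arbitrary cutoff for "major" sources
trimmed = events[events["Publisher"].isin(major)]

full_series = events.groupby("month").size()
trimmed_series = trimmed.groupby("month").size().reindex(full_series.index, fill_value=0)
print(len(by_source), "sources;", len(major), "above the cutoff")
print("Correlation of monthly event counts:", full_series.corr(trimmed_series))

The real test, of course, is re-running a published model on the trimmed data rather than just correlating aggregate counts, but this gives a first cut at how concentrated the signal actually is.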

6. Analyze the TERRIER and Cline Center long time series

The University of Oklahoma and the University of Illinois/Urbana-Champaign have both recently released historical data sets—TERRIER and yet-another-data-set-called Phoenix [17] respectively—which differ significantly from ICEWS: TERRIER is “only” about 50% longer than ICEWS (back to 1980) but [legally] includes every news source available on LexisNexis, and the single-sourced Cline Center sets are much longer, back to 1945.

As I noted in SITPAED, the downsides of both are they were coded using the largely untested PETR-2 coder and with ca. 2011 actor dictionaries, which themselves are largely based on ca. 2005 TABARI dictionaries, so both recent and historical actors will be missing. That said, as I also showed in SITPAED, at higher levels of aggregation the overall picture provided by PETR-2 may not differ much from other coders (but it might: another open but readily researched question), and because lede sentences almost always refer to actors in the context of their nation-states, simply using dictionaries with nation-states may be sufficient. [18] But most importantly, these are both very rich new sources for event data that are far more extensive than anything available to date, and need to be studied.

7. Find an open, non-trivial true prediction

This one is not suitable for dissertation research.

For decades—and most recently, well, about two months ago—whenever I talked with the media (back in the days when we had things like local newspapers) about event data and forecasting, they would inevitably—and quite reasonably—ask “Can you give us an example of a forecast?” And I would mumble something about rare events, and think “Yeah, like you want me to tell you the Islamic Republic has like six months to go, max!” and then more recently, with respect to PITF, do a variant on “I could tell you but then I’d have to kill you.” [19]

For reasons I outlined in considerable detail here, this absence of unambiguous contemporary success stories is not going to change, probably ever, with respect to forecasts by governments and IGOs, even as these become more frequent, and since these same groups probably don’t want to tip their hand as to the capabilities of the models they are using, we will probably only get the retrospective assessments by accident (which will, in fact, occur, particularly as these models proliferate [20]) and—decades from now—when material is declassified.

That leaves the task of providing accessible examples of the utility of these models to academics (and maybe some specialized NGOs) instead, though for reasons discussed earlier, doing so obscurely would not bother me. Actually, we need two things: retrospective assessments using the likes of ICEWS, TERRIER, and Cline-Phoenix on what could have been predicted (no over-fitting the models, please…) based on data available at the time, and then at some point, a documentable—hey, use a blockchain!—true prediction of something important and unexpected. Two or three of these, and we can take everything back undercover.

The many downsides to this task involve the combination of rare events, with the unexpected cases being even rarer [21], and long time horizons, these typically being two years at the moment. So if I had a model which, say—and I’m completely making this up!—predicted a civil war in Ghana [22] during a twelve-month window starting two years out, a minimum of 24 months and a maximum of 36 months will pass before that prediction can be assessed. Even then we are still looking at probabilities: a country may be at a high relative risk, for example in the top quintile, but still have a probability of experiencing instability well below 100%. And 36 months from now we’ll probably have newer, groovier models so the old forecast still won’t demonstrate state of the art methods.

All of those caveats notwithstanding, things will get easier as one moves to shorter time frames and sub-national geographical regions: for example Nigeria has at least three more or less independent loci of conflict: Boko Haram in the northeast, escalating (and possibly climate-change-induced) farmer-herder violence in the middle of the country, and somewhat organized violence which may or may not be political in the oil-rich areas in the Delta, as well as potential Christian-Muslim, and/or Sunni-Shia religiously-motivated violence in several areas, and at least a couple of still-simmering independence movements. So going to the sub-state level both increases the population of non-obvious rare events, and of course going to a shorter time horizon decreases the time it will take to assess this. Consequently a prospective—and completely open—system such as ViEWS, which is doing monthly forecasts for instability in Africa at a 36-month horizon with a geographical resolution of 0.5 x 0.5 decimal degrees (PRIO-GRID; roughly 50 x 50 km) is likely to provide these sorts of forecasts in the relatively near future, though getting a longer time frame retrospective assessment would still be useful. 

A few other things that might go into this list

  • Trigger models: As I noted in my discussion of IFMs, I’m very skeptical about trigger models (particularly in the post-inversion news environment), having spent considerable time over three decades trying to find them in various data sets, but I don’t regard the issue as closed.
  • Optimal geolocation: MORDECAI seems to be the best open-source program out there at the moment (ICEWS does geolocation but the code is proprietary and, shall we say, seems a bit flakey), but it turns out this is a really hard problem and probably also isn’t well defined: not every event has a meaningful location.
  • More inter-coder and inter-dataset comparison: as noted in SITPAED, I believe the Cline Center has a research project underway on this, but more would be useful, particularly since there are almost endless different metrics for doing the comparison.
  • How important are dictionaries containing individual actors?: The massive dictionaries available from ICEWS contain large compendia of individual actors, but how much is actually gained by this, particularly if one could develop robust cross-sentence co-referencing? E.g. if “British Prime Minister Theresa May” is mentioned in the first sentence, a reference to “May” in the fourth sentence—assuming the parser has managed to correctly resolve “May” to a proper noun rather than a modal verb or a date—will also resolve to “GBRGOV”. (A toy sketch of this follows the list.)
  • Lede vs full-story coding: the current norm is coding the first four or six sentences of articles, but to my knowledge no one has systematically explored the implications of this. Same for whether or not direct quotations should be coded.
  • Gold standard records: also on the older list. These are fabulously expensive, unfortunately, though a suitably designed protocol using the “radically efficient” prodigy approach might make this practical. By definition this is not a one-person project.
  • A couple more near-real-time data generation projects: As noted in SITPAED, I’ve consistently under-estimated the attention these need to guarantee 24/7/365 coverage, but as we transition from maintaining servers in isolated rooms cooled to meat-locker temperatures and with fans so noisy as to risk damage to the hearing of their operators except server operators tend to frequent heavy metal concerts…I digress…to cloud-based servers based in Oregon and Northern Virginia, this should get easier, and not terribly expensive.
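
On the individual-actor question above, a toy sketch of what surname back-referencing might look like; the dictionary, the story, and the bare string matching are purely illustrative, and a real implementation would work off the parse and guard against “May” the month and “may” the modal:

ACTOR_DICT = {"british prime minister theresa may": "GBRGOV"}   # toy dictionary

def resolve_actors(sentences):
    """Code actors per sentence, letting bare surnames inherit earlier full matches."""
    surnames = {}                      # surname -> actor code, within one story
    coded = []
    for sentence in sentences:
        low = sentence.lower()
        for phrase, code in ACTOR_DICT.items():
            if phrase in low:
                surnames[phrase.split()[-1]] = code   # remember "may" -> "GBRGOV"
        coded.append(sorted({code for surname, code in surnames.items() if surname in low}))
    return coded

story = ["British Prime Minister Theresa May arrived in Brussels for talks.",
         "Negotiators described the mood as tense.",
         "Officials declined to comment on the schedule.",
         "May said the talks had stalled."]
print(resolve_actors(story))   # GBRGOV appears in the first and fourth sentences only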

Finally, if you do any of these, please get the research into an open access venue quickly rather than having it surface five years from now somewhere paywalled.

Footnotes

1. You will be shocked, shocked to learn that these suggestions have gone absolutely nowhere in terms of funding, though some erratic progress has been made, e.g. on at least outlining a CAMEO alternative. One of the suggestions—comparison of native-language vs MT approaches—even remains on this list.

2. Severely understaffed because the entire project was predicated on the supposition that political scientists—as well as the professional programming team at BBN/Raytheon who had devoted years to writing and calibrating an event coder—were just too frigging stupid to realize the event coding problem had already been solved by academic computer scientists and a fully functioning system could be knocked out in a couple months or so by a single student working half time. Two months turned into two years turned into three years—still no additional resources added—and eventually the clock just ran out. Maybe next time.

I’ve got a 3,000-word screed written on the misalignment of the interests of academic computer scientists and, well, the entire remainder of the universe, but the single most important take-away is to never, ever, ever forget that no computer scientist ever gained an iota of professional merit writing software for social scientists. Computer scientists gain merit by having teams of inexperienced graduate students [3]—fodder for the insatiable global demand by technology companies, where, just as with law schools, some will eventually learn to write software on the job, not in school [4]—randomly permute the hyper-parameters of long-studied algorithms until they can change the third decimal point of a standardized metric or two in some pointless—irises, anyone?—but standardized data set, with these results published immediately in some ephemeral conference proceeding. That’s what academic computer scientists do: they don’t exist to write software for you. Nor have they the slightest interest in your messy real-world data. Nor in co-authoring an article which will appear in a paywalled venue after four years and three revise-and-resubmits thanks to Reviewer #2. [6] Never, ever, ever forget this fact: if you want software written, train your own students—some, at least in political methodology programs, will be surprisingly good at the task [7]—or hire professionals (remotely) on short-term contracts.

Again, I have written 3,000 words on this topic but, for now, will consign it to the category of “therapy.”

3. These rants do not apply to the tiny number of elite programs—clearly MIT, Stanford, and Carnegie Mellon, plus a few less conspicuous ones like USC, Cornell and, I’ve been pleased to discover, Virginia Tech—which consistently attract students who are capable of learning, and at times even developing, advanced new methods, and who at those institutions may be able to experiment with fancier equipment than they could in the private sector, though this advantage is rapidly fading. Of course, the students at those top programs will have zero interest in working on social science projects: they are totally involved with one or more start-ups.

4. And just as in the profession of law, the incompetent ones presumably are gradually either weeded out, or self-select out: I can imagine no more miserable existence than trying to write code when you have no aptitude for the task, except if you are also surrounded, in a dysfunctional open-plan office setting [5], by people for whom the task is not only very easy, but often fun.

5. The references on this are coming too quickly now: just Google “open plan offices are terrible” to get the latest.

6. I will never forget the reaction of some computer scientists, sharing a shuttle to O’Hare with some political scientists, on learning of the publication delays in social science journals: it felt like we were out of the Paleolithic and trying to explain to some Edo Period swordsmiths that really, honest, we’re the smartest kids on the block, just look at the quality of these stone handaxes!

7. Given the well-documented systemic flaws in the current rigged system for recruiting programming talent—see this and this and this and this and this—your best opportunities are to recruit, train, and retain women, Blacks and Hispanics: just do the math. [8]

8. If you are a libertarian snowflake upset with this suggestion, it’s an exercise in pure self-interest: again, do the math. You should be happy.

9. I was originally going to call this the “Pólya trap” after George Pólya’s How to Solve It—once required reading in many graduate programs but now largely forgotten—and Pólya does, in fact, suggest several versions of solving problems by converting them to something you already know how to solve, but his repertoire goes far beyond this.

10. They are also radically different: as I noted in SITPAED, in their event coding PETR-1, PETR-2, and UP are almost completely different programs with only their actor dictionaries in common.

11. Mind you, these sorts of disappointing outcomes are hardly unique to event data, or the social sciences—the National Ecological Observatory Network (NEON), a half-billion-dollar NSF-funded facility, has spent the last five years careening from one management disaster to another like some out-of-control car on the black ice of Satan’s billiard table. Ironically, the generally unmanaged non-academic open source community—both pure open source and hybrid models—with projects like Linux and the vast ecosystem of Python and R libraries, has far more efficiently generated effective (that is, debugged, documented, and, through StackOverflow, reliably supported) software than the academic community, even with the latter’s extensive public funding.

12. Keep in mind the input to the eventual CAMEO dictionaries was developed at the University of Kansas over a period of more than 15 years, and focused primarily on the well-edited Reuters and later Agence France Presse coverage of just six countries (and a few sub-state actors) in the Middle East, with a couple subsets dealing with the Balkans and West Africa.

13. With a bit more work, one can use scraping of major news sites and the fact that ICEWS, while not providing URLs, does provide the source of its coded events, and in most cases the article an event was coded from can be identified quite unambiguously by looking at the actors involved (again, the actor dictionaries are open and easy to update). Using this method, over time a substantial set of current article-event pairs could be accumulated. Just saying…

14. This, alas, is a very expensive empirical question since it would require a large set of human-curated test cases, ideally with the non-English cases coded by native speakers, to evaluate the two systems, even if one had a credibly-functioning system working in one or more of the non-English languages. Also, of course, even if the language-specific system worked better than MT on one language, that would not necessarily be true on others due to differences on either the event coder, the current state of MT for that language (again, this may differ dramatically between languages), or the types of events common to the region where the language is used (some events are easier to code, and/or the English dictionaries for coding them are better developed, than others). So unless you’ve got a lot of money—and some organizations with access to lots of non-English text and bureaucratic incentives to process these do indeed have a lot of money—I’d stay away from this one.

15. For example for a few years, when we had pretty good funding, the KEDS project at Kansas had its own subscription to Reuters. And when we didn’t, we were ably assisted by some friendly librarians who were generous with passwords.

The COPDAB data set, an earlier, if now largely forgotten, competitor to WEIS, claimed to be multi-source (in those days of coding from paper sources, just a couple dozen newspapers), but its event density relative to the single-sourced WEIS came nowhere close to supporting that contention, and the events themselves never indicated the sources: What probably happened is that multiple sourcing was attempted, but the human coders could not keep up and the approach was abandoned.

16. Keep in mind that precisely because these are international and in many instances, their reporters are anonymous, they have a greater capacity to provide useful information than do local sources which are subject to the whims/threats/media-ownership of local political elites and/or criminals. Usually overlapping sets.

17. Along with “PETRARCH,” let’s abandon that one, eh: I’m pretty good with acronyms—along with self-righteous indignation, it’s my secret superpower!—so just send me a general idea of what you are looking for and I’ll get back to you with a couple of suggestions. Seriously.

Back in the heady days of decolonization, there was some guy who liked to design flags—I think this was just a hobby, and probably a better hobby than writing event coders—who sent some suggestions to various new micro-states and was surprised to learn later that a couple of these flags had been adopted. This is the model I have in mind.

Or do it yourself—Scrabble™-oriented web sites are your best tool!

18. Militarized non-state actors, of course, will be missing and/or misidentified—”Irish Republican Army” might be misclassified as IRLMIL—though these tend to be less important prior to 1990. Managing the period of decolonization covered by the Cline data is also potentially quite problematic: I’ve not looked at the data so I’m not sure how well this has been handled. But it’s a start.

19. PITF, strictly speaking, doesn’t provide much information on how the IFM models have been used for policy purposes but—flip side of the rare events—there have been a few occasions where they’ve seemed to be quite appreciative of the insights provided by the IFMs, and it didn’t take a whole lot of creativity to figure out what they must have been appreciative about.

That said, I think this issue of finding a few policy-relevant unexpected events is what has distinguished the generally successful PITF from the largely abandoned ICEWS: PITF (and its direct predecessor, the State Failures Project) had a global scope from the beginning and survived long enough—it’s now been around more than a quarter century—that the utility of its IFMs became evident. ICEWS had only three years (and barely that: this included development and deployment times) under DARPA funding, and focused on only 27 countries in Asia, some of these (China, North Korea) with difficult news environments and some (Fiji, Solomon Islands) of limited strategic interest. So compared to PITF, the simple likelihood that an unexpected but policy-relevant rare event would occur was quite low, and, as it happened, didn’t happen. So to speak.

20. In fact I think I may have picked up such an instance—the release may or may not have been accidental—at a recent workshop, though I’ll hold it back for now.

21. In a properly calibrated model, most of the predictions will be “obvious” to most experts: only the unexpected cases, and due to cognitive negativity bias, here largely the unexpected positive cases, will generate any interest. So one is left with a really, really small set of potential cases of interest.

22. In an internet cafe in some remote crossroads in Ghana, a group of disgruntled young men are saying “Damn, we’re busted! How’d he ever figure this out?”

Posted in Methodology, Programming | 1 Comment

Stuff I tell people about event data

Every few weeks—it’s a low-frequency event with a Poisson distribution, and thus exponentially distributed inter-arrival times—someone contacts me (typically from government, an NGO or a graduate student) who has discovered event data and wants to use it for some project. And I’ve gradually come to realize that there’s a now pretty standard set of pointers that I provide in terms of the “inside story” [1] unavailable in the published literature, which in political science tends to lag current practice by three to five years, and that’s essentially forever in the data science realm. While it would be rare for me to provide this entire list—seven items of course—all of these are potentially relevant if you are just getting into the field, so to save myself some typing in the future, here goes.

(Note, by the way, this is designed to be skimmed, not really read, and I expect to follow this list fairly soon with an updated entry—now available!—on seven priorities in event data research.)

1. Use ICEWS

Now that ICEWS is available in near real time—updated daily, except when it isn’t—it’s really the only game in town and likely to remain so until the next generation of coding programs comes along (or, alas, its funding runs out).

ICEWS is not perfect:

  • the technology is about five years old now
  • the SERIF/ACCENT coding engine and verb/event dictionaries are proprietary (though they can be licensed for non-commercial use: I’ve been in touch with someone who has successfully done this)
  • the output is in a decidedly non-standard format, but see below
  • sources are not linked to or traceable back to specific URLs—arrgghhh, why not???
  • the coding scheme is CAMEO, never intended as a general ontology [2], and in a few places—largely to resolve ambiguities in the original—this is defined somewhat differently than the original University of Kansas CAMEO
  • the original DARPA ICEWS project was focused on Asia, and there is definitely still an Asia-centric bias to the news sources
  • due to legal constraints on the funding sources—no, not some dark conspiracy: this restriction dates to the post-Watergate 1970s!—it does not cover the US

But ICEWS has plenty of advantages as well:

  • it provides generally reliable daily updates
  • it has relatively consistent coverage across more than 20 years, though run frequency checks over time, as there are a couple of quirks in there, particularly at the beginning of the series (a minimal sketch of such a check follows this list)
  • it is archived in the universally-available and open-access Dataverse
  • it uses open (and occasionally updated) actor and sector/agent databases
  • there is reasonably decent (and openly accessible) documentation on how it works
  • it was written and refined by a professional programming team at BBN/Raytheon which had substantial resources over a number of years
  • it has excellent coverage across the major international news sources (though again, run some frequency checks: coverage is not completely consistent over time)
  • it has a tolerable false-positive rate
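
Since “run frequency checks” comes up twice in that list, here is roughly what I mean, again assuming the Dataverse tab-delimited format; the file name is a placeholder and the “Event Date” column name should be verified against the actual files:

import pandas as pd

# Monthly event counts for one ICEWS yearly file: gaps, spikes, and changes in
# source volume show up immediately.
events = pd.read_csv("icews_1995.tab", sep="\t", low_memory=False)
monthly = (pd.to_datetime(events["Event Date"])
             .dt.to_period("M")
             .value_counts()
             .sort_index())
print(monthly)   # or monthly.plot() if you would rather eyeball it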

And more specifically, there is at least one large family of academic journals which now accepts event data research—presumably with the exception of studies comparing data sets—only if they are done using ICEWS: if you’ve done the analysis using anything else, you will be asked to re-do it with ICEWS. Save those scripts!

As for the non-standard data format: just use my text_to_CAMEO program to convert the output to something that looks like every other event data set.

The major downside to ICEWS is a lack of guaranteed long-term funding, which is problematic if you plan to rely on it for models intended to be used in the indefinite future. More generally, I don’t think there are plans for further development, beyond periodically updating the actor dictionaries: the BBN/Raytheon team which developed the coder left for greener pastures [3] and while Lockheed (the original ICEWS contractor) is updating the data, as far as I know they aren’t doing anything with the coder. For the present it seems that the ICEWS coder (and CAMEO ontology) are “good enough for government work” and it just is what it is. Which isn’t bad, just that it could be better with newer technology.

2. Don’t use one-a-day filtering

Yes, it seemed like a good idea at the time, around 1995, but it amplifies coding errors (which is to say, false positives): see the discussion in http://eventdata.parusanalytics.com/papers.dir/Schrodt.TAD-NYU.EventData.pdf (pp. 5-7). We need some sort of duplicate filtering, almost certainly based on clustering the original articles at the text level (which, alas, requires access to the texts, so it can’t be done as a post-coding step with the data alone), but the simple one-a-day approach is not it. Note that ICEWS does not use one-a-day filtering.
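
For illustration only, a minimal sketch of text-level duplicate filtering using TF-IDF cosine similarity; the threshold is arbitrary, the articles are invented, and anything at production scale would want something like MinHash rather than a full pairwise comparison:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def deduplicate(texts, threshold=0.6):
    """Greedily keep an article only if it is not too similar to one already kept."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(texts)
    sims = cosine_similarity(tfidf)
    keep = []
    for i in range(len(texts)):
        if all(sims[i, j] < threshold for j in keep):
            keep.append(i)
    return keep   # indices of the articles to pass on to the coder

articles = ["Protesters gathered in the capital on Tuesday to demand reform.",
            "Protesters gathered in the capital on Tuesday to demand reform, police said.",
            "The central bank left interest rates unchanged on Thursday."]
print(deduplicate(articles))   # [0, 2]: the rewritten wire copy is dropped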

3. Don’t use the “Goldstein” scale

Which for starters, isn’t the Goldstein scale, which Joshua Goldstein developed in a very ad hoc manner back in the late 1980s [https://www.jstor.org/stable/174480: paywalled of course, this one at $40] for the World Events Interaction Survey (WEIS) ontology. The scale which is now called “Goldstein” is for the CAMEO ontology, and was an equally ad hoc effort initiated around 2002 by a University of Kansas graduate student named Uwe Reising for an M.A. thesis while CAMEO was still under development, primarily by Deborah Gerner and Ömür Yilmaz, and then brought into final form by me, maybe 2005 or so, after CAMEO had been finalized. But it rests entirely on ad hoc decisions: there’s nothing systematic about the development. [4]

The hypothetical argument that people make against using these scales—the WEIS- and CAMEO-based scales are pretty much comparable—is that positive (cooperative) and negative (conflictual) events in a dyad could cancel each other out, and one would see values near zero both in dyads where nothing was happening and in dyads where lots was happening. In fact, that perfectly balanced situation almost never occurs: instead  any violent—that is, material—conflict dominates the scaled time series, and completely lost is any cross-dyad or cross-time variation in verbal behavior—for example negotiations or threats—whether cooperative or conflictual.

The solution, which I think is common in most projects now, is to use “quad counts”: the counts of the events in the categories material-cooperation, verbal-cooperation, verbal-conflict and material-conflict.
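
For completeness, a minimal sketch of computing quad counts with pandas; the input file and column names are hypothetical, and the root-code cut points follow one common convention (verbal cooperation 01-05, material cooperation 06-09, verbal conflict 10-13, material conflict 14-20), so check them against whichever CAMEO or PLOVER variant you are actually using:

import pandas as pd

def quad_category(code):
    # Assumes string codes like "046" or "1823"; integer-typed codes lose their
    # leading zeros and need extra care before taking the first two digits.
    root = int(str(code)[:2])
    if root <= 5:
        return "verbal_coop"
    if root <= 9:
        return "material_coop"
    if root <= 13:
        return "verbal_conf"
    return "material_conf"

# Hypothetical event file with date, source, target, and cameo columns
events = pd.read_csv("events.csv", dtype={"cameo": str})
events["month"] = pd.to_datetime(events["date"]).dt.to_period("M")
events["quad"] = events["cameo"].map(quad_category)

quad_counts = (events.groupby(["source", "target", "month", "quad"])
                     .size()
                     .unstack("quad", fill_value=0))
print(quad_counts.head())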

4. The PETRARCH-2 coder is only a prototype

The PETRARCH-2 coder (PETR-2) was developed in the summer of 2015 by Clayton Norris, at the time an undergraduate (University of Chicago majoring in linguistics and computer science) intern at Caerus Analytics. [14] It took some of the framework of the PETRARCH-1 (PETR-1) coder, which John Beieler and I had written a year earlier—for example the use of a constituency parse generated by the Stanford CoreNLP system, and the input format and actor dictionaries are identical—but the event coding engine is completely new, and its verb-phrase dictionaries  are a radical simplification of the PETR-1 dictionaries, which were just the older TABARI dictionaries. The theoretical approach underlying the coder and the use of the constituency parse are far more sophisticated than those of the earlier program, and it contains prototypes for some pattern-based extensions such as verb transformations.  I did some additional work on the program a year later which made PETR-2 sufficiently robust as to be able to code a corpus of about twenty-million records without crashing. Even a record consisting of nothing but exam scores for a school somewhere in India.

So far, so good but…PETR-2 is only a prototype, a summer project, not a fully completed coding system! As I understand it, the original hope at Caerus had been to secure funding to get PETR-2 fully operational, on par with the SERIF/ACCENT coder used in ICEWS, but this never happened. So the project was left in limbo on at least the following dimensions:

  • While a verb pattern transformation facility exists in PETR-2, it is only partially implemented for a single verb, ABANDON
  • If you get into the code, there are several dead-ends where Norris clearly had intended to do more work but ran out of time
  • There is no systematic test suite, just about seventy more or less random validation cases and a few Python unit-tests [5]
  • The new verb dictionaries and an internal transformation language called pico effectively define yet another dialect of CAMEO
  • The radically simplified verb dictionaries have not been subjected to any systematic validation and, for example, there was a bug in the dictionaries—I’ve now corrected this on GitHub—which over-coded the CAMEO 03 category
  • The actor dictionaries are still essentially those of TABARI at the end of the ICEWS research phase, ca. 2011

This is not to criticize Norris’s original efforts—it was a summer project by an undergraduate for godsakes!—but the program has not had the long-term vetting that several other programs such as TABARI (and its Java descendent, JABARI [6]) and SERIF/ACCENT have had. [7]

Despite these issues, PETR-2 has been used to produce three major data sets—Cline Phoenix, TERRIER, and UT/Dallas Phoenix. All of these could, at least in theory, be recoded at some point since all are based on legal copies of the relevant texts. [8]

5. But all of these coders generate the same signal: The world according to CAMEO looks pretty much the same using any automated event coder and any global news source

Repeating a point I made in an earlier entry [https://asecondmouse.wordpress.com/2017/02/20/seven-conjectures-on-the-state-of-event-data/], with minimal updating as little has changed:

The graph below shows frequencies across the major (two-digit) categories of CAMEO using three different coders, PETRARCH 1 and 2, and Raytheon/BBN’s ACCENT (from the ICEWS data available on Dataverse) for the year 2014. This also reflects two different news sources: the two PETRARCH cases are LexisNexis; ICEWS/ACCENT is Factiva, though of course there’s a lot of overlap between those.

Basically, “CAMEO-World” looks pretty much the same whichever coder and news source you use: the between-coder variances are completely swamped by the between-category variances. What large differences we do see are probably due to changes in definitions: for example PETR-2 over-coded “express intent to cooperate” (CAMEO 03) due to the aforementioned bug in the verb dictionaries; I’m guessing BBN/ACCENT did a bunch of focused development on IEDs and/or suicide bombings so has a very large spike in “Assault” (18) and they seem to have pretty much defined away the admittedly rather amorphous “Engage in material cooperation” (06).

I think this convergence is due to a combination of three factors:

  1. News source interest, particularly the tendency of news agencies (which all of the event data projects are now getting largely unfiltered) to always produce something, so if the only thing going on in some country on a given day is a sister-city cultural exchange, that will be reported  (hence the preponderance of events in the low categories). Also the age-old “when it bleeds, it leads” accounts for the spike on reports of violence (CAMEO categories 17, 18,19).
  2. In terms of the less frequent categories, the diversity of sources the event data community is using now—as opposed to the 1990s, when the only stories the KEDS and IDEA/PANDA projects coded were from Reuters, which is tightly edited—means that as you try to get more precise language models using parsing (ACCENT and PETR-2), you start missing stories that are written in non-standard English that would be caught by looser systems (PETR-1 and TABARI). Or at least this is true proportionally: on a case-by-case basis, ACCENT could well be getting a lot more stories than PETR-2 (alas, without access to the corpus they are coding, I don’t know) but for whatever reason, once you look at proportions, nothing really changes except where there is a really concentrated effort (e.g. category 18), or changes in definitions (ACCENT on category 06; PETR-2 unintentionally on category 03).
  3. I’m guessing (again, we’d need the ICEWS corpus to check, and that is unavailable due to the usual IP constraints) all of the systems have similar performance in not coding sports stories, wedding announcements, recipes, etc:  I know PETR-1 and PETR-2 have about a 95% agreement on whether a story contains an event, but a much lower agreement on exactly what the event is: again, their verb dictionaries are quite different. The various coding systems probably also have a fairly high agreement at least on the nation-state level of which actors are involved.
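
If you want to run this sort of comparison yourself, the computation is trivial; a sketch, with entirely hypothetical file and column names:

import pandas as pd

def root_shares(path, code_col="cameo"):
    """Share of events falling in each two-digit CAMEO root category."""
    codes = pd.read_csv(path, dtype={code_col: str})[code_col]
    return codes.str[:2].value_counts(normalize=True).sort_index()

# Hypothetical per-coder event files covering the same year and source universe
shares = pd.DataFrame({
    "PETR-1": root_shares("petrarch1_2014.csv"),
    "PETR-2": root_shares("petrarch2_2014.csv"),
    "ICEWS": root_shares("icews_2014.csv"),
})
print(shares)
shares.plot.bar(figsize=(12, 4))   # requires matplotlib; between-category variance dwarfs between-coder variance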

6. Quantity is not quality

Which is to say, event data coding is not a task where throwing gigabytes of digital offal at the problem is going to improve results, and we are almost certainly reaching a point where some of the inputs to the models have been deliberately and significantly manipulated. This also compounds the danger of focusing on where the data is most available, which tends to be areas where conflict has occurred in the past and state controls are weak. High levels of false positives are bad, and contrary to commonly-held rosy scenarios, duplicate stories aren’t a reflection of importance but rather of convenience, urban coverage, and other biases. But you need the texts to reliably eliminate duplicates.

The so-called web “inversion”—the point where more information on the web is fake than real, which we are either approaching or have already passed—probably marks the end of efforts to develop trigger models—the search for anticipatory needles-in-a-haystack in big data—in contemporary data. That said, a vast collection of texts from prior to the widespread manipulation of electronic news feeds exists (both in the large data aggregators—LexisNexis, Factiva, and ProQuest—and with the source texts held, under unavoidable IP restrictions, by ICEWS, Cline, the University of Oklahoma TERRIER project and presumably the EU JRC) and these are likely to be extremely valuable resources for developing filters which can distinguish real from fake news.

Due to the inversion, particularly when dealing with politically sensitive topics (or rather, topics that are considered sensitive by some group with reasonably good computer skills and an internet connection), social media are probably now a waste of time in terms of analyzing real-world events (they are still, obviously, useful in analyzing how events appear on social media), and likely will provide a systematically distorted signal.

7. There is an open source software singularity (but not the other singularity…)

Because I don’t live in Silicon Valley, some of the stuff coming out of there by the techno-utopians —Ray Kurzweil is the worst, with Peter Thiel (who has fled the Valley) and Elon Musk close seconds, and Thomas Friedman certainly an honorary East Coast participant—seems utterly delusional. Which, in fact, it is, but in my work as a programmer/data scientist I’ve begun to understand where at least some of this is coming from, and that is what I’ve come to call the “software singularity.” This being the fact that code—usually in multiple ever-improving variants—for doing almost anything you want is now available for free and has an effective support community on Stack Overflow: things that once took months now can be done in hours.

Some examples relevant to event data:

  • the newspaper3k library handles downloading, formatting, and updating news scraping in 20 lines of Python
  • requests-HTML can handle downloads even when the content is generated by javascript code
  • universal dependency parses provide about 90% of the information required for event coding [9]
  • easily deployed data visualization dashboards are now too numerous to track [10]

And this is a tiny fraction of the relevant software: for example the vast analytical capabilities of the Python and R statistical and machine learning libraries would have, twenty years ago, cost tens if not hundreds of thousands of dollars (but the comparison is meaningless: the capabilities in these libraries simply didn’t exist at any price) and required hundreds of pounds—or if you prefer, linear-feet—of documentation.

To take newspaper3k as an illustrative example, the task of downloading news articles, even from a dedicated site such as Reuters, Factiva, or LexisNexis (and these are the relatively easy cases) requires hundreds of lines of code—and I spent countless hours over three decades writing and modifying such code variously in Pascal, Simula [11], C, Java, perl, and finally Python—to handle the web pipeline, filtering relevant articles, getting rid of formatting, and extracting relevant fields like the date, headline, and text. With newspaper3k, the task looks pretty much [READ THIS FOOTNOTE!!!] like this:

import newspaper

reut_filter = ["/photo/", "/video", "/health/", "/www.reuters.tv/",
               "/jp.reuters.com/", "/es.reuters.com/"]  # exclude these; the working version has a longer list

a_paper = newspaper.build("https://www.reuters.com/")
for article in a_paper.articles:
    if "/english/" not in article.url: # section rather than article
        continue
    for li in reut_filter:
        if li in article.url: break
    else:
        article.download()
        article.parse()
        fname = "reuters_" + article.url.split("/")[-1] + ".txt"  # crude file name from the URL slug
        with open(fname, "w") as fout:
            fout.write("URL: " + article.url + "\n")
            fout.write("Date: " + str(article.publish_date) + "\n")
            fout.write("Title: " + article.title + "\n")
            fout.write("Text:\n" + article.text + "\n")

An important corollary: The software singularity (and inexpensive web-based collaboration tools) enables development to be done very rapidly with small decentralized “remote” teams rather than the old model of large programming shops. In the software development community in Charlottesville, our CTO group [12] focuses on this as the single greatest current opportunity, and doing it correctly is the single greatest challenge, and I think Gen-Xers and Millennials in academia have also largely learned this: for research at least, the graduate “bull-pen” [13] is now global.

That other singularity?: no, sentient killer robots are not about to take over the world, and you’re going to die someday. Sorry.

A good note to end on.

Reference

Blog entries on event data in rough order of utility/popularity:

and the followup to this:

Footnotes

READ THIS FOOTNOTE!!!: I’ve pulled out the core code here from a working program which is about three times as long—for example it adjusts for the contingency that article.publish_date is sometimes missing—and this example code alone may or may not work. The full program is on GitHub: it definitely works and ran for days without crashing.

1. The working title for this entry was “S**t I tell people about event data.”

2. See the documentation for PLOVER —alas, still essentially another prototype—on problems with using CAMEO as a general coding framework.

3. Though I have heard this involved simply taking jobs with another company working out of the same anonymous Boston-area office park.

4. Around this same time, early 2000s, the VRA project undertook a very large web-based effort using a panel of experts to establish agreed-upon weights for their IDEA event coding ontology, but despite considerable effort they could never get these to converge. In the mid-1990s, I used a genetic algorithm to find optimal weights for a [admittedly somewhat quirky] clustering problem: again, no convergence, and wildly different sets of weights could produce more or less the same results.

5. TABARI, in contrast, has a validation suite—typically referred to as the “Lord of the Rings test suite” since most of the actor vocabulary is based on J.R.R. Tolkien’s masterwork, which didn’t stop a defense contractor from claiming “TABARI doesn’t work” after trying to code contemporary news articles based on a dictionary focused on hobbits, elves, orcs, and wizards—of about 250 records which systematically tests all features of the program as well as some difficult edge cases encountered in the past.

6. Lockheed’s JABARI, while initially just a Java version of TABARI—DARPA, then under the suzerainty of His Most Stable Genius Tony Tether, insisted that Lockheed’s original version duplicate not just the features of TABARI, but also a couple of bugs that were discovered in the conversion—was significantly extended by Lockheed’s ICEWS team, and was in fact an excellent coding program but was abandoned thanks to the usual duplicitous skullduggery that has plagued US defense procurement for decades: when elephants fight, mice get trampled. After witnessing a particularly egregious episode of this, I was in our research center at Kansas and darkly muttered to no one in particular “This is why you should make sure your kids learn Chinese.” To which a newly hired secretary perked up with “Of course my kids are learning Chinese!”

7. I will deal with the issue of UniversalPETRARCH—another partially-finished prototype—in the next entry. But in the meanwhile, note that the event coding engines of these three “PETRARCH” programs are completely distinct; the main thing they share in common is their actor dictionaries.

8. See in particular the Cline Center’s relatively recent “Global News Archive”: 70M unduplicated stories, 100M original, updated daily. The Cline Center has some new research in progress comparing several event data sets: a draft was presented at APSA-18 and a final version is near completion: you can contact them. Also there was a useful article comparing event data sets in Science about two years ago: http://science.sciencemag.org/content/353/6307/1502

9. 90% in the sense that in my experiments so far, specifically with the proof-of-concept mudflat coder, code sufficient for most of the functionality required for event coding is about 10% the length of comparable code processing a constituency parse or just doing an internal sparse parse. Since mudflat is just a prototype and edge cases consume lots of code, 90% reduction is probably overly generous, but still, UD parses are pretty close to providing all of the information you need for event coding.

10. Curiously, despite the proliferation of free visualization software, the US projects ICEWS, PITF and UT/D RIDIR never developed public-facing dashboards, compared to the extensive dashboards available at European-based sites such as ACLED, ViEWS, UCDP and EMM NewsBrief.

11. A short-lived simulation language developed at the University of Oslo in the 1960s that is considered the first object-oriented language and had a version which ran on early Macintosh computers that happened to have some good networking routines (the alternative at the time being BASIC). At least I think that’s why I was using it.

12. I’ve been designated an honorary CTO in this group because I’ve managed large projects in the past. And blog about software development. Most of the participants are genuine CTOs managing technology for companies doing millions of dollars of business per year, and were born long after the Beatles broke up.

13. I think this term is general: it refers to large rooms, typically in buildings decades past their intended lifetime dripping with rainwater, asbestos, and mold where graduate students are allocated a desk or table typically used, prior to its acquisition by the university sometime during the Truman administration, for plotting bombing raids against Japan. Resemblance to contemporary and considerably more expensive co-working spaces is anything but coincidental.

14. Norris was selected for this job by an exhaustive international search process consisting of someone in Texas who had once babysat for the lad asking the CEO of Caerus in the Greater Tyson’s Corner Metropolitan Area whether she by chance knew of any summer internship opportunities suitable for someone with his background. 

Posted in Methodology | 1 Comment

Instability Forecasting Models: Seven Ethical Considerations

So, welcome, y’all, to the latest bloggy edition on an issue probably relevant to, at best, a couple hundred people, though once again it has been pointed out to me that it is likely to be read by quite a few of them. And in particular, if you are some hapless functionary who has been directed to read this, a few pointers:

  • “Seven” is just a meme in this blog
  • Yes, it is too long: revenge of the nerds. Or something. More generally, for the length you can blame some of your [so-called] colleagues to whom I promised I’d write it
  • You can probably skip most of the footnotes. Which aren’t, in fact, really footnotes so much as another meme in the blog. Some of them are funny. Or at least that was the original intention
  • You can skip Appendix 1, but might want to skim Appendix 2
  • ICEWS = DARPA Integrated Conflict Early Warning System; PITF = U.S. multi-agency Political Instability Task Force; ACLED = Armed Conflict Location and Event Data; PRIO = Peace Research Institute Oslo; UCDP = Uppsala [University] Conflict Data Program; DARPA = U.S. Defense Advanced Research Projects Agency; EU JRC = European Commission Joint Research Centre
  • Yes, I’m being deliberately vague in a number of places: Chatham House rules at most of the workshops and besides, if you are part of this community you can fill in the gaps and if you aren’t, well, maybe you shouldn’t have the information [1]

Violating the Bloggers Creed of absolute self-righteous certainty about absolutely everything, I admit that I’m writing this in part because some of the conclusions end up at quite a different place than I would have expected. And there’s some inconsistency: I’m still working this through.

Prerequisites out of the way, we shall proceed.

Our topic is instability forecasting models—IFMs—which are data-based quantitative models, originally statistical, now generally using machine learning methods, which forecast the probabilities of various forms of political instability such as war, civil war, mass protests, even coups, at present typically (though not exclusively) at the level of the nation-state and with a time horizon of about two years.  The international community developing these models has, in a sense, become the dog that caught the car: We’ve gone from “forecasting political instability is impossible: you are wasting your time” to “everyone has one of these models” in about, well, seven years.

As I’ve indicated in Appendix 1—mercifully removed from the main text so that you can skip it—various communities have been at this for a long time, certainly around half a century, but things have changed—a lot—in a relatively short period of time. So for purposes of discussion, let’s start by stipulating three things:

  1. Political forecasting per se is nothing new: any policy which requires a substantial lead time to implement (or, equivalently, which is designed to affect the state of a political system into the future, sometimes, as in the Marshall Plan or creation of NATO and later the EU, very far into the future) requires some form of forecasting: the technical term (okay, one technical term…) is “feedforward.”  The distinction is we now can do this using systematic, data-driven methods.[2]
  2. The difference between now and a decade ago is that these models work and they are being seriously implemented, with major investments, into policy making in both governments and IGOs. They are quite consistently about 80% accurate,[3] against the 50% to 60% accuracy of most human forecasters (aside from a very small number of “superforecasters” who achieve machine-level accuracy). This is for models using public data, but I’ve seen little evidence that private data substantially changes accuracy, at least at the current levels of aggregation (it is possible that it might at finer levels in both geographical and temporal resolution) [4]. The technology is now mature: in recent workshops I’ve attended, both the technical presentations and the policy presentations were more or less interchangeable. We know how to do these things, we’ve got the data, and there is an active process of integrating them into the policy flow: the buzzphrase is “early warning and early action” (EWEA), and the World Bank estimates that even if most interventions fail to prevent conflict, the successes have such a huge payoff that the effort is well worthwhile even from an economic, to say nothing of a humanitarian, perspective.
  3. In contrast to weather forecasting models—in many ways a good analogy for the development of IFMs—weather doesn’t respond to the forecast, whereas political actors might: We have finally hit a point where we need to worry about “reflexive” prediction. Of course, election forecasting has also achieved this status, and consequently is banned in the days or weeks before elections in many democracies.  Economic forecasting long ago also passed this point and there is even a widely accepted macroeconomic theory, rational expectations, dealing with it. But potential reflexive effects are quite recent for IFMs.

As of about ten years ago, the position I was taking on IFMs—which is to say, before we had figured out how to create these reliably, though I still take this position with respect to the data going into them—was that our ideal end-point would be something similar to the situation with weather and climate models [5]: an international epistemic community would develop a series of open models that could be used by various stakeholders—governments, IGOs and NGOs—to monitor evolving cases of instability across the planet, and in some instances these alerts would enable early responses to alleviate the conflict—EWEA—or, failing that, provide, along the lines of the famine forecasting models, sufficient response to alleviate some of the consequences, notably refugee movements and various other potential conflict spill-over effects. As late as the mid-2000s, that was the model I was advocating.

Today?—I’m far less convinced we should follow this route, for a complex set of reasons both pragmatic and ethical which I still have not fully resolved and reconciled in my own mind, but—progress of sorts—I think I can at least articulate the key dimensions. 

1. Government and IGO models are necessarily going to remain secret, for reasons both bureaucratic and practical.

Start with the practical: in the multiple venues I’ve attended over the past couple of years, which is to say during the period when IFMs have gone from “impossible” to “we’re thinking about it” to “here’s our model”, everyone in official positions has been adamant that their operational models are not going to become public. The question is then whether those outside these organizations, particularly as these models are heavily dependent on NGO and academic data sets, should accept this or push back.

To the degree that this tendency is simply traditional bureaucratic siloing and information hoarding—and there are certainly elements of both going on—the natural instinct would be to push back. However, I’ve come to accept the argument that there are some legitimate reasons to keep this information confidential: the decisions of governments and IGOs, which can potentially wield resources on the order of billions of dollars, can have substantial reflexive consequences for the factors that affect the instability itself, in particular

  • foreign direct investment and costs of insurance
  • knowledge that a conflict is or is not “on the radar” for possible early action
  • support for NGO preparations and commitments
  • prospects for collective action, discussed below

2. From an academic and NGO perspective, there is a very substantial moral issue in forecasting the outcome of a collective action event.

This is the single most difficult issue in this essay: are there topics, specifically those dealing with collective action, which should be off-limits, at least in the public domain, even for the relatively resource-poor academic and NGO research communities?

The basic issue here is that—at least with the current state of the technology—even if governments and IGOs keep their exact models confidential, the past ten years or so have shown that one can probably fairly easily reverse engineer these, except for the private information: at least at this point in time, anyone trying to solve this problem is going to wind up with a model with a relatively clear set of methods, data and outcomes, easily duplicated with openly available software and data.[6][7]

So in our ideal world—the hurricane forecasting world—the models are public, and when they converge, the proverbial red lights flash everywhere, and the myriad components of the international system gear up to deal with the impending crisis, and when it happens the early response is far more effective than waiting until the proverbial truck is already halfway over the cliff. And all done by NGOs and academic researchers, without the biases of governments.

Cool. But what if, instead, those predictions contribute to the crisis, or in the worst-case scenario cause a crisis that otherwise would not have occurred? For example, individuals reading predictions of impending regime transition might use that information to mobilize collective action, which then fails: we’re only at 80% to 85% accuracy as it is, and this is before taking into account possible feedback effects. [8] Hundreds killed, thousands imprisoned, tens of thousands displaced. Uh, bummer.

One can argue, of course, that this is no different from what is already happening with qualitative assessments: immediately coming to mind are the Western encouragement of the Hungarian revolt in 1956, the US-supported Bay of Pigs invasion, North Vietnam’s support of the Tet Offensive, which destroyed the indigenous South Vietnamese communist forces,[9] and US ambiguity with respect to the Shi’a uprisings following the 1991 Iraq War. And this is only a tiny fraction of such disasters.

But they were all, nonetheless, disasters with huge human costs, and actions which affect collective resistance bring to mind J.R.R. Tolkien’s admonition: “Do not meddle in the affairs of wizards, for they are subtle and quick to anger.” Is this the sort of thing the NGO and academic research community, however well meaning, should risk?

3. Transparency is nonetheless very important in order to assess limitations and biases of models.

Which is what makes the first two issues so problematic: every model has biases, [10] and while the existing IFMs have converged, there is no guarantee that this will continue to be the case as new models are developed which are temporally and/or spatially more specific, or which take on new problems, for example detailed refugee flow models. Furthermore, since the contributions of the academic and NGO communities were vital to moving through the “IFM winter”—see Appendix 1—continuing to have open, non-governmental efforts seems very important.

Two other thoughts related to this:

  1. Is it possible that the IFM ecosystem has become too small because the models are so easy to create? I’m not terribly worried about this, because I’ve seen, in multiple projects, very substantial efforts to explore the possibility that other models exist, and they just don’t seem to be there, at least for the sets of events currently of interest; but one should always be alert to the possibility that what appears to be technological maturity is actually a failure of imagination.
  2. Current trends in commercial data science (as opposed to open source software and academic research) may not be all that useful for IFM development because this is not a “big data” problem: one of the curious things I noted at a recent workshop on IFMs is that deep learning was never mentioned. Though looking forward counterfactually, it is also possible that rare events—where one can envision even more commercial applications than those available in big data—are the next frontier in machine learning/artificial intelligence.

4. Quality is more important than quantity.

Which is to say, this is not a task where throwing gigabytes of digital offal at the problem is going to improve results, and we may be reaching a point where some of the inputs to the models have been deliberately and significantly manipulated, because such manipulation is increasingly common. There is also a danger in focusing on where the data is most available, which tends to be areas where conflict has occurred in the past and state controls are weak. High levels of false positives—notably in some atomic (that is, ICEWS-like) event data sets—are bad, and contrary to commonly-held rosy scenarios, duplicate stories are a reflection not of importance but of convenience, urban coverage, and other biases.

The so-called web “inversion”—the point where more information on the web is fake than real, which we are either approaching or may have already passed—probably marks the end, alas, of efforts to develop trigger models—the search for anticipatory needles-in-a-haystack in big data—in contemporary data. It is worth noting, though, that a vast collection of texts from prior to the widespread manipulation of electronic news feeds exists (both in the large news aggregators—LexisNexis, Factiva, and ProQuest—and with the source texts held, under unavoidable IP restrictions, by ICEWS, the University of Illinois Cline Center, the University of Oklahoma TERRIER project and presumably the EU JRC), and these are likely to be extremely valuable resources for developing filters which can distinguish real from fake news. They could also be useful in determining whether, in the past, trigger models were real rather than a cognitive illusion born of hindsight—having spent a lot of time searching for these with few results, I’m highly skeptical, but it is an empirical question—but any application of these in the contemporary environment will require far more caution than would have been needed, say, a decade ago.[11]

5. Sustainability of data sources.

It has struck me at a number of recent workshops—and, amen, in my own decidedly checkered experience in trying to sustain near-real-time atomic event data sets—the degree to which the event data used in IFM models (structural data, being national economic and demographic statistics, is generally solidly funded) depends on a large number of small projects without reliable long-term funding sources. There are exceptions—UCDP, as far as I understand, has long-term commitments from the Swedish government, both PRIO and ACLED have gradually accumulated relatively long-term funding through concerted individual efforts, and to date PITF has provided sustained funding for several data sets, notably Polity IV and less notably the monthly updates of the Global Atrocities Data Set—but far too much data comes from projects with relatively short-term funding, typically from the US National Science Foundation, where social science grants tend to be just two or three years with no guarantee of renewal, or from foundations, which tend to favor shiny new objects over slogging through stuff that just needs to be done to support a diffuse community.

The ethical problem here is the extent to which one can expect researchers to invest in models using data which may not be available in the future, and, conversely, whether the absence of such guarantees is leading the collective research community to spend too much effort in the proverbial search for the keys where the light is best. Despite several efforts over the years, political event data, whether the “atomic” events similar to ICEWS or the “episodic” events similar to ACLED, the Global Terrorism Database, and UCDP, have never attained the privileged status the U.S. NSF has accorded to the continuously-maintained American National Election Survey  and General Social Survey, and the user community may just be too small (or politically inept) to justify this. I keep thinking/hoping/imagining that increased automation in ever less expensive hardware environments will bring the cost of some of these projects down to the point where they could be sustained, for example, by a university research center with some form of stable institutional support, but thus far I’ve clearly underestimated the requirements.

Though hey, it’s mostly an issue of money: Mr. and Ms. Gates, Ms. Powell Jobs, Mr. Buffett and friends, Mr. Soros, y’all looking for projects?

6. Nothing is missing or in error at random: incorrect predictions and missing values carry information.

This is another point where one could debate whether this involves ethics or just professional best-practice—again, don’t confine your search for answers to readily available methods where you can just download some software—but these decisions can have consequences.

The fact that information relevant to IFMs is not missing at random has been appreciated for some time, and this may be one of the reasons why machine learning methods—where “missing” is just another value—have fairly consistently out-performed statistical models. This does, however, suggest that statistical imputation—now much easier thanks to both software and hardware advances—may not be a very good idea and is potentially an important source of model bias.
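To make that concrete, here is a small simulated illustration (invented data and names, scikit-learn, not drawn from any actual IFM): when reporting breaks down precisely in the places that are becoming unstable, a tree-based learner that accepts missing values directly can exploit that signal, whereas mean imputation feeding a conventional statistical model quietly erases it.

```python
# Simulated illustration: missingness that is itself correlated with instability.
# A tree-based model that handles NaN natively can use that signal; imputing the
# mean and handing the result to a logistic regression throws it away.
# Everything here is invented for illustration.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 3))
y = (x[:, 0] + rng.normal(scale=0.5, size=n) > 1).astype(int)

# Reporting breaks down precisely where things are getting bad:
# the first indicator is missing far more often when y == 1
missing = rng.random(n) < np.where(y == 1, 0.6, 0.1)
x_obs = x.copy()
x_obs[missing, 0] = np.nan

# "Missing" passed straight to the learner, effectively treated as its own value
gbm = HistGradientBoostingClassifier(random_state=0)
print("NaN handled natively:", cross_val_score(gbm, x_obs, y, scoring="roc_auc").mean())

# Mean imputation first: the fact that the value was missing is no longer visible
x_imp = SimpleImputer(strategy="mean").fit_transform(x_obs)
logit = LogisticRegression(max_iter=1000)
print("Imputed + logit     :", cross_val_score(logit, x_imp, y, scoring="roc_auc").mean())
```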

There also seems to be an increasing appreciation that incorrect predictions, particularly false positives (that is, a country or region has been predicted to be unstable but is not) may carry important information, specifically about the resilience of local circumstances and institutions. And more generally, those off-diagonal cases—both the false positives and false negatives—are hugely important in the modeling effort and should be given far more attention than I’m typically seeing. [12]
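For what it is worth, a minimal sketch of what giving the off-diagonal cases attention looks like in practice, with invented countries, probabilities, and outcomes: tabulate the confusion matrix at some decision threshold, then pull out the false positives (candidate resilience stories) and the false negatives (what did the model miss?) for closer qualitative inspection.

```python
# A sketch of pulling out the off-diagonal cases for closer inspection.
# Countries, probabilities, and outcomes are all invented.
import pandas as pd
from sklearn.metrics import confusion_matrix

results = pd.DataFrame({
    "country":       ["A", "B", "C", "D", "E", "F"],
    "p_instability": [0.82, 0.15, 0.67, 0.05, 0.71, 0.22],
    "observed":      [1,    0,    0,    0,    1,    1],
})
threshold = 0.5
results["predicted"] = (results["p_instability"] >= threshold).astype(int)

# Rows are observed outcomes, columns are predictions
print(confusion_matrix(results["observed"], results["predicted"]))

# The off-diagonal cases: predicted-but-didn't-happen (possible resilience stories)
# and happened-but-not-predicted (what did the model miss?)
false_positives = results.query("predicted == 1 and observed == 0")
false_negatives = results.query("predicted == 0 and observed == 1")
print(false_positives[["country", "p_instability"]])
print(false_negatives[["country", "p_instability"]])
```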

A final observation: at what point are we going to get situations where the model is wrong because of policy interventions? [8, again] Or have we already? — that’s the gist of the EWEA approach. I am guessing that in most cases these situations will be evident from open news sources, though there may be exceptions where this is due to “quiet diplomacy”—or as likely, quiet allocation of economic resources—and will quite deliberately escape notice.

7. Remember, there are people at the end of all of these.

At a recent workshop, one of the best talks—sorry, Chatham House rules—ended with an impassioned appeal on this point from an individual from a region which, regrettably, has tended to be treated as just another set of data points in far too many studies. To reiterate: IFMs are predicting the behaviors of people, not weather.

I think these tendencies have been further exacerbated by what I’ve called “statutory bias” [10, again]  in both model and data development: the bureaucratic institutions responsible for the development of many of the most sophisticated and well-publicized models are prohibited by law from examining their own countries (or in the case of the EU, set of countries). And the differences can be stark: I recently saw a dashboard with a map of mass killings based on data collected by a European project which, unlike PITF and ICEWS data, included the US: the huge number of cases both in the US and attributable to US-affiliated operations made it almost unrecognizable compared to displays I was familiar with.

This goes further: suppose the massive increase in drug overdose deaths in the US (now at a level exceeding 70,000 per year and, as amply documented, the direct result of a deliberate campaign by one of America’s wealthiest families, whose philanthropic monuments blot major cities across the land) had occurred in Nigeria, Tajikistan, or Indonesia: might we at the very least be considering that phenomenon a candidate for a new form of state weakness and/or the ability of powerful drug interests to dominate the judicial and legislative process? But we haven’t.

On the very positive side, I think we’re seeing more balance emerging: I am particularly heartened to see that ECOWAS has been developing a very sophisticated IFM, at least at the level of North American and European efforts, and with its integration with local sources, perhaps superior. With the increasing global availability of the relevant tools, expertise, and, through the cloud, hardware, this will only increase, and while the likes of Google and Facebook have convinced themselves only whites and Asians can write software, [13] individuals in Africa and Latin America know better.

 

Whew…so where does this leave us? Between some rugged rocks and some uncomfortable hard places, to be sure, or there would have been no reason to write all of this in the first place. Pragmatics aside—well-entrenched and well-funded bureaucracies are going to set their own rules, irrespective of what academics, NGOs and bloggers are advocating—the possibility of developing models (or suites of models) which set off ill-advised collective action concerns me. But so does the possibility of policy guided by opaque models developed with flawed data and techniques, to say nothing of policies guided by “experts” whose actual forecasting prowess is at the level of dart-throwing chimps. And there’s the unresolved question of whether there is something special about the forecasts of a quantitative model as distinct from those of an op-ed in the Washington Post or a letter or anonymous editorial in The Economist, again with demonstrably lower accuracy and yet part of the forecasting ecosystem for a century or more. Let the discussion continue.

I’ll close with a final personal reflection that didn’t seem to fit anywhere else: having been involved in these efforts for forty or so years, it is very poignant for me to see the USA now almost completely out of this game, despite the field having largely been developed in the US. It will presumably remain outside until the end of the Trump administration, and then, depending on attitudes in the post-Trump era, rebuilding could be quite laborious given the competition with industry for individuals with the required skill sets; alternatively, we could see a John Kennedyesque civic republican response by a younger generation committed to rebuilding democratic government and institutions on this side of the Atlantic. In the meantime, as with high speed rail, cashless payments, and universal health care, the field is in good hands in Europe. And for IFMs and cashless payments, Africa.

Footnotes

1. I went to college in a karst area containing numerous limestone caves presenting widely varying levels of technical difficulty. The locations of easy ones where you really had to make an effort—or more commonly, drink—to get yourself into trouble were widely known. The locations of the more difficult were kept confidential among a small group with the skills to explore them safely. Might we be headed in a similar direction in developing forecasting models?—you decide.

Someone about a year ago at one of these IFM workshops—there have been a bunch, to the point where many of the core developers know each other’s drink preferences—raised the issue that we don’t want forecasts to provide information to the “bad guys.” But where do we draw the line on this, given that some of the bad guys can presumably reverse engineer the models from the literature, judging from the technical sophistication we’ve seen from such groups, e.g. in IEDs and the manipulation of social media? Suddenly the five-year publication lags (and paywalls?) in academic journals become a good thing?

2. I finally realized that the reason we haven’t had serious research into how to integrate quantitative and qualitative forecasts—this is persistently raised as a problem by government and IGO researchers—is that academics and small research shops like mine have a really difficult time finding real experts (as opposed, say, to students or Mech Turkers) who have a genuine interest in and knowledge of a topic, as distinct from just going through the motions and providing uninformed speculation. In such circumstances the value added by the qualitative information will be marginal, and consequently we’re not doing realistic tests of expert elicitation methods. So by necessity this problem—which is, in fact, quite important—is probably going to have to be solved in the government and IGO shops.

3. I’m using this term informally, as the appropriate metric for “accuracy” on these predictions, which involve rare events, is complicated. Existing IFMs can consistently achieve an AUC of 0.80 to 0.85, rarely going above (or below) that level, which is not quite the same as the conventional meaning of “accuracy” but close enough. There are substantial and increasingly sophisticated discussions within the IFM community on the issue of metrics: we’re well aware of the relevant issues.
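If you want to see the distinction rather than take my word for it, here is a small simulation (invented numbers, scikit-learn, not from any actual IFM) of why AUC rather than raw accuracy is the informative metric when the event of interest is rare: the do-nothing forecast of “nothing ever happens” scores impressively on accuracy while having no skill at all.

```python
# Simulated numbers only: why accuracy flatters a useless forecast when events are rare,
# and why AUC is the more informative summary.
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
n = 5000
y = (rng.random(n) < 0.05).astype(int)   # instability in roughly 5% of country-periods

# A forecaster with some genuine skill: scores tend to be higher for the true positives
skilled_score = rng.normal(size=n) + 1.5 * y

print("Base rate                  :", y.mean())
print("Accuracy of 'never happens':", accuracy_score(y, np.zeros(n, dtype=int)))  # about 0.95
print("AUC of 'never happens'     :", roc_auc_score(y, np.zeros(n)))              # 0.5: no skill
print("AUC of the skilled forecast:", roc_auc_score(y, skilled_score))            # around 0.85
```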

4. One curious feature of IFMs may be that private data will become important at short time horizons but not at longer horizons. This contrasts to the typical forecasting problem where errors increase more or less exponentially as the time horizon increases. In current IFMs, structural indicators (mostly economic, though also historical), which are readily available in public sources, dominate in the long term, whereas event-based conditions may be more important in the short term. E.g. “trigger models”—if these are real, an open question—are probably not relevant in forecasting a large-scale event like Eastern Europe in 1989 or the Arab Spring, but could be very important in forecasting at a time horizon of a few weeks in a specific region.

5. Science had a nice article [http://science.sciencemag.org/content/363/6425/342] recently on these models: despite the key difference that IFMs are potentially reflexive, and the fact that one of our unexplored domains is the short-term forecast, some of the approaches used in those models—emphasized in the excerpts below—could clearly be adapted to IFMs.

Weather forecasts from leading numerical weather prediction centers such as the European Centre for Medium-Range Weather Forecasts (ECMWF) and National Oceanic and Atmospheric Administration’s (NOAA’s) National Centers for Environmental Prediction (NCEP) have also been improving rapidly: A modern 5-day forecast is as accurate as a 1-day forecast was in 1980, and useful forecasts now reach 9 to 10 days into the future (1). Predictions have improved for a wide range of hazardous weather conditions  [emphasis added], including hurricanes, blizzards, flash floods, hail, and tornadoes, with skill emerging in predictions of seasonal conditions.

Because data are unavoidably spatially incomplete and uncertain, the state of the atmosphere at any time cannot be known exactly, producing forecast uncertainties that grow into the future. This sensitivity to initial conditions can never be overcome completely. But, by running a model over time and continually adjusting it to maintain consistency with incoming data [emphasis added], the resulting physically consistent predictions greatly improve on simpler techniques. Such data assimilation, often done using four-dimensional variational minimization, ensemble Kalman filters, or hybridized techniques, has revolutionized forecasting.

Sensitivity to initial conditions limits long-term forecast skill: Details of weather cannot be predicted accurately, even in principle, much beyond 2 weeks. But weather forecasts are not yet strongly constrained by this limit, and the increase in forecast skill has shown no sign of ending. Sensitivity to initial conditions varies greatly in space and time [emphasis added], and an important but largely unsung advance in weather prediction is the growing ability to quantify the forecast uncertainty  [emphasis added] by using large ensembles of numerical forecasts that each start from slightly different but equally plausible initial states, together with perturbations in model physics.

6. I’m constantly confronted, of course, with the possibility that there are secret models feeding into the policy process that are totally different than those I’m seeing. But I’m skeptical, particularly since in some situations, I’m the only person in the room who has been witness to the process by which independent models have been developed, such being the reward, if that’s the word, for countless hours of my life frittered away in windowless conference rooms watching PowerPoint™ presentations. All I see is convergence, not just in the end result, but also in the development process.

Consequently if a trove of radically different—as distinct from incrementally different, however much their creators think they are novel—secret models exists, there is a vast and fantastically expensive conspiracy spanning multiple countries creating an elaborate illusion solely for my benefit, and frankly, I just don’t think I’m that important. I’m sure there are modeling efforts beyond what I’m seeing, but from the glimmers I see of them, they tend to be reinventing wheels and/or using methods that were tried and rejected years or even decades ago, and the expansiveness (and convergence) of known work makes it quite unlikely—granted, not impossible—that there is some fabulously useful set of private data and methodology out there. To the contrary, in general I see the reflections from the classified side as utterly hampered by inexperience, delusional expectations, and doofus managers and consultants who wouldn’t make it through the first semester of a graduate social science methodology course and who thus conclude that because something is impossible for them, it is impossible for anyone. Horse cavalry in the 20th century redux: generally not a path with a positive ending.

7. Providing, of course, one wants to: there may be specialized applications where no one has bothered to create public models even though this is technically possible.

8. One of the more frustrating things I have heard, for decades, is a smug observation that if IFMs become successful, the accuracy of our models will decline and consequently we modelers will be very sad. To which I say: bullshit! Almost everyone involved in IFM development is acutely aware of the humanitarian implications of the work, and many have extended field experience in areas experiencing stress due to political instability (which is not, in general, true of the folks making the criticisms, pallid Eloi whose lives are spent in seminar rooms, not in the field). To a person, model developers would be ecstatic were the accuracy of their models to drop off because of successful interventions, and this is vastly more important to them than the possibility of Reviewer #2 recommending against publication in a paywalled journal (which, consequently, no one in a policy position will ever read) because the AUC hasn’t improved over past efforts.

9. Back in the days when people still talked of these things—the end of the Vietnam War now being almost as distant from today’s students as the end of World War I was from my generation—one would encounter a persistent urban legend in DoD operations research circles—ah, OR…now there’s a golden oldie…—that somewhere deep in the Pentagon was a secret computer model—by the vague details, presumably one of Jay Forrester’s systems dynamics efforts, just a set of difference equations, as the model was frequently attributed to MIT—that precisely predicted every aspect of the Vietnam War, and had decision-makers only paid attention to this, we would have won. You know, like “won” in that we’d now be buying shrimp, t-shirts and cheap toys made in Vietnam and it would be a major tourist destination. I digress.

Anyway, I’m pretty sure that in reality dozens of such models were created during the Vietnam War period, and some of them were right some of the time, but, unlike the Elder Wand of the Harry Potter universe, no such omniscient Elder Model existed. This land of legends situation, I would also note, is completely different than where we are with contemporary IFMs: the models, data, methods, and empirical assessments are reasonably open, and there is a high degree of convergence in both the approaches and their effectiveness.

10. I’d identify five major sources of bias in existing event data: some of these affect structural data sets as well, but it is generally useful to be aware of them.

  1. Statutory bias, also discussed under point 7: Due to their funding sources, ICEWS and PITF are prohibited by a post-Vietnam-era law from tracking the behavior of US citizens. Similarly, my understanding is that the EU IFM efforts are limited (either by law or bureaucratic caution) in covering disputes between EU members and internal instability within them. Anecdotally, some NGOs also have been known to back off some monitoring efforts in some regions in deference to funders.
  2. Policy bias: Far and away the most common application of event data in the US policy community has been crisis forecasting, so most of the effort has gone into collecting data on violent (or potentially violent) political conflict. The EU’s JRC efforts are more general, for example with foci on areas where the EU may need to provide disaster relief, but are still strongly focused on areas of concern to the EU.
  3. Urban bias: This is inherent in the source materials: for example, during the Boko Haram violence in Nigeria, a market bombing in the capital Abuja generated about 400 stories; one in the regional capital of Maiduguri would typically generate ten or twenty, and one in the marginal areas near Lake Chad would generate one or two. Similarly, terrorist incidents in Western capitals such as Paris or London generate days of attention, whereas events with far higher casualty rates in the Middle East or Africa are typically covered for just a day.
  4. Media fatigue: This is the tendency of news organizations to lose interest in on-going conflicts, covering them in detail when they are new but shifting attention even though the level of conflict continues.
  5. English-language bias: Most of the event data work to date—the EU JRC’s multi-language work being a major exception—has been done in English (and occasionally Spanish and Portuguese) and extending beyond this is one of the major opportunities provided by contemporary computationally-intensive methods, including machine translation, inter-language vector transformations, and the use of parallel corpora for rapid dictionary development; IARPA has a new project called BETTER focused on rapid (and low effort) cross-language information extraction which might also help alleviate this.

11. See for example https://publications.parliament.uk/pa/cm201719/cmselect/cmcumeds/1791/1791.pdf

12. Though this is changing, e.g. see Michael Colaresi https://twitter.com/colaresi/status/842291411298996224 on bi-separation plots, which, alas, links to yet-another-frigging paywalled article, but at least the sentiment is there.

13. See https://www.nytimes.com/2019/02/13/magazine/women-coding-computer-programming.html. Google and Facebook have 1% blacks and 3% Hispanics among their technical employees! Microsoft, to its credit, seems to be more enlightened.

Appendix 1: An extraordinarily brief history of how we got here

This will be mostly the ramblings of an old man dredging up fading memories, but it’s somewhat important,  in these heady days of the apparently sudden success of IFMs, to realize the efforts go way back.  In fact there’s a nice MA thesis to be done here, I suppose in some program in the history of science, on tracking back how the concept of IFMs came about. [A1]

Arguably the concept is firmly established by the time of Leibnitz [], who famously postulated a “mathematical philosophy” wherein

“[…] if controversies were to arise, there would be no more need of disputation between two philosophers than between two calculators. For it would suffice for them to take their pencils in their hands and to sit down at the abacus, and say to each other (and if they so wish also to a friend called to help): Let us calculate.”

I’m too lazy to thoroughly track things during the subsequent three centuries, but Newtonian determinism expressed through equations was in quite the vogue during much of the period—Laplace, famously—and by the 19th century data-based probabilistic inference would gradually develop, along with an ever increasing amount of demographic and economic data, and by the 1920s, we had a well-established, if logically inconsistent, science of frequentist statistical inference. The joint challenges of the Depression and planning requirements of World War II (and Keynesian economic management more generally) led to the incorporation of increasingly sophisticated economic models into policy making in the 1930s and 1940s, while on the political side, reliable public opinion polling was established after some famous missteps, and by the 1950s used for televised real-time election forecasting.

By the time I was in graduate school, Isaac Asimov’s Foundation Trilogy—an extended fictional work whose plot turns on the failures of a forecasting model—was quite in vogue, and on a more practical level, the political forecasting work of the founder of numerical meteorology, Lewis Fry Richardson—originally done in the 1930s and 1940s then popularized in the early 1970s by Anatol Rapoport and others, and by the establishment of the Journal of Conflict Resolution—who in 1939 self-published a monograph titled Generalized Foreign Politics where he convinced himself [A2] that the unstable conditions in his arms race models, expressed as differential equations, for the periods 1909-1913 and 1933-1938 successfully predicted the two world wars. Also at this point we saw various “systems dynamics” models, most [in]famously the Club of Rome’s fabulously inaccurate Limits to Growth model  published in 1972, which spawned about ten years of [also very poorly calibrated] similar efforts.

More critically, by the time I was in graduate school, DARPA was funding work on IFMs at a level that kept me employed as a computer programmer rather than teaching discussion sections for introductory international relations classes. These efforts would carry on well into the Reagan administration—at no less a level than the National Security Council, under Richard Beale’s leadership of a major event data effort—before finally being abandoned as impractical, particularly on the near-real-time data side.

In terms of the immediate precedents to contemporary IFMs, in the 1990s there was a series of efforts coming primarily out of IGOs and NGOs—specifically Kumar Rupesinghe at the NGO International Alert and the late Juergen Dedring within the United Nations (specifically its Office for Research and the Collection of Information)—as well as the late Ted Robert Gurr in the academic world, Vice President Al Gore and various people associated with the US Institute for Peace in the US government, and others far too numerous to mention (again, there’s a modestly interesting M.A. thesis here, and there is a very ample paper trail to support it), but again these went nowhere beyond spawning the U.S. State Failures Project, the direct predecessor of PITF, and the SFP’s excessively elaborate (expensive, and, ultimately, irreproducible) IFMs initially failed miserably due to a variety of technical flaws.

We then went into an “IFM Winter”—riffing on the “AI Winter” of the late 1980s—in the 2000s, when a large number of small projects with generally limited funding continued to work in a professional environment which calls to mind Douglas Adams’s classic opening to Hitchhiker’s Guide to the Galaxy:

Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun. Orbiting this at a distance of roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea.

Yeah, that’s about right: during the 2000s IFM work was definitely amazingly primitive and far out in the academically unfashionable end of some uncharted backwaters. But this decade was, in fact, a period of gestation and experimentation, so that by 2010 we had seen, for example, the emergence of the ACLED project under Clionadh Raleigh, years of productive experimentation at PITF under the direction of Jay Ulfelder, the massive investment by DARPA in ICEWS [A3], substantial modeling and data collection efforts at PRIO under the directorship of Nils Petter Gleditsch, and substantial expansion of the UCDP datasets. While models in the 1960s and 1970s were confined to a couple dozen variables—including some truly odd ducks, like levels of US hotel chain ownership in countries as a measure of US influence—PITF by 2010 had assembled a core data set containing more than 2500 variables. Even if it really only needed about a dozen of these to get a suite of models with reasonable performance.

All of which meant that the credible IFMs which had generally eluded the field in the 1990s had become—at least for any group with a reasonable level of expertise—almost trivial to produce by the 2010s.[A5] Bringing us into the present.

Appendix footnotes

A1. A colleague recently reported that a journal editor, eviscerating an historical review article no less, required him (presumably because of issues of space, as we all are aware that with electronic publication, space is absolutely at a premium!) to remove all references to articles published prior to 2000. Because we are all aware that everything of importance—even sex, drugs, and rock-and-roll!—was introduced in the 21st century.

A2. I’m one of, I’m guessing, probably a couple dozen people who have actually gone through Richardson’s actual papers at Lancaster University (though these were eventually published, and I’d also defer to Oliver Ashford’s 1985 biography as the definitive treatment), and Richardson’s parameter estimates which led to the result of instability are, by contemporary standards, a bit dubious; using more straightforward methods actually leads to a conclusion of stability rather than instability. But the thought was correct…

A3. Choucri and Robinson’s Forecasting in International Relations (1974) is a good review of these efforts in political science, which go back into the mid-1960s. As that volume has probably long been culled from most university libraries, Google brings up this APSR review by an obscure assistant professor at Northwestern but, demonstrating as ever the commitment of professional scientific organizations and elite university presses to the Baconian norm of universal access to scientific knowledge, reading it will cost you $25. You can also get a lot from an unpaywalled essay by Choucri still available at MIT. 

A4. The ICEWS program involved roughly the annual expenditures of the entire US NSF program in political science. Even if most of this went to either indirect costs or creating PowerPoint™ slides, with yellow type on a green background being among the favored motifs.

A5. As I have repeated on earlier occasions—and no, this is not an urban legend—at the ICEWS kick-off meeting, where the test data and the unbelievably difficult forecasting metrics, approved personally by no less than His Stable Genius Tony Tether, were first released, the social scientists went back to their hotel rooms and on their laptops had estimated models which beat the metrics before the staff of the defense contractors had finished their second round of drinks at happy hour. Much consternation followed, and the restrictions on allowable models and methods became ever more draconian as the program evolved. The IFM efforts of ICEWS—the original purpose of the program—never gained traction despite the success of nearly identical contemporaneous efforts at PITF—though ICEWS lives on, at least for now, as a platform for the production of very credible near-real-time atomic event data.

Appendix 2: Irreducible sources of error

This is included here for two reasons. First, it sets out a systematic set of reasons why IFMs have an accuracy “speed limit”—apparently an out-of-sample AUC in the range of 0.80 to 0.85 at the two-year time horizon for nation-states—beyond which, in all likelihood, you are just over-fitting the model. Second, it takes far too long to go through all of these reasons in a workshop presentation, but they are important.

  • Specification error: no model of a complex, open system can contain all of the relevant variables: “McChrystal’s hairball” is the now-classic exposition of this. 
  • Measurement error: with very few exceptions, variables will contain some measurement error. And this presupposes there is even agreement on what the “correct” measurement is in an ideal setting.
  • Predictive accuracy is limited by measurement error: for example in the very simplified case of a bivariate regression model, if your measurement reliability is 80%, your accuracy can’t be more than 90%.  This biases parameter estimates as well as the predictions. 
  • Quasi-random structural error: Complex and chaotic deterministic systems behave as if they were random under at least some parameter combinations. Chaotic behavior can occur in equations as simple as x_{t+1} = ax_t^2 + bx_t (a minimal simulation of this follows the list).
  • Rational randomness such as that predicted by mixed strategies in zero-sum games. 
  • Arational randomness attributable to free-will: the rule-of-thumb from our rat-running colleagues: “A genetically standardized experimental animal, subjected to carefully controlled stimuli in a laboratory setting, will do whatever it damn pleases.” 
  • Effective policy response: as discussed at several point in the main text, in at least some instances organizations will have taken steps to head off a crisis that would have otherwise occurred, and as IFMs are increasingly incorporated into policy making, this is more likely to occur. It is also the entire point of the exercise. 
  • The effects of unpredictable natural phenomena: for example, the 2004 Indian Ocean tsunami dramatically reduced violence in the long-running conflict in Aceh, and on numerous occasions in history important leaders have unexpectedly died (or, as consequentially, not died while their effectiveness gradually diminished).
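The quasi-random structural error bullet is easy to demonstrate: below is a minimal simulation of the quadratic map mentioned in that list (with a = -4 and b = 4 it is the familiar chaotic logistic map), run from two starting values that differ by one part in ten million. Within a few dozen iterations the trajectories have nothing to do with each other, which is what “behaves as if it were random” means in practice.

```python
# Sensitivity to initial conditions in x_{t+1} = a*x_t**2 + b*x_t, with a = -4, b = 4
# (equivalently the logistic map x_{t+1} = 4*x_t*(1 - x_t), a standard chaotic example).
a, b = -4.0, 4.0

def trajectory(x0, steps=40):
    xs = [x0]
    for _ in range(steps):
        xs.append(a * xs[-1] ** 2 + b * xs[-1])
    return xs

t1 = trajectory(0.3)            # one starting value
t2 = trajectory(0.3 + 1e-7)     # the "same" starting value, off by one part in ten million

for t in (0, 10, 20, 30, 40):
    print(f"t={t:2d}  x={t1[t]: .6f}  x'={t2[t]: .6f}  |difference|={abs(t1[t] - t2[t]):.6f}")
```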

Tetlock (2013) independently has an almost identical list of the irreducible sources of forecasting error.

Please note that while the 0.80 to 0.85 AUC speed limit has recurred relentlessly across existing IFMs, there is no theoretical reason for this number, and with finer geographical granularity and/or shorter time horizons it could be smaller, larger, or less consistent across behaviors. For a nice discussion of the predictive speed-limit issue in a different context, criminal recidivism, see Science 359:6373, 19 Jan 2018, pg. 263; the original research is reported in Science Advances 10.1126/sciadv.aao5580 (2018).

 


Yeah, I blog…

A while back I realized I’d hit fifty blog posts, and particularly as recent entries have averaged—with some variance—about 4000 words, that’s heading towards 200,000 words, or two short paperbacks, or about the length of one of the later volumes of the Harry Potter opus, or 60%-70% of a volume of Song of Ice and Fire. So despite my general admonishment to publishers that I am where book projects go to die, maybe at this point I have something to say on the topic of blog writing.

That and I recently received an email—I’m suspicious that it comes from a bot, though I’m having trouble figuring out what the objectives of the bot might be (homework exercise?)—asking for advice on blogging. Oh, and this blog has received a total of 88,000 views, unquestionably vastly exceeding anything I’ve published in a paywalled journal. [1] And finally I’ve recently been reading/listening, for reasons that will almost certainly never see the light of day, [2] to various works on the process of writing: Bradbury (magical) [3], Forster (not aging well unless you are thoroughly versed in the popular literature of a century ago), James Hynes’s Great Courses series on writing fiction, as well as various “rules for writing” lists by successful authors.

So, in my own style, seven observations.

1. Write, write, write

Yes, write, write, write: that’s one of the two consistent bits of advice every writer gives. [4] The best consistently write anywhere from 500 to 1500 words a day, which I’ve never managed (I’ve tried: it just doesn’t work for me), but you just have to keep writing. And if something doesn’t really flow, keep writing until it does (or drop it and try something else). And expect to throw away your first million words. [5]

But keep your day job: I’ve never made a dime off this, nor expect to: I suppose I’ve missed opportunities to earn some beer money by making some deal with Amazon for the occasional links to books, but it doesn’t seem worth the trouble/conflicts of interest, and you’ve probably also noticed the blog isn’t littered with advertisements for tactical flashlights and amazing herbal weight-loss potions. [6] Far from making money, for all I know my public display of bad attitude has lost me some funding opportunities. Though those would have driven me (and some poor program manager) crazy.

2. Edit, edit, edit

Yes, in a blog you are freed from the tyranny of Reviewer #2, but with great power comes great responsibility, so edit ruthlessly. This has been easy for me, as Deborah Gerner and I did just that on the papers we wrote jointly for some twenty years, and at least some people noticed. [7] And as the saying goes, variously attributed to Justice Louis Brandeis and writer Robert Graves, “There’s no great writing, only great rewriting.”

In most cases these blog entries are assembled over a period of days from disjointed chunks—in only the rarest of cases will I start from the proverbial blank page/screen and write something from beginning to end—which gradually come together into what I eventually convince myself is a coherent whole, and then it’s edit, edit, edit. And meanwhile I’ll be writing down new candidate sentences, phrases, and snark on note cards as these occur to me in the shower or making coffee or walking or weeding: some of them work, some don’t. For some reason WordPress intimidates me—probably the automatic formatting, I note as I’m doing final editing here—so now I start with a Google Doc—thus ensuring an interesting selection of advertisements subsequently presented to me by the Google omniverse—and only transfer to WordPress in the last few steps. Typically I spend about 8 to 10 hours on an entry, and having carefully proofread it multiple times before hitting “Publish,” invariably find a half-dozen or so additional typos afterwards. I’ll usually continue to edit and add material for a couple days after “publication,” while the work is still in my head, then move on.

3. Be patient and experiment

And particularly at first: It took some time for me to find the voice where I was most comfortable, which is the 3000 – 5000 word long form—this one finally settled in at about 3100 words, the previous was 4100 words—rather than the 600-900 words typical of an essay or op-ed, to say nothing of the 140/280 characters of a Tweet. [8] My signature “Seven…” format works more often than not, though not always, and I realized after a while it could be a straitjacket. [9] Then there is the early commenter—I get very occasional comments, since by now people have figured out I’m not going to approve most and I’m not particularly interested in most feedback, a few people excepted [4]—who didn’t like how I handled footnotes, but I ignored this and it is now probably the most distinctive aspect of my style.

4. Find a niche

I didn’t have a clear idea of where the blog would go when I started it six years ago beyond the subtitle “Reflections on social science, politics and education.” It’s ended up in that general vicinity, though “Reflections on political methodology, conflict forecasting and politics” is probably more accurate now. I’ve pulled back on the politics over the last year or so since the blogosphere is utterly awash in political rants these days, and the opportunities to provide anything original are limited: For example I recently started and then abandoned an entry on “The New Old Left” which reflected on segments of the Democratic Party returning to classical economic materialist agendas following a generation or more focused on identity but, like, well, duh… [10]  More generally, I’ve got probably half as much in draft that hasn’t gone in as that which has, and some topics start out promising and never complete themselves: you really have to listen to your subject. With a couple exceptions, it’s the technical material that generates the most interest, probably because no one else is saying it.

5. It usually involves a fair amount of effort. But occasionally it doesn’t.

The one entry that essentially wrote itself was the remembrance of Heather Heyer, who was murdered in the white-supremacist violence in Charlottesville on 12 August 2017. The commentary following Will Moore’s suicide was a close second, and in both of these cases I felt I was writing things that needed to be said for a community. “Feral…”, which after five years invariably still gets a couple of views a day [11], in contrast gestated over the better part of two years, and its followup, originally intended to be written after one year, waited for three.

Successful writers of fiction often speak of times where their characters—which is to say, their subconscious—take hold of a plot and drive it in unexpected but delightful ways. For the non-fiction writer, I think the equivalent is when you capture a short-term zeitgeist and suddenly find relevant material everywhere you look [18], as well as waking up and dashing off to your desk to sketch out some phrases before you forget them. [12]

6. Yeah, I’m repetitive and I’m technical

Repetitive: see Krugman, P., Friedman, T., Collins, G., Pournelle, J., and Hanh, T. N. Or, OMG, the Sutta Pitaka. And yes, there is a not-so-secret 64-character catch-phrase that is in pretty much every single entry irrespective of topic.[13] As in music, I like to play with motifs, and when things are working well, it’s nice to resolve back to the opening chord.

Using the blog as technical outlet, notably on issues dealing with event data, has been quite useful, even if that wasn’t in the original plan. Event data, of course, is a comparatively tiny niche—at most a couple hundred people around the world watch it closely—but as I’ve recently been telling myself (and anyone else who will listen), the puzzle with event data is it never takes off but it also never goes away. And the speed with which the technology has changed over the past ten years in particular is monumentally unsuited to the standard outlets of paywalled journals with their dumbing-down during the review process and massive publication delays. [14] Two entries, “Seven observations on the [then] newly released ICEWS data” and “The legal status of event data” have essentially become canonical: I’ve seen them cited in formal research papers, and they fairly reliably get at least one or two views a week, and more as one approaches the APSA and ISA conferences or NSF proposal deadlines. [15]

7. The journey must be the reward

Again, I’ve never made a dime off this directly [16], nor do I ever expect to unless somehow enough things accumulate that they could be assembled into a book, and people buy it. [17] But it is an outlet that I enjoy, and I have also become aware, from various comments over the years, that this has made my views known to people, particularly on the technical side in government, whom I wouldn’t otherwise have direct access to: They will mention they read my blog, and a couple times I believe they’ve deliberately done so in the earshot of people who probably wish they didn’t. But fundamentally, like [some] sharks have to keep moving to stay alive, and salmon are driven to return upstream, I gotta write—both of my parents were journalists, so maybe as with the salmon it’s genetic?—and you, dear reader, get the opportunity to read some of it.

Footnotes

1. But speaking of paywalled journals, the major European research funders are stomping down big-time!  No embargo period, no “hybrid models”, publish research funded by these folks in paywalled venues and you have to return your grant money. Though if health care is any model, this trend will make it across the Atlantic in a mere fifty to seventy-five years.

2. A heartfelt 897-page Updike-inspired novel centered on the angst of an aging computer programmer in a mid-Atlantic university town obsessed with declining funding opportunities and the unjust vicissitudes of old age, sickness, and death.

Uh, no.

African-Americans, long free in the mid-Atlantic colonies due to a successful slave revolt in 1711-1715 coordinated with native Americans—hey, how come every fictional re-working of U.S. history has to have the Confederacy winning the Civil War?—working as paid laborers on the ever-financially-struggling Monticello properties with its hapless politician-owner, now attacked by British forces seeking to reimpose Caribbean slavery (as well as being upset over the unpleasantness in Boston and Philadelphia). Plus some possible bits involving dragons, alternative dimensions most people experience only as dark energy, and of course Nordic—friendly and intelligent—trolls.

Or—totally different story—a Catalonian Jesuit herbalist—yeah, yeah, I’m ripping off Edith Pargeter (who started the relevant series at age 64!), but if there is the village mystery genre (Christie, Sayers (sort of…), Robinson) and the noir genre (Hammett, Chandler, Ellroy), there’s the herbalist monk genre—working in the Santa Maria della Scala in the proud if politically defeated and marginalized Siena in the winter of 1575, who encounters a young and impulsive English earl of a literary bent who may or may not be seeking to negotiate the return of England to Catholicism, thus totally, like totally!!! changing the entire course of European history (oops, no, that’s Dan Brown’s schtick…besides, those sorts of machinations were going on constantly during that era. No dragons or trolls in this one.) but then a shot rings out on the Piazza del Campo, some strolling friars pull off their cloaks to reveal themselves as Swiss Guards, and a cardinal lies mortally wounded?

Nah…I’m the place where book projects go to die…

3. Ah, Ray Bradbury: Growing up in fly-over country before it was flown over, writing 1,000 words a day since the age of twelve, imitating various pulp genres until his own literary voice came in his early 20s. A friend persuades him to travel across the country by train to visit NYC where, after numerous meetings with uninterested publishers, an editor notes that his Martian and circus short stories were, in fact, the grist for two publishable books—which I of course later devoured as a teenager—and he returns home to his wife and child in LA with checks covering a year’s food and rent. Then Bradbury, who had only a high-school education, receives a note that Christopher Isherwood would like to talk with him, and then Isherwood says they really should talk to his friend Aldous Huxley. And by 1953, John Huston asks him to write a screenplay for Moby Dick, provided he do this while living in the gloom of Ireland.

4. And—beyond edit, edit, edit—about the only one. For example, Bradbury felt that a massive diet of movies in his youth fueled his imagination; Stephen King says if you can’t give up television, you’re not serious about writing. About half of successful writers apparently never show unfinished drafts to anyone, the other half absolutely depend on feedback from a few trusted readers, typically agents and/or partners.

Come to think of it, two other near-universal bits of advice: don’t listen to critics, and, closely related, don’t take writers’ workshops very seriously (even if you are being paid to teach in them).

5. Which I’d read first from Jerry Pournelle, but it seems to be general folklore: Karen Woodward has a nice gloss on this.

6. Or ads for amazing herbal potions for certain male body functions. I actually drafted a [serious] entry for “The Feral Diet” I’d followed with some success for a while but, alas, like all diet regimes, it only worked for weight loss for a while (weight maintenance has been fine): I ignore my details and just follow Michael Pollan and Gary Taubes

7. High point was when we were asked by an NSF program director if it would be okay to share one of our [needless to say, funded] proposals with people who wanted an example of what a good proposal looked like.

8. Twitter is weird, eh? I avoided Twitter for quite some time, then hopped—hey, bird motifs, right?—in for about a year and a half, then hopped out again, using it now only a couple times a week. What is interesting is the number of people who are quite effectively producing short-form essays using 10 to 20 linked tweets, which probably not coincidentally translates to the standard op-ed length of around 700 – 800 words, but the mechanism is awkward, and certainly wouldn’t work for a long-form presentation. If Twitter bites the dust due to an unsustainable financial model—please, please, please, if only for the elimination of one user’s tweets in particular—that might open a niche for that essay form, though said niche might already be WordPress.

While we’re on the topic of alternative media, I’ve got the technology to be doing YouTube—works for Jordan Peterson and, by inference, presumably appeals to lobsters—but I suspect video wouldn’t last, both because of the technological limitations (WordPress may not be stable, but the underlying text—it’s UTF-8 HTML!—is) and because the video form itself is more conversational and hence more transient. Plus I rarely watch YouTube: I can read a lot faster than most people speak.

9. Same with restricting the length, which I tried for a while, and usually putting constraints around a form improves it. But editing for length is a lot of work, as any op-ed columnist will tell you, and this is an informal endeavor. The “beyond the snark” reference section I employed for a while also didn’t last—in-line links work fine, and the ability to use hyperlinks in a blog is wonderful, one of the defining characteristics of the medium.

10. I’ve got a “Beyond Democracy” file of 25,000 words and probably a couple hundred links reflecting on the emergence of a post-democratic plutocracy and how we might cope with it: several unfinished essays have been stashed in this file. Possibly that could someday jell as a book, but, alas, have I mentioned that I am the place where book projects go to die? Are you tired of this motif yet?

11. The other entry which is consistently on the “Viewed” list on the WordPress dashboard—mind you, I only look at this for the two or three days after I post something to get a sense of whether it is getting circulated—is “History’s seven dumbest self-inflicted political disasters.” Whose popularity—this is Schrodt doing his mad and disreputable William McNeill imitation (badly…)—I absolutely cannot figure out: someone is linking it somewhere? Or some bot is just messing with me?

12. Dreaming of a topic for [seemingly] half the night: I hate that. The only thing worse—of course, beyond the standard dreams of being chased through a dank urban or forested landscape by a menacing evil while your legs turn to molasses and you simply can’t run fast enough—is dreaming about programming problems. If your dreams have you obsessing with some bit of writing, get out of bed and write it down: it will usually go away, and usually in the morning your nocturnal insight won’t seem very useful. Except when it is. Same with code.

13. Not this one: that would make it too easy.

14. I recently reviewed a paper—okay, that was my next-to-last review, honest, and a revise-and-resubmit, and really, I’m getting out of the reviewing business, and Reviewer #2 is not me (!!)—which attempted to survey the state of the art in automated event coding, and I’d say got probably two-thirds of the major features wrong. But the unfortunate author had actually done a perfectly competent review of the published literature, the problem being that what’s been published on this topic is the tip of the proverbial iceberg in a rapidly changing field and has a massive lag time. This has long been a problem, but is clearly getting worse.

15. Two others are also fairly useful, if both a bit dated: “Seven conjectures on the state of event data” and [quite old as this field goes] “Seven guidelines for generating data using automated coding”.

16. It’s funny how many people will question why one writes when there is no prospect of financial reward when I’ve never heard someone exclaim to a golfer: “What, you play golf for free?? And you even have to pay places to let you play golf? And spend hours and hours doing it? Why, that’s so stupid: Arnold Palmer, Jack Nicklaus, and Tiger Woods made millions playing golf! If you can’t, just stop trying!”

17. As distinct from Beyond Democracy, the fiction, and a still-to-jell work on contemporary Western Buddhism—like the world needs yet another book by a Boomer on Buddhism?—all of which are intended as books. Someday…maybe…but you know…

18. Like the Economist Espresso‘s quote of the day: “A person is a fool to become a writer. His [sic] only compensation is absolute freedom.” Roald Dahl (Charlie and the Chocolate Factory, Matilda, The Fantastic Mr. Fox). Yep.


Happy 60th Birthday, DARPA: you’re doomed

Today marks the mid-point of a massive self-congratulatory 60th anniversary celebration by DARPA [1]. So, DARPA, happy birthday! And many happy returns!! YEA!!!

That’s a joke, right? Why yes, how did you guess?

A 60th anniversary, of course, is a very important landmark, but not in a good way: Chinese folklore says that neither fortune nor misfortune persist for more than three generations,[2] and the 14th century historian and political theorist Ibn Khaldun pegged three generations as the time it took a dynasty to go from triumph to decay. Calculate a human generation as 20 years and, gulp, that makes 60.

Vignette #1:

DARPA, perhaps aware of some of the issues I will be raising here, has embarked on some programs with “simplified” proposal processes (e.g. this https://www.darpa.mil/news-events/2018-07-20a). In DARPA-speak, “simplified” means a 20 to 30 page program description with at least 7 required file templates, the first being an obligatory PowerPoint™ slide. In industry-speak, this is referred to as “seven friggin’ PDF files and WTF a friggin’ required PowerPoint™ slide??—in 2018 who TF uses friggin’ PowerPoint™???” [3]

Vignette #2:

A few months back, I’d been alerted to an interesting DARPA DSO BAA under the aforementioned program, and concocted an approach involving another Charlottesville-based tech outfit (well, their CTO is in CVille: the company is 100% remote on technical work, across a number of countries) with access to vast amounts of relevant data. The CTO and I had lunch on a Friday—during which I learned the company had developed out of an earlier DARPA-funded project—and he was all ready to move ahead with this.

On Monday the project was dead, vetoed by their CFO: they have plenty of work to do already, and it is simply too expensive to work with DARPA as DARPA involves an entirely different set of contracting and collaboration norms than the rest of the industry. Sad.

Arlington, we have a problem.

But before we go any further, I already know what y’all are thinking: “Hey, Schrodt, so things have finally caught up with your obnoxious little feral strategy, eh? Left academia, no longer have access to an Office of Sponsored Research [5][6] so you can’t apply for DARPA funding any more. Nah, nah, nah! LOSER! LOSER!! LOSER!!!”

Well, yeah, elements of that: per vignette #2, there are definitely DARPA [7] programs I’d like to be participating in, but no longer can, or rather, cannot assemble any conceivable rationale for attempting. Having sketched out this diatribe [8], I was on the verge of abandoning it as mere sour grapes when The Economist [1 September 2018] arrived with a cover story based on almost precisely the same complex social systems argument I’d already outlined for DARPA, albeit about Silicon Valley generally. So maybe I’m on to something. Thus we will continue.

As I was reminded at a recent workshop, DARPA was inspired by the scientific/engineering crisis of Sputnik. [9] DARPA’s challenge in the 21st century, however, is that it continues to presuppose the corporate laboratory structures of the Sputnik era, where business requirements and incentives were [almost] completely reversed from what they are today: the days of the technical supremacy of Bell Labs and Xerox PARC are gone, and they aren’t coming back. [10]

As The Economist points out in the context of the demise of Silicon Valley as an attractive geographical destination, Silicon Valley’s very technological advances—many originally funded by DARPA—have sown the seeds of its geographical destruction. DARPA faces bureaucratic rather than geographical challenges, but is essentially in the same situation at least in the world of artificial intelligence/machine learning/data science (AI/ML/DS) where DARPA appears to be desperately trying to play catch-up.

A few of the insurmountable social/economic changes DARPA is facing:

  • AI/ML/DS innovations can be implemented almost instantly with essentially no capital investment.[11] As The Economist [25 August 2018] notes, in 1975 only 17% of the value of the S&P 500 companies was in intangibles; by 2015 this was 84%.
  • The bifurcation/concentration of the economy, particularly in technical areas: the rate of start-ups has slowed, and those that exist quickly get snatched up by the monsters. Consider for example the evolution of the Borg-like SAIC/Leidos [12], which first gobbled up hundreds of once-independent defense consulting firms, then split, and then Leidos absorbed Lockheed’s information systems business. You will be assimilated!
  • As some recent well-publicized instances have demonstrated, working with DARPA—or the defense/intelligence community more generally—will be actively opposed by some not insignificant fraction of the all-too-mobile employees of the technology behemoths. Good luck changing that.

As I’ve documented in quite an assortment of posts in this blog—I’ve been successfully walking this particular walk for more than five years now—these changes have led to an accelerating rise, particularly in the AI/ML/DS field, of the independent remote contractor—either an individual or a small self-managing team—due to at least five factors:

  • Ubiquity of open source software which has zero monetary cost of entry and provides a standard platform across potential clients.
  • Cloud computing resources which can be purchased and cast aside in a microsecond with no more infrastructure than a credit card.
  • StackOverflow and GitHub putting the answers to almost any technical question a few keystrokes away: the relative advantage of having access to local and/or internal company expertise has diminished markedly.
  • A variety of web-based collaborative environments such as free audio and video conferencing, shared document environments, and collaboration-oriented platforms such as Slack, Dropbox, and the like.
  • Legitimation of the “gig economy” from both the demand and supply side: freelancers are vastly less expensive to hire and are now viewed as entrepreneurial trailblazers rather than as losers who can’t get real jobs. In fact, because of its autonomy, remote work is now considered highly desirable.

The upshot, as explosion.ai’s (spaCy, prodigy) Ines Montani explains in a recent EuroPython talk, is that small companies are now fully capable of doing what only massive companies could do a decade or so ago. Except, of course, dealing with seven friggin’ PDF files including a required friggin’ PowerPoint™ slide to even bid on a project with some indeterminate chance of being funded following a six to nine month delay. More shit sandwiches?: oh, so sorry, just pass the plate as I’ve already had my share.

As those who follow my blog are aware, I spend my days in a pleasant little office in a converted Victorian three blocks from the Charlottesville, Virginia pedestrian mall [13] in the foothills of the Blue Ridge, uninterrupted except by the occasional teleconference. I have nearly complete control of my tasks and my time, and as an introvert whose work requires a high level of concentration, this is heaven. My indirect costs are around 15%. In the five years I’ve supported myself in this fashion, my agreements with clients typically involve a few conversations, a one or two page SOW, and then we get to work.  

DARPA-compatible alternatives to this sort of remote work, of course, would involve transitioning to some open-office-plan hellhole beset with constant interruptions and “supervision” by clueless middle-managers who spend their days calling meetings and writing corporate mission statements because, well, that’s just what clueless middle managers are paid to do.[14] These work environments are horribly soul-sapping and inefficient—with indirect costs far exceeding mine—except for that rather sizable proportion of the employees who are in fact not adding any value to the enterprise but are enjoying an indefinitely extended adolescent experience where, with any luck at all, they can continue terrorizing the introverts who actually are writing quality code, just as they did in junior high school, which is pretty much what open-office-plan hellholes try to replicate. I digress.

So, I suppose, indeed I am irritated: there are opportunities out there I can’t even compete for without radically downgrading my situation, even though I, and the contemporary independent contractor community more generally, could probably do these tasks at lower cost and higher quality than the corporate behemoths who will invariably end up with all that money—and this despite the fact that a migration to remote teams with lower costs and higher output is precisely what we are seeing in the commercial sector. Says no less than The Economist.

Okay, okay, so the FAANG are leery about even talking to DARPA, and we’ve already established that the existing contractors aren’t giving DARPA what it is looking for [15], but you’ve still got academic computer science to fall back on, right? Right?

Uh, not really.

Once again, any reliance on academia has DARPA doing the time warp again and heading back to the glory days of the Sputnik crisis when, in fact, academic research was probably a pretty good deal. But now:

  • Tuition—which will be covered directly or indirectly—at all research universities has soared as the public funding readily available in the 1950s has collapsed.
  • Universities no longer have the newest and shiniest toys: those are in the private sector.
  • The best and brightest students zip through their graduate programs in record time, with FOMO private sector opportunities nipping at their tails. The ones who stick around…well, you decide.
  • The best and brightest professors have far more to gain from their startups and consultancies than from filling out seven friggin’ PDF files including one friggin’ required PowerPoint™ slide. Those with no such prospects, and the people building empires for the sake of empire building and/or aspirations to become deans, associate deans, assistant deans, deanlings or deanlets, yeah, you might get some of those. Congrats. Or something.

And these are impediments before we consider the highly dysfunctional publication incentives which have reduced academic computer science to only a single true challenge, the academic Turing Test—probably passed several years ago but the reality of this still hidden—for who will be the first to write a bot which can successfully generate unlimited publishable AI/ML/DS papers.[16] This and the fact that computer science graduate students tend to be like small birds, spending most of their time flitting around in pursuit of novelties in the form of software packages and frameworks with lifespans comparable to that of a dime-store goldfish. And all graduate students, on entering even the most lowly M.S.-level program, are sworn to a dark oath, enforced with the thoroughness of a Mafia pizza parlor protection racket, to never, ever, under any circumstances, comment, document or test their code. [21]

Academia will not save you. No one will save you.

HARUMPHHH!!! So if you bad-attitude remote contractors are so damn smart and so damn efficient, there’s an obvious market failure/arbitrage opportunity here which will self-correct because, as we all know, markets always work perfectly.

Well, maybe, but I’d suggest this is going to be tough, for at least three reasons.

The first issue, for the remote independent contractors as well as the FAANG, is simply “why bother?”: I’m not seeing a whole lot of press about AI/ML/DS unemployment, and if you can get work with a couple of phone calls and one-page SOW, why deal with seven friggin’ PDF forms and a friggin’ required PowerPoint™ slide?

Then there’s the unpleasant fact that anyone attempting to arbitrage the inefficiencies here is wandering into the arena with the likes of SAIC, Lockheed and BBN, massive established players more than happy to destroy you, and they consider seven friggin’ PDFs and all other barriers to entry a feature rather than a bug, as well as deploying legions of Armani-shod lobbyists to make damn sure things stay that way. But mostly, they’ll come after any threats to their lucrative niche faster than a rattlesnake chasing a pocket gopher. I suppose it could be done, but it is not for the faint of heart. Or bank account.

The final issue is that because DARPA [famously, and probably apocryphally] expects its projects to fail 80% of the time, there’s a frog-in-boiling-water aspect where DARPA won’t notice—until it is too late—structural problems which now cause projects to fail where they would have succeeded in the absence of those new conditions. Well, until the Chinese get there first.[17]

There is, in the end, a [delightful?] irony here: one of the four foci within the DARPA Defense Sciences Office, those folks whose idea of “simple” is a 30 page BAA and seven friggin’ PDF files starting with a friggin’ obligatory PowerPoint™ slide, is called “complex social systems,” which in most definitions would include self-modifying systems.[18] And a second of those foci deals with “anticipating surprise.”

Well, buckeroos, you’ve got both of these phenomena going on right there in the sweet little River City of Arlington, VA: a complex self-modifying system that’s dropped a big surprise, and in all likelihood there’s nothing you can do about it.

Okay, maybe a tad too dramatic: at the most basic level, all that is going on here is a case of the well-understood phenomenon of disruptive innovation—please note my clever use of a link to that leftist-hippy-commy rag, the Harvard Business Review—where new technologies enable the development of an alternative to the established/entrenched order which is typically in the initial stages not in fact “better” than the prevailing technology, but attains a foothold by being faster, cheaper and/or easier to use, thus appealing to those who don’t actually need “the best.”

Project proposals provided by remote independent contractors with 15% IDC will—assuming they even try—be inferior to those of the entrenched contractors with 60% IDC, since in addition to employing legions of Armani-shod lobbyists they also employ platoons of PowerPoint™ artistes, echelons of document editors, managers overseeing countless layers of internal reviews, and probably the occasional partridge in a pear tree.[19] You want a great proposal?: wow can these folks ever produce a great proposal!

They just can’t deliver on the final product [20] for the reasons noted above. Leaving us in this situation:

What they propose: [image]

What they deliver: [image]

In contrast, the coin of the realm for the independent shops is their open code on GitHub: even if the contracted work will be proprietary, you’ve got to have code out where people can look at it, and that’s why contemporary companies are comfortable hiring people they’ve never met in person—and may never meet—and who will be working eight time zones away: it’s the difference between hiring someone to remodel your kitchen based on the number of glossy architectural magazines they bring to a meeting versus hiring them based on other kitchens they’ve remodeled. All of which is to say that in the contemporary AI/ML/DS environment, assembling an effective team is more Ocean’s 11 or Rogue One than The Office.

So on a marginally optimistic note, I’ll modify my title slightly: DARPA, until you find a structure that rewards people for writing solid code, not PowerPoint™ slides, you’re doomed.

Happy 60th.

Footnotes

1. If you don’t know what DARPA is, stop right here as the cultural nuances permeating the remainder of this diatribe will make absolutely no sense. I’m also obnoxiously refraining from defining acronyms such as DSO, BAA, SOW, PM, FAR, FOMO, FAANG, CFO, CTO, ACLED, ICEWS, F1, AUC, MAE, and IARPA because refraining from defining acronyms is like so totally intrinsic to this world.

2. This is apparently an actual Chinese proverb, though it is typically rendered as “Wealth does not pass three generations” along with many variants on the same theme. There’s a nice exposition, including an appropriate reference to Ibn Khaldun, to be found, of all places, on this martial arts site.

3. A couple weeks ago the social media in CVille—ya gotta love this place—got into an extended tiff over whether the use of the word “fuck”—or more generally “FUCK!” or “FUCK!!” or “THAT’S TOTALLY FUCKING FUCKED, YOU FUCKING FUCKWIT!!!”—was or was not a form of cultural appropriation. Of course, it’s not entirely clear what “culture” is being appropriated, and thus offended, as the word has been in common use for centuries, but presumably something local as the latter phrase is pretty much representative of contemporary public discourse, such as it is, in our fair city.[4] Okay, not quite. Fuckwit.  So to avoid offense—not from the repetitive use of an obscenity, but the possibility of that indeterminate variant on cultural appropriation—I will continue to refer to the “seven friggin’ PDFs and one friggin’ required PowerPoint™ slide.”

4. Browsing the Politics and Prose bookstore in DC last weekend—this at the new DC Wharf, where the hoi polloi can gaze upon the yachts of the lobbyists for DARPA contractors—I noticed that if you would like to write a book, but really don’t have anything to say, adding FUCK to the title is a popular contemporary marketing approach. Unfortunately, these tomes—mind you, they are typically exceedingly short, so perhaps technically they are not really “tomes”—will probably all be pulped or repurposed as mulch, but should a few escape we can foresee archeologists of the future—probably sentient cockroaches—using telepathic powers to record in [cockroach-DARPA-funded] holographic crystals “Today our excavation reached the well-documented FUCK-titled-tome layer, reliably dated to 2015-2020 CE.” Though they will more likely have to be content with the “Keurig capsule layer”, which far less precisely locates accompanying artifacts only to 1995-2025 CE.

5. Or as Mike Ward eloquently puts it: OSP == Office for the Suppression of Research.

6. As Schrodt puts it, the OSP mascot is the tapeworm.

7. And IARPA: same set of issues, less money.

8. Though inspired in part by listening to some folks at the 2018 summer Society for Political Methodology meetings—unlike the three-quarters of political science Ph.D.s who will not find tenure track positions, political methodologists are eminently employable, albeit not necessarily in academia—literally laughing out loud—and this conference being in Provo, Utah, laughing out loud while stone-cold sober—about Dept of Defense attempts to recruit high-quality collaborators in the AI/ML/DS fields.

9. In this presentation, we were told “I’m sure no one here remembers Sputnik.” Dude, I not only remember Sputnik—vividly—I can even remember when the typical Republican thought the Russians were an insidious and unscrupulous enemy!

10. From the recent obituary of game theorist Martin Shubik:

After earning his doctorate at Princeton, he worked as a consultant for General Electric and for IBM, whose thinking about research scientists he later described to The New York Times: “Well, these are like giant pandas in a zoo. You don’t really quite know what a giant panda is, but you sure as hell know (1) you paid a lot of money for it, and (2) other people want it; therefore it is valuable and therefore it’s got to be well fed.”

11. Capital intensity is a key caveat here: as the price of the shiniest new toys increases, so does the competitiveness of DARPA compared to the commercial sector. So, for example, in areas such as quantum computing, nanotechnologies and most work on sensors, DARPA will do just fine. AI/ML/DS: not so much. So despite my dramatic title—hey, it’s a blog!—DARPA is probably not doomed in endeavors involving bashing metals or molecules. 

12. I wasn’t really sure how to find that Vanity Fair article on SAIC—which got quite the attention when it first came out more than a decade ago—but it popped right up when I entered the search term “SAIC is evil”. Also see this.

The sordid history of the likes of SAIC and Lockheed raises the topic/straw-man of whether DARPA PMs, in comparison to private sector research managers who can contract with a few phone calls and a short SOW, “must” be hemmed in by mountains of FARs and bureaucracy lest they be irresponsible with funds from the public purse. Yet these same managers routinely are expected—all but required thanks to the legions of Armani-shod lobbyists—to dole out billions to outfits like SAIC and Lockheed which have long—like really, really long—rap sheets on totally wasting public moneys. Sad.

13. Six coffee shops and counting.

14. Okay, your typical tech middle manager is also paid to knock back vodka shots in strip clubs while exchanging pointers on how to evade HR’s efforts to reduce sexual harassment, a phenomenon I have explored in greater detail here.

15. See Sharon Weinberger’s superbly researched history of DARPA, Imagineers of War, for further discussion, particularly her analysis of DARPA’s seemingly terminal intellectual and technical drift in the post-Cold-War period.

16. Academic computer science has basically run itself into a publications cul-de-sac—mind you, possibly quite deliberately, as said cul-de-sac guarantees their faculty can spend virtually all of their time working on their start-ups and consultancies—where publication has become defined solely by the ability to get some marginal increase in standardized metrics on standardized data sets.

Vignette: I’m generally avoiding reviewing journal articles now—I have only limited access to paywalled journals, and in any case don’t want to encourage that sort of thing—but a few weeks ago finally agreed to do so (for a proprietary journal I’d never heard of) after being incessantly harangued by an editor, presumably because I was one of about five people in the world who had worked in the past with all of the technologies used in the paper, and I decided to reward the effort that must have been involved to establish this connection. The domain, of course, was forecasting political conflict, and the authors had assembled a conflict time series from the usual suspects—ACLED, Cline Center, or ICEWS—and applied four different ML methods, which produced modestly decent results—as computer scientists, they felt no obligation, of course, to look at the existing literature which extends back a mere four or five decades—with a bit of variation in the usual metrics, probably F1, AUC and MAE. There was a serious discussion of these differences, discussions of the relative level of resources required for each estimator, blah blah blah. So far, a typical ML paper.

Until I got to a graphical display of the results. The conflict time series, of course, was a complex saw-toothed sequence. Every single one of the ML “models”: a constant! THE [fuckwit] IDIOTS HAD SUBMITTED A PAPER WHERE THE PREDICTED TIME SERIES HAD ZERO VARIANCE! And those various estimators didn’t even converge to the mean, hence the differences in the measures of fit!

I politely told the editor, in all sincerity, that this was the stupidest thing I had ever read in my life, and in political science it would have never gone out for review. The somewhat apologetic response allowed that it might not be the finest contribution from the field, as the journal was new (and, I’m sure, expensive: gotta make the percentage of library budgets that go to serials asymptotic to 100%!) and was being submitted for a special issue. Right.

After completing the review, I tracked down the piece (I follow the political science norm of respecting double-blind review processes): it was from one of the top computer science shops in the country, and the final “author” (who I presume had never even glanced at the piece) was the director of a research institute with vast levels of government funding. Such is the degeneracy of contemporary academic computer science. I’m hardly the only person to notice this issue: see this from Science.

17. This, of course, being the dominant issue in political-economy for the first half of the 21st century: the Chinese have created a highly economically successful competitor to liberal market polities, and we have also seen a convergence in market concentration in the new companies dominating the heights of markets in both systems. However, we’ve got 200 years of theorizing—once called “conservative” (and before that “liberal”) in the era before “conservative” became equated with following the constantly changing whims of a deranged maniac—arguing that decentralized political-economic systems should provide long-term advantages over authoritarian systems. But that sure as heck isn’t clear at the moment.

18. Hegel, of course, had similar ideas 200 years ago, but wasn’t very good with PowerPoint™: sad.

19. Do these elaborate proposal preparation shops figure into the high indirect costs of the established contractors? Nah…of course not, because we know proposals are done by legions of proposal fairies who subsist purely on dewdrops and sunlight, costing nothing. Or if they did, those costs would be reimbursed by the legions of proposal leprechauns and their endless supplies of gold. None of this ever figures into indirect cost rates, right?

20. As distinct from providing 200+-slide PowerPoint™ decks for day-long monthly program reviews: they’ll be great on that as well!

21. Turns out astrophysics has a wonderful name for the undocumented code people write figuring no one will ever look at it only to find it’s still in use twenty years later: “dinosource.”


What if a few grad programs were run for the benefit of the graduate students?

I’ve got a note in my calendar around the beginning of August—I was presumably in a really bad mood at [at least] some point over the past year—to retweet a link to my blog post discussing my fondness for math camps—not!—but in the hazy-lazy-crazy days of summer, I’m realizing this would be rather like sending Donald Trump to meet with the leaders of U.S. allies: gratuitously cruel and largely meaningless. Instead, and more productively, an article in Science brought to my attention a recent report [1] by the U.S. National Academies of Sciences, Engineering, and Medicine (NASEM)—these people, please note, are a really big deal. The title of the article—”Student-centered, modernized graduate STEM education”—provides the gist but here’s a bit more detail from the summary of the report provided in the Science article:

[the report] lays out a vision of an ideal modern graduate education in any STEM field and a comprehensive plan to achieve that vision. The report emphasizes core competencies that all students should acquire, a rebalancing of incentives to better reward faculty teaching and mentoring of students, increased empowerment of graduate students, and the need for the system to better monitor and adapt to changing conditions over time.  … [in most institutions] graduate students are still too often seen as being primarily sources of inexpensive skilled labor for teaching undergraduates and for performing research. …  [and while] most students now pursue nonacademic careers, many institutions train them, basically, in the same way that they have for 100 years, to become academic researchers

Wow: reconfigure graduate programs not only for the 21st century but to benefit the students rather than the institutions. What…a…concept!

At this point my readership now splits, those who have never been graduate students (a fairly small minority, I’m guessing) saying “What?!? Do you mean graduate programs aren’t run for the benefit of their students???” while everyone who has done time in graduate school is rolling their eyes and cynically saying “Yeah, right…” With the remainder rolling on the ground in uncontrollable hysterical laughter.[2]

But purely for the sake of argument, and because these are the lazy-hazy-crazy days of summer, and PolMeth is this week and I got my [application-focused!] paper finished on Friday (!!), let’s just play this out for a bit, at least as it applies to political methodology, the NASEM report being focused on STEM, and political methodology is most decidedly STEM. And in particular, given the continued abysmal—and worsening [3]—record for placement into tenure-track jobs in political science, let’s speculate for a bit what a teaching-centered graduate level program for methodologists, a.k.a. data scientists, intending to work outside of academia might look like. For once, I will return to my old framework of seven primary points:

1. It will basically look like a political methodology program

I wrote extensively on this topic about a year ago, taking as my starting point that experience in analyzing the heterogeneous and thoroughly sucky sorts of data quantitative political scientists routinely confront is absolutely ideal training for private sector “data science.” The only new observation I’d add, having sat through demonstrations of several absolutely horrible data “dashboards” in recent months, is formal training in UX—user interface/experience—in addition to the data visualization component. So while allowing some specialization, we’d basically want a program evenly split between the four skill domains of a data scientist:

  • computer programming and data wrangling
  • statistics
  • machine learning
  • data visualization and UX

2. Sophisticated problem-based approaches taught by instructors fully committed to teaching

One of the reasons I decided to leave academia was my increasing exposure to really good teaching methodologies combined with a realization that I had neither the time, energy, nor inclination to use these. “Sage on the stage” doesn’t cut it anymore, particularly in STEM.

Indeed, I’m too decrepit to do this sort of thing—leave me alone and just let me code (and, well, blog: I see from WordPress this is published blog post #50!)—but there are plenty of people who can enthusiastically do it and do it very well. The problem, as the NASEM report notes in some detail, is that in most graduate programs there are few if any rewards for doing so. But that’s an institutional issue, not an issue of the total lack of humans capable of doing the task, nor the absence of a reasonably decent body of research and best-practices—if periodically susceptible, like most everything social, to fads—on how to do it.

3. Real world problems solved using remote teaming

Toy problems and standardized data sets are fine for [some] instruction and [some] incremental journal publications, but if you want training applicable to the private sector, you need to be working with raw data that is [mostly] complete crap, digital offal requiring hours of tedious prep work before you can start applying glitzy new methods to it. Because that, buckeroos, is what data science in the private sector involves itself with, and that’s what pays the bills. Complete crap is, however, fairly difficult to simulate, so much better to find some real problems where you’ve got access to the raw data: associations with companies—the sorts of arrangements that are routine in engineering programs—will presumably help here, and as I’ve noted before, “data science” is really a form of engineering, not science. 

My relatively new suggestion is for these programs to establish links so that problem-solving can be done in teams working remotely. Attractive as the graduate student bullpen experience may be, it isn’t available once you leave a graduate program, and increasingly, it will not be duplicated in many of the best jobs that are available, as these are now done using temporary geographically decentralized teams. So get students accustomed to working with individuals they’ve never met in person who are a thousand or eight thousand or twelve thousand miles away and have funny accents and the video conferencing doesn’t always work but who nonetheless can be really effective partners. In the absence of some dramatic change in the economics and culture of data science, the future is going to look like the “fully-distributed team” approach of parse.ly, not the corporate headquarters gigantism of FAANG.

4. One or two courses on basic business skills

I’ve written a number of blog entries on the basics of self-employment—see here and here  and here—and for more information, read everything Paul Graham has ever written, and more prosaically, my neighbor and tech recruiter Ron Duplain always has a lot of smart stuff to say, but I’ll briefly reiterate a couple of core points here.

[Update 31 July: Also see the very useful EuroPython presentation from Ines Montani of explosion.ai, the great folks that brought you spaCy and prodigy. [9]]

Outside of MBA programs—which of course go to the opposite extreme—academic programs tend to treat anything related to business—beyond, of course, reconfiguring their curricula to satisfy the funding agendas of right-wing billionaires—as suspect at best and more generally utterly worthy of contempt. Practical knowledge of business methods also varies widely within academia: while the stereotype of the academic coddled by a dissertation-to-retirement bureaucracy handling their every need is undoubtedly true as the median case, I’ve known more than a few academics who are, effectively, running companies—they generally call them “labs”—of sometimes quite significant size.

You can pick up relevant business training—well, sort of—from selectively reading books and magazine articles but, as with computer programming, I suspect there are advantages to doing this systematically [and some of my friends who are accountants would definitely prefer if more people learned business methods more systematically]. And my pet peeve, of course, is getting people away from the expectations of the pervasive “start-up porn”: if you are reasonably sane, your objective should be not to create a “unicorn” but rather a stable and sustainable business (or set of business relationships) where you are compensated at roughly the level of your marginal economic contribution to the enterprise.[4]

That said, the business angle in data analytics is at present a rapidly moving target as the transition to the predominance of remote work—or if you prefer, “gig-economy”—plays out. In the past couple of weeks, there were articles on this transition in both The Economist’s “The World If…” feature and Science magazine’s “Science Careers” [6 July 2018][5]. But as The Economist makes clear, we’re not there yet, and things could play out in a number of different ways.[6] Still, it is likely that most people in the software development and data analytics fields should probably at least plan for the contingency they will not be spending their careers as coddled corporate drones and instead will find themselves in one of those “you only eat what you—or you and your ten-person foraging party of equals—kill” environments. Where some of us thrive. Grrrrrrrr. There are probably some great market niches for programs that can figure out what needs to be covered here and how to effectively teach it.

5. Publication only in open-access, contemporaneous venues

Not paywalled journals. Particularly not paywalled journals with three to five year publication lags. As I note in one of several snarky asides in my PolMeth XXXV paper:

Paywalled journals are virtually inaccessible outside universities so by publishing in these venues you might as well be burying your intellectual efforts beneath a glowing pile of nuclear waste somewhere in Antarctica. [italics in original]

Ideally, if a few of these student-centered programs get going, some university-sponsored open access servers could be established to get around the current proliferation of bogus open access sites: this is certainly going to happen sooner or later, so let’s try “sooner.” Bonus points: such papers can only be written using materials available from open access sources, since the minute you lose your university computer account, that’s the world you will live in.

It goes without saying that students in these programs should establish a track record of both individual and collective code on GitHub. GitHub (and StackOverflow) having already solved the open access collective action problem in the software domain.[7] 

6. Yes, you can still use these students as GTAs and GRAs provided you compensate them fairly

Okay, I was in academia long enough to understand the basic business model of generating large amounts of tuition credit hours—typically about half—in massive introductory classes staffed largely by graduate students. I was also in academia long enough to know that graduate training is not required for students to be able to competently handle that material: You just need smart people (the material, remember, is introductory) and, ideally, some training and supervision/feedback on teaching methods. To the extent that student-centered graduate programs have at least some faculty strongly committed to teaching rather than increasing the revenues of predatory publishers you may find MA-level students are actually better GTAs than research-oriented PhD students.

As far as providing GRAs, my guess is that generating basic research—open access, please—out of such programs will also occur naturally and, again, because the programs have a focus on applications, these students may prove better (or at least, less distracted) than those focused on the desperate—and in political science, for three-quarters, inevitably futile—quest for a tenure-track position. You might even be able to get them to document their code!

In either role, however, please provide those students with full tuition, a living wage and decent benefits, eh? The first law of parasitism being, of course, “don’t kill the host.” If that doesn’t scare you, perhaps the law of karma will.

7. Open, transparent, unambiguous, and externally audited outcomes assessments

Face it, consumers have more reliable information on the contents of a $1.48 can of cat food than they have on the outcomes of $100,000 business and law school programs, and the information on professional programs is usually far better than the information on almost all graduate programs in the social sciences. In a student-centered program, that has to change, lest we find, well, programs oriented towards training for jobs that only a quarter of their graduates have any chance of getting.

In addition to figuring out standards and establishing record-keeping norms, making such information available is going to require quite the sea change in attitudes, and thus far deans, associate deans, assistant deans, deanlets, and deanlings have successfully resisted open accountability by using their cartel powers.[8] In an ideal world, however, one would think that market mechanisms would favor a set of programs with transparent and reliable accountability.

Well, a guy can dream, eh?

See y’all—well, some subset of y’all—in Provo.

Footnotes

1. Paywalled, of course. Because elite not-for-profit organizations sustained almost entirely by a combination of tax monies and grants from sources who are themselves tax-exempt couldn’t possibly be expected to make their work accessible, particularly since the marginal cost of doing so is roughly zero.

2. What’s that old joke from the experimental sciences?: if you’re embarking on some procedure with potentially painful consequences, better to use graduate students rather than laboratory rats because people are less likely to be emotionally attached to graduate students.

3. The record for tenure track placement has gotten even worse, down to 26.3%, which the APSA notes “is the lowest reported figure since systematic observation began in the 2009-2010 academic year.” 

4. Or if you want to try for the unicorn startup—which is to say, you are a white male from one of a half-dozen elite universities—you at least understand what you are getting into, along with the probabilities of success—which make the odds of a tenure-track job in political science look golden in comparison—and the actual consequences, in particular the tax consequences, of failure. If you are not a white male from one of a half-dozen elite universities, don’t even think about it.

5. Science would do well to hire a few remote workers to get their web page functioning again, as I’m finding it all but inoperable at the moment. Science is probably spending a bit too much of their efforts breathlessly documenting a project which, using a mere 1,000 co-authors, has detected a single 4-billion-year-old neutrino.

6. And for what it’s worth, this is a place where Brett Kavanaugh could be writing a lot of important opinions. Like maybe decisions which result in throwing out the vast cruft of gratuitous licensing requirements that have accumulated—disproportionately in GOP-controlled states—solely for the benefit of generally bogus occupational schools.

7. And recently received a mere $7.5 billion from Microsoft for their troubles: damn hippies and open source, never’ll amount to anything!

8. Though speaking of cartels—and graduate higher education puts OPEC, though not the American Medical Association, to shame on this dimension—the whole point of a cartel is to restrict supply. So a properly functioning cartel should not find itself in a position of over-producing by a factor of three (2015-2016 APSA placements) or four (2016-2017 placements). Oh…principal-agent problems…yeah, that…never mind…

9. Watch the presentation, but for a quick summary, her main point is that the increasingly popular notion that a successful company has to be large, loss-making, and massively funded is bullshit: if you actually know what you are doing, and are producing something people want to buy, you can be self-financing and profitable pretty much from the get-go. “Winner-take-all” markets are only a small part of the available opportunities—though you wouldn’t know that from the emphasis on network effects and FOMO in start-up porn, now amplified by the suckers [10] who pursue the opportunities in data science tournaments rather than the discipline of real markets—and there are plenty of possibilities out there for small, complementary teams who create well-designed, right-sized software for markets they understand. Thanks to Andy Halterman for the pointer.

10. Okay, “suckers” is probably too strong a word: more likely these are mostly people—okay, bros—who already have the luxury of an elite background and an ample safety net provided by daddy’s and mommy’s upper 1% income and social networks so they can afford to blow off a couple years doing tournaments just for the experience. But compare, e.g., to Steve Wozniak and Steve Jobs—and to a large extent, even with their top-1% backgrounds, Bill Gates and Paul Allen—who created things people actually wanted to buy, not just burning through billions to manipulate markets (Uber, and increasingly it appears, Tesla).


Witnessing a paradigm shift?

The philosopher of science Thomas Kuhn is famous—beyond an apparent penchant for throwing ashtrays [1]—for his vastly over-generalized concept of “paradigm shifts” in scientific understanding, where a set of ideas once thought unreasonable becomes the norm, exchanging this status with ideas on the same topic once almost universally accepted. [2] This typically involves a generational change—Max Planck famously observed that scientific progress occurs one funeral at a time—but can sometimes occur more quickly. And I think I’m watching one develop in the field of predictive models of conflict behavior.

The context here [3] was a recent workshop I attended in Europe on that topic. The details don’t matter but suffice it to say this involved an even mix of the usual suspects in quantitative conflict data and modeling—I’m realizing there are perhaps fifty of us in the world—and an assortment of NGOs and IGOs, mostly consumers of the information. [4]  Held amid the monumental-brutalist architecture housing the pan-European bureaucracy, presumably the model for the imperial capital in The Hunger Games, leading one to sympathize, at least to a degree, with European populist movements. And by the way, in two days of discussions no one mentioned Donald Orange-mop even once: we’re past that.

The promised paradigm change is on the issue of whether technical models for forecasting conflict are even possible—and as I’ve argued vociferously in the past, academic political science completely missed the boat on this—and it looks as though we’ve suddenly gone from “that’s impossible!” to “okay, where’s the model, and how can we improve it?” This new assessment being entirely due to the popularization over the past year of machine learning. The change, even taking into account that the Political Instability Task Force has been doing just this sort of thing, and doing it well, for at least fifteen years, has been stunningly rapid.

Not, of course, without more than a few bumps along the way. Per the persistent hype around “deep learning,” there’s a strong assumption that “artificial intelligence” is now best done with neural networks—and the more complex the better—whereas there’s consistent evidence both from this workshop and a number of earlier efforts I’m familiar with that because of the heterogeneity of the cases and the tiny number of positives, random forests are substantially better. There’s also an assumption that you can’t figure out which variables are important in a machine learning model: again, wrong, as this is routine in random forests and can be done to a degree even in neural nets, though it’s rather computationally intensive. One presenter—who had clearly consumed a bit too much of the Tensorflow Kool-Aid—noted these systems “learn on their own”: alas, that’s not true for this problem [6] and in fact we need lots of training cases, and in conflict forecasting models the aforementioned heterogeneity and rare positives still hugely complicate estimation.
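
An aside for the technically inclined: below is a minimal sketch, entirely my own toy illustration rather than anything shown at the workshop, of why variable importance is routine with random forests. scikit-learn exposes impurity-based importances as a fitted attribute, while permutation importance, the model-agnostic route that also works for neural nets, requires repeatedly re-scoring the model, hence the computational cost.

# A toy sketch, not workshop code: simulated data with rare positives,
# loosely mimicking the class imbalance in conflict forecasting.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, n_informative=5,
                           weights=[0.95, 0.05], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

rf = RandomForestClassifier(n_estimators=500, class_weight="balanced",
                            random_state=42).fit(X_tr, y_tr)

# Impurity-based importances: a built-in attribute, essentially free
print(np.argsort(rf.feature_importances_)[::-1][:5])

# Permutation importance: model-agnostic (works for neural nets too),
# but it requires re-scoring the fitted model many times
perm = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=42)
print(np.argsort(perm.importances_mean)[::-1][:5])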

So these models are not easy, but they are now considered possible, and there is an actual emerging paradigm: In the course of an hour I saw presentations by a PhD student in a joint program at the Universities of Stockholm and Iceland developing a resource-focused conflict forecasting model and a data scientist from the World Bank and FAO working on famine forecasting [7], both implementing essentially the same very complex protocols for training, calibration, and cross-validation of various machine learning models. [8][15]
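
To give a flavor of what “essentially the same protocol” looks like in practice, here is a bare-bones sketch; the particulars (estimator, fold counts, metrics) are my own illustrative assumptions, not either presenter’s actual pipeline: stratified folds to cope with the rare positives, probability calibration inside each training fold, and strictly out-of-sample scoring.

# A bare-bones train/calibrate/cross-validate sketch; all details here are
# illustrative assumptions, not the workshop models. X and y are numpy arrays.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import StratifiedKFold

def evaluate(X, y, n_splits=5):
    """Report out-of-fold AUC and Brier scores for a calibrated random forest."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    for k, (tr, te) in enumerate(skf.split(X, y)):
        base = RandomForestClassifier(n_estimators=500,
                                      class_weight="balanced",
                                      random_state=42)
        # Calibrate predicted probabilities on internal folds of the training data
        model = CalibratedClassifierCV(base, method="isotonic", cv=3)
        model.fit(X[tr], y[tr])
        p = model.predict_proba(X[te])[:, 1]
        print(f"fold {k}: AUC={roc_auc_score(y[te], p):.3f}  "
              f"Brier={brier_score_loss(y[te], p):.3f}")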

Well, we live in interesting times.

There’s a fairly standard rule-of-thumb in economic history stating it takes between one and two human generations—20 to 40 years—to effectively incorporate a major new technology into the production structure of organizations. The—yes, paradigmatic—cases are the steam engine, electricity, and computers. [9] I’ve sensed for quite some time that we’re in this situation, perhaps half-way through the process, with respect to technical forecasting models and foreign policy decision-making. [10] As Tetlock and others have copiously demonstrated, the accuracy of human assessments in this field is very low, and as Kahneman and others have copiously demonstrated, decision-making on high-risk, low-probability issues is subject to systematic biases. Until quite recently, however, data [11] and computational constraints meant there were no better alternatives. But there are now, so the issue is how to properly use this information. 

And not every new technology takes a generation before it is adopted: to take some examples most readers will be familiar with, word-processing, MP3 music files, flat-screen displays, and cell phones displaced their earlier rivals almost in a historical eye-blink, albeit except for word processing this was largely in a personal rather than organizational context. In the long-ago research phase of ICEWS—a full ten years ago now, wow…—I had a clever slide (well, I thought it was clever) showing a robot saying “We bomb Mindanao in six hours” and a medal-bedecked general responding “Yes, master” to illustrate what technical forecasting models are not designed to do. But with accuracy 20% to 30% better than human forecasts, one would think these approaches should have some impact on the process. It is going to take time and effort to figure that out, particularly since human egos and status are involved, and the models will make mistakes. And present a new set of challenges, just as electrical power presents a different set of risks and opportunities than the steam and water power it replaced. But their eventual incorporation into policy-making seems inevitable.

Finally, this might have implications for the future demand for event data, as models customized for very specific organizational needs finally provide a “killer app” using event data as a critical input. As it happens, no one has yet come up with something that does the job of event data—recording day to day interactions of political actors as reported in the open press—without simply looking pretty much like plain old event data: Both the CAMEO and PLOVER [12] event coding systems still have the basic structure of the 60-year-old WEIS, because WEIS incorporates most things in the news of interest to analysts (and their quantitative models). While the forecasting models I’m currently seeing primarily use annual (and state-level) structural data, as soon as one drops to the sub-annual level (and, increasingly, sub-state, as geocoding of event data improves) event data are really the only game in town. [13]
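
For readers who have never looked at event data, the basic unit in WEIS and its descendants is just a dated who-did-what-to-whom record. A hypothetical single record (my own toy illustration, not a row from any actual dataset) looks roughly like this:

# A hypothetical event record in the WEIS/CAMEO/PLOVER mold; the field names
# and values are illustrative, not any dataset's actual schema.
event = {
    "date": "2018-06-15",
    "source": "DEU",             # actor initiating the action
    "target": "RUS",             # actor on the receiving end
    "event_type": "CONSULT",     # category from the coding ontology
    "location": (52.52, 13.40),  # geocoding, increasingly available sub-state
}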

Footnotes

1. Recently back in the news…well, sort of…thanks to a thoroughly unflattering book by documentary film-maker Errol Morris, whose encounters with Kuhn when Morris was a graduate student left a traumatic impression of Kuhn being a horse’s ass of truly mythic proportions, though some have suggested parts of the book may themselves border on mythic…nonetheless, be civil to your grad students lest they become award winning film makers and/or MacArthur Award recipients long after you and any of your friends are around to defend your reputation. Well, and perhaps because being nice to your grad students is simply the right thing to do.

2. And thus the hitherto obscure word “paradigm” entered popular parlance: a number of years ago, at the height of the dot-com bubble, social philosopher David Barry proposed simply making up a company name, posting this on the web, and seeing how much money would pour in. The name he proposed was “Gerbildigm”, combining “gerbil” and “paradigm.” Mind you, that’s scarcely different than what actual companies were doing in the late 1990s to generate funding. Nowadays, in contrast, they simply say they are exploring applications of deep learning.

3. And by the way, this isn’t the snark-fest promised in the previous blog entry; that’s still to come, though events are so completely depressing at the moment—okay, “Christian” conservatives, you won the right not to bake damn wedding cakes, but at the price of suborning tearing infants out of the arms of their mothers: you really think that tradeoff is a good deal? Will your god? You’ve got an exemption from Matthew 25:35-40 now, eh? You’re completely confident about this? You sure?—I’m having difficulty gearing up for a snark-fest even though it is half-written. Though stuff I have half-written would fill a not-inconsequentially sized bookshelf.

4. It is also notable that the gender ratio at this very technical workshop was basically 50/50, and this included the individuals developing the data and models, not just the consumers. In the U.S., that ratio would have been 80/20 or even 90/10. So by chance is the USA excluding some very talented potential contributors to this field? [5] And is this related to the work of Jayhawk economist Donna Ginther, highlighted on multiple occasions by The Economist over the past few months, that in the academic discipline of economics, gender discrimination appears to be considered a feature rather than a bug? Which cascaded over into the academic field of political methodology, though thanks to the efforts of people like Janet Box-Steffensmeier, Sara Mitchell, Caroline Tolbert, and institutions like VIM, it is not as bad as it once was. But compared to my experiences in Europe, could still improve.

5. I recently stumbled onto historian Marie Hicks’s study titled Programmed Inequality: How Britain Discarded Women Technologists and Lost Its Edge in Computing.  Brogrammers take note: gender discrimination doesn’t necessarily have a happy ending.

6. Self-learning is, famously, possible for games like poker, chess and go, which have the further advantage that the average person can understand the application, thus providing ample fodder for breathless headlines, further leading to fears that our new Go-and-Texas-Hold’em neural network overlords will, like Daleks and Cylons, shortly lethally threaten us, even if they still can’t manage to control machines sufficiently well to align the doors to shut properly on a certain not-so-mass-produced electric vehicle produced by a company owned by one of the more notable alarmists concerned about the dangers of machine intelligence. Plus there’s the little issue of control of the power cord. I digress.

7.  Amusingly, for the World Bank work, the analyst then has to run comparable regression models because that’s apparently the only thing the economists there understand. At the moment.

8. Nor was this the standard protocol for producing a regression model, which, gentle reader, I would remind you, has the following steps (as Adam Smith pointed out in 1776, for maximal efficiency, assemble a large team of co-authors with specialists doing each task!):

  1. Develop some novel but vaguely plausible “theory”
  2. Assemble a set of 25 or so variables from easily available data sets
  3. Run transformations and subsets of these, ideally using automated scripts to save thought and labor, until one or more combinations emerge where the p-values on your pet variables are ≤0.05. Justify any superfluous variables required to achieve this via collinearity—say, parakeets-per-capita—as “controls.” Bonus points for using some new variant of regression for which the data do not remotely satisfy the assumptions and which mangles the coefficients beyond any hope of credible interpretation. Avoid, at all costs, out-of-sample assessments of any form.
  4. Report this in a standardized social science format 35 ± 5 pages in length (with a 100-page web appendix) starting with an update of the literature review from your dissertation[s], copiously citing your friends and any likely reviewers, and interpreting the coefficients as though they were generated using OLS estimation. Make sure the “Discussion” and “Conclusions” sections essentially duplicate each other and the statistical tables.
  5. Publish in a proprietary journal which will appear in print after a lag of at least three years, firewalled and thus inaccessible to the policy community, though no one will ever look at it anyway. Previously, however, you will have presented the problem, methodology, and results in approximately 500 seconds (you’re on a five-paper panel, of course) at a major conference, where your key slide will show 4 variants of the final 16-variable model with the coefficients to 6 decimal places and several p-values reported as “0.000.” The five people in the audience would be unable to read the resulting 3-point type, except that they are browsing the conference program instead of listening; the discussant asks why you didn’t include four additional controls.
  6. PROFIT!

I jest. I wish.

9. In fact quite a few people have suggested that computers still aren’t being used to their full capacity in corporations because they would render many middle managers irrelevant, and these individuals, unlike Yorkshire handloom weavers, are in a position to resist their own displacement: The Economist had a nice essay to this effect a couple weeks ago.

10. The concept of a systematic foreign policy is, of course, at present quaintly anachronistic in the U.S., where foreign policy, such as it is, is made on the basis of wild whims and fantasies gleaned from a steady if highly selective diet of cable TV, combined with a severe case of dictator-envy and the at least arguable proposition that poutine constitutes a threat to national security. But ever the optimist I can imagine the U.S. returning to a more civilized approach somewhere in the future, just as Rome recovered from both Nero and Caligula. Also as noted, this workshop was in Europe, which has suddenly been incentivized to get serious about foreign policy.

11. This is an important caveat: the data are every bit as important as the methods, and for many remote geographical areas under high conflict risk, we probably still don’t have all the data we need, even though we have a lot more than we once did. But data is hard, and data can be very boring—certainly it’s not going to attract the headlines that a glitzy new game-playing or kitten-identifying machine learning application can—and at the moment this field is dependent on a large number of generally underfunded small projects, the long-term Scandinavian commitments to PRIO and the Uppsala UCDP being exceptions. In the U.S., the continued funding of the ICEWS event data is very tenuous and the NSF RIDIR event data funding runs out in February 2018…just saying…

12. Speaking of PLOVER, at yet another little workshop, I was asked about the painfully slow progress towards implementing PLOVER, and it occurred to me that it’s currently trying to cross a technological “valley of death” [14] where PLOVER, properly implemented, would be clearly superior to CAMEO, but CAMEO already exists, and there is abundant CAMEO data (and software for coding it) available for free, and existing models already do a reasonably good job of accommodating the problems of CAMEO. “Free and already available” is a serious advantage if your fundamental interest is the model, not the data: This is precisely why WEIS, despite being proposed as a first-approximation to what would certainly be far better approaches, was used for about 25 years and CAMEO, which wasn’t even intended as a general-purpose coding scheme, is heading towards the two-decade mark, despite well-known issues with both.

13. Though the other thing to watch here is the emerging availability of low-cost, and frequently updated, remote sensing data. The annualized NASA night-light data is already being used increasingly to provide sub-state information with high geographical precision, and new private sector data, as well as new versions of night-lights, are likely to be available at a far greater frequency.

14. Googling this phrase to get a clean citation, I see it has been used to mean about twenty different things, but the one I’m employing here is a common variant.

15. And while I’m on the topic of unsolicited advice to grad students, yet another vital professional skill they don’t teach you in graduate school is flying to Europe and being completely alert the day after you arrive. My formula:

  1. Sleep as much as you can on the overnight flight (sleeping on planes, ideally without alcohol, is another general skill)
  2. Take at most a one-hour nap before sunset, and spend most of the rest of the time outside walking
  3. Live on the East Coast
  4. Don’t change planes (or at least terminals) at Heathrow

Should an event coder be more like a baby?

Last evening, as is my wont, I was reading the current issue of Science [1]—nothing like a long article on, say, the latest findings on mantle convection beneath the Hawai’i hotspot to lull one to sleep—when an article titled “Basic Instincts: Some say AI needs to learn like a child” jolted me into one of those “OMG, this is exactly the issue I’ve been dealing with!” experiences.

That issue: whether there is any future to dictionary-based political event coders. Of late—welcome to my life, such as it is—I’ve been wrestling with whether to invest my time in:

  • Writing a new coder based on universal dependency parsing and my mudflat proof-of-concept: seems like low-hanging fruit
  • Adapting an existing universal dependency coder (seems increasingly unlikely for an assortment of reasons)
  • Or just tossing the whole project since everybody—particularly every U.S. government funder—knows that dictionary-based coders are oh-so-1990s and from this point on everything will be done with machine learning (ML) classifiers

This article may tilt the scale back to the first option. At least for me.

The “baby” reference here and in the article comes from the almost irrefutable evidence that humans are born hard-wired to efficiently learn various skills, probably the most complex of these being language. A normally developing human child picks up language at a phenomenal rate—typically through sound, though sign language is learned with equal facility, and outside the United States usually multiple languages, kept distinct. Ask any three-year-old. And try to shut them up. Provide a chimpanzee with exactly the same stimuli—yes, the experiment has been tried, on multiple occasions—and it never achieves abilities remotely similar to those of a human.

However, there’s an attraction to ML classifiers in being, well, rather mindless. [2] But this comes with the [huge] problem of requiring an extraordinary number of labeled training cases, which we simply don’t have for event data, nor does anyone seem inclined to generate them, because that process is expensive and involves the recruitment, management, and, critically, successful retention of a large number of well-trained human coders. [3] Consequently event data coding is in a totally different situation from ML problems where vast numbers of labeled cases are available, typically from the web at little expense.

It’s completely possible, of course, to generate essentially unlimited labeled event data cases from the existing coding systems, and it is certainly conceivable that the magic of neural networks (or your classifier of choice) will produce some wonderful generalization that cannot be obtained from those coders. Or, more likely, will produce one interesting generalization that we will then see repeated endlessly, much like the man-woman-king-queen example for word embeddings. But another possibility is that the classifiers will just sloppily approximate what the dictionary-based systems are already doing.
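As a minimal sketch of that “train a classifier on machine-coded output” route, here is what it might look like with scikit-learn; the sentences and category labels below are invented stand-ins, and a real effort would use millions of sentences labeled by an existing coder rather than four.

```python
# Hypothetical sketch: fit a text classifier to sentences that have already been
# labeled (here, invented labels standing in for output from a dictionary-based coder).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sentences = [
    "Government troops shelled rebel positions overnight.",
    "The two foreign ministers met in Geneva on Tuesday.",
    "Police detained dozens of demonstrators in the capital.",
    "Officials from both countries signed a trade agreement.",
]
labels = ["ATTACK", "CONSULT", "DETAIN", "AGREE"]  # invented event categories

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(sentences, labels)

print(model.predict(["Soldiers fired mortars at a village near the border."]))
```

Whether such a model ever does more than sloppily approximate the coder that labeled its training data is exactly the open question.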

And the dictionary-based systems are doing that reasonably well, because dictionary-based automated event coding has been around for more than a quarter century and now benefits from a wide range of ongoing developments throughout the field of computational natural language processing. As a consequence, those programs start with a lot of “instinct.” Consider just how much comes embedded in a contemporary system:

  • The language model of the parser, which is the result of thousands of hours of experimentation across multiple major NLP research projects across decades
  • In some systems, notably VRA-Reader, PETRARCH-2 and Raytheon/BBN’s ACCENT/Serif, an explicit language model for political events
  • Models of language subcomponents such as dates, locations, and named entities
  • Two decades of human-coded dictionary development from the KEDS and TABARI projects [4]
  • The WordNet synonym sets, again the product of thousands of hours of effort, which have been incorporated into those dictionaries (see the sketch following this list)
  • A variety of very large data sets such as rulers.org, CIA World Leaders and Wikipedia for named-entity resolution
  • Extensive idiomatic human translation by native speakers of the Spanish and Arabic dictionaries currently being produced by the NSF RIDIR event data project
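Since the WordNet item may be the least familiar entry above, here is a minimal, hypothetical sketch of how synonym sets can be used to expand a hand-built verb dictionary entry; it assumes NLTK and its WordNet corpus are installed, and the “DETAIN” category label is invented for the example.

```python
# Hypothetical sketch: expand a seed verb dictionary using WordNet synonym sets.
# Assumes: pip install nltk, then nltk.download("wordnet").
from nltk.corpus import wordnet as wn

seed_verbs = {"arrest": "DETAIN"}  # invented pattern -> category mapping

expanded = {}
for verb, category in seed_verbs.items():
    expanded[verb] = category
    for synset in wn.synsets(verb, pos=wn.VERB):
        for lemma in synset.lemmas():
            expanded[lemma.name().replace("_", " ")] = category

print(sorted(expanded))  # e.g. 'apprehend', 'collar', 'nab', ... all mapped to DETAIN
```

Real dictionary development, of course, still involves a human pass over the output, since synonym expansion happily produces patterns you do not want.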

Okay, people, I know that your neural networks are cool—like they are really, really cool, fabulously cool, in fact you can’t even begin to tell me how cool they are, even if a four-variable logit model matches their performance out-of-sample—but frankly, I’ve just presented you with a rather extensive list of things that the dictionary-based coders are already starting with but which the ML programs have to learn on their own. [5] 

So in practical terms, for example, the VRA-Reader coder from the 1990s—now lost, alas, because it was proprietary…sad…—provided 128 templates for the possible structure of a sentence describing a political event. JABARI in the early 2010s—now lost, alas, because it was proprietary, and was successfully targeted by a duplicitous competitor…sad…—gained an additional 15% accuracy over TABARI using a set of very specific tweaks dealing with idiosyncratic characteristics of political events (e.g. the fact that the Red Cross rarely if ever engages in armed attacks). A dictionary-based system knows from the beginning that if A meets with B, B met with A, but if A arrests B, B didn’t arrest A. More generally, the failure—in numerous attempts across decades—of generic event “triple” coding systems to compete in this space is almost certainly due to the fact that domain-specific information provides a very significant boost to performance.
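To illustrate the kind of domain-specific knowledge involved, here is a hypothetical sketch of a tiny verb-pattern table encoding the symmetric/asymmetric distinction and an actor-plausibility check; none of this is the actual structure of TABARI, JABARI, or any other coder’s dictionaries, and the category names are invented.

```python
# Hypothetical sketch of the domain knowledge a dictionary-based coder starts with.
VERB_PATTERNS = {
    "meet":   {"category": "CONSULT", "symmetric": True},   # A meets B => B also met A
    "arrest": {"category": "DETAIN",  "symmetric": False},  # A arrests B, not the reverse
    "attack": {"category": "ATTACK",  "symmetric": False},
}

# Actors that essentially never appear as the source of an armed attack.
IMPLAUSIBLE_ATTACKERS = {"Red Cross", "UNICEF"}

def code_event(source, verb, target):
    """Return a list of (source, category, target) events, or None if nothing codes."""
    pattern = VERB_PATTERNS.get(verb)
    if pattern is None:
        return None
    if pattern["category"] == "ATTACK" and source in IMPLAUSIBLE_ATTACKERS:
        return None  # domain-specific sanity check
    events = [(source, pattern["category"], target)]
    if pattern["symmetric"]:
        events.append((target, pattern["category"], source))
    return events

print(code_event("Germany", "meet", "France"))       # two symmetric CONSULT events
print(code_event("police", "arrest", "protesters"))  # one DETAIN event
print(code_event("Red Cross", "attack", "rebels"))   # None: filtered as implausible
```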

Furthermore, the environment in which we are deploying dictionary-based coding programs is becoming increasingly friendly: In the 1990s KEDS and VRA-Reader only had the texts and small dictionaries to work with, and had to do this on very limited hardware. Contemporary systems, in contrast, have access to sophisticated parsers and huge dictionaries with hardware easily able to accommodate both. Continuing the childhood metaphor, this is the difference between riding a tricycle and riding a 20-speed bicycle. With an electric assist.
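As a concrete, and again hypothetical, illustration of what access to a sophisticated parser buys you, here is a minimal sketch using spaCy (not claimed to be the parser behind any of the coders named above) to pull out the subject-verb-object skeleton that a verb dictionary would then try to match; it assumes the small English model has been downloaded.

```python
# Minimal sketch: extract a (subject, verb, object) skeleton from a sentence
# using spaCy's dependency parse; a dictionary-based coder would then match
# the verb and its arguments against its verb patterns and actor dictionaries.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes: python -m spacy download en_core_web_sm
doc = nlp("Rebel forces attacked a government convoy near the border.")

for token in doc:
    if token.pos_ == "VERB":
        subjects = [t.text for t in token.lefts if t.dep_ in ("nsubj", "nsubjpass")]
        objects = [t.text for t in token.rights if t.dep_ in ("dobj", "obj")]
        if subjects and objects:
            print(subjects, token.lemma_, objects)  # e.g. ['forces'] attack ['convoy']
```

In the 1990s, getting even this much structure out of raw news text on the available hardware was a major undertaking; now it is a dozen lines of code, which is exactly the tricycle-versus-bicycle point.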

I don’t expect this simple metaphor to be the last word on the subject, and I may, in the end, decide that classifiers are going to rule us all (and in any case, that seems to be pretty much where all of the funding is going at the moment anyway; but if that’s the case, please, can’t someone, somewhere fund an open set of gold standard records??). But I’m also beginning to think that dictionary-based approaches—or, more probably, a hybrid of dictionary and classifier approaches—are more than an anachronistic “damn those neural nets: young whippersnappers don’t appreciate what it was like hacking into Nexis from the law school library account via an acoustical modem for weeks, every morning from 2 a.m. to 5 a.m. [6]…get off my lawn” reflex: given the remarkable resources we can now deploy on the problem, dictionary-based coding represents a hugely more efficient approach than learning by example.

Time (and experimentation) will tell.

Footnotes

1. Okay, so it was actually last week’s issue: I wait for the paper version to arrive and hope it doesn’t get too soaked in the mailbox. The Economist I read electronically as soon as it is available.

2. The article quotes in passing Oregon State CS professor Thomas Dietterich that “[academic] computer scientists…have an aversion to debugging complex code.” Yeah, tell me about it…followed closely by their aversion to following quality control practices that have been common in industry since the 1990s. I digress.

3. The relatively new prodigy software is certainly a far more efficient approach to doing this than many earlier alternatives—I’ve also written a simple low-footprint variant of its annotation functions here—but human annotation remains vastly more labor intensive than, say, downloading millions of labeled images of cats and dogs.

4. I’ve got pretty good empirical evidence that these dictionaries still provide most of the verb patterns for all of the CAMEO-based coding systems: figuring out the verb patterns used to generate any data set where you know both the codings and the URL of the source text is relatively straightforward, at least for the frequent patterns.

5. The other fabulously cool recent application of deep learning, the ability to play Go at levels beyond that of the best human expert, depended on a closed environment with fixed rules: event data coding is nothing like this.

6. Not a joke: this is the KEDS project ca 1990.
