Seven Conjectures on the State of Event Data

[This essay was originally prepared as a memo in advance of the “Workshop on the Future of the CAMEO Ontology”, Washington DC, 11 October 2016, said workshop leading to the new PLOVER specification. I had intended to post the memo to the blog as well, but, obviously, neglected to do so at the time. Better late than never, and I’ve subsequently updated it a bit. It gets rather technical in places and assumes a fairly high familiarity with event data coding and methods. Which is to say, most people encountering this blog will probably want to skip past this one.]

The purpose of this somewhat unorthodox and opinionated document [1] is to put on the table an assortment of issues dealing with event data that have been floating around over the past year in various emails, discussions over beer and the like. None of these observations are definitive: please note the word “conjecture”.

1. The world according to CAMEO will look pretty much the same using any automated event coder and any global news source

The graph below shows the CAMEO frequencies across its major categories (two-digit) using three different coders, PETRARCH 1 and 2 [2], and Raytheon/BBN’s ACCENT (from the ICEWS data available on Dataverse) for the year 2014. This also reflects two different news sources: the two PETRARCH cases are Lexis-Nexis; ICEWS/ACCENT is Factiva, though of course there’s a lot of overlap between those.

[Figure: cameo_compare (CAMEO two-digit category frequencies by coder, 2014)]

Basically, “CAMEO-World” looks pretty much the same whichever coder and news source you use: the between-coder variances are completely swamped by the between-category variances. What large differences we do see are probably due to changes in definitions: for example PETRARCH-2 uses a more expansive definition of “express intent to cooperate” (CAMEO 03) than PETRARCH-1; I’m guessing BBN/ACCENT did a bunch of focused development on IEDs and/or suicide bombings so has a very large spike in “Assault” (18) and they seem to have pretty much defined away the admittedly rather amorphous “Engage in material cooperation” (06).
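To make the "swamped" claim concrete, here is a minimal sketch of the comparison being described: the variance of category shares across categories versus the average variance across coders within a category. The shares and the coder labels in the table are invented for illustration and are not the 2014 figures from the graph.

```python
from statistics import mean, pvariance

# Hypothetical share of all events falling in each CAMEO two-digit
# category, per coder. These numbers are invented, not the 2014 data.
shares = {
    "01": {"petrarch1": 0.12, "petrarch2": 0.11, "accent": 0.13},
    "04": {"petrarch1": 0.25, "petrarch2": 0.24, "accent": 0.27},
    "17": {"petrarch1": 0.04, "petrarch2": 0.05, "accent": 0.04},
    "19": {"petrarch1": 0.08, "petrarch2": 0.07, "accent": 0.09},
}

def variance_components(table):
    """Return (between-category variance, mean between-coder variance)."""
    # Variance of the per-category means: how much categories differ.
    category_means = [mean(list(row.values())) for row in table.values()]
    between_category = pvariance(category_means)
    # Average, over categories, of the variance across coders.
    between_coder = mean(pvariance(list(row.values())) for row in table.values())
    return between_category, between_coder

between_category, between_coder = variance_components(shares)
ratio = between_category / between_coder
print(f"between-category / between-coder variance ratio: {ratio:.1f}")
```

With real data, a large ratio would support the reading of the graph given above; a ratio near one would not.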

I think this convergence is due to a combination of three factors:

  1. News source interest, particularly the tendency of news agencies (which all of the event data projects are now getting largely unfiltered) to always produce something, so if the only thing going on in some country on a given day is a sister-city cultural exchange, that will be reported (hence the preponderance of events in the low categories). Also the age-old “when it bleeds, it leads” accounts for the spike in reports of violence (CAMEO categories 17, 18, 19).
  2. In terms of the less frequent categories, the diversity of sources the event data community is using now—as opposed to the 1990s, when the only stories the KEDS and IDEA/PANDA projects coded were from Reuters, which is tightly edited—means that as you try to get more precise language models using parsing (ACCENT and PETRARCH-2), you start missing stories that are written in non-standard English that would be caught by looser systems (PETRARCH-1 and TABARI). Or at least this is true proportionally: on a case-by-case basis, ACCENT could well be getting a lot more stories than PETRARCH-2 (alas, without access to the corpus they are coding, I don’t know) but for whatever reason, once you look at proportions, nothing really changes except where there is a really concentrated effort (e.g. category 18), or changes in definitions (ACCENT on category 06; PETRARCH-2 on category 03).
  3. I’m guessing (again, we’d need the ICEWS corpus to check, and that is unavailable due to the usual IP constraints) all of the systems have similar performance in not coding sports stories, wedding announcements, recipes, etc.: I know PETRARCH-1 and PETRARCH-2 have about a 95% agreement on whether a story contains an event, but a much lower agreement on exactly what the event is. The various coding systems probably also have a fairly high agreement at least on the nation-state level of which actors are involved.

2. There is no point in coding an indicator unless it is reproducible, has utility, and can be coded from literal text

IMHO, a lot of the apparent disagreements within the event data community about coding of specific texts, as well as the differences between the coding systems more generally, stem from trying to code things that either can’t be consistently coded at all—by human or automated systems—or which will never be used. We should really not try to code anything unless it satisfies the following criteria:

  • It can be consistently categorized by human coders on multiple projects working with material from multiple sources who are guided solely by the written documentation. I.e. no project-level “coding culture” or “I know it when I see it.”; also see the discussion below on how little we know about true human coding accuracy.
  • The coded indicators are useful to someone in some model (which probably also puts a lower bound on the frequency with which a code will be found in the news texts). In particular, CAMEO has over 200 categories but I don’t think I’ve ever seen a published analysis that doesn’t either collapse these into the two-digit top-level cue categories, or more frequently the even more general “quad” or “penta” categories (“verbal cooperation” etc.), or else pick out one or two very specific categories. [3]
  • It can be derived from the literal text of a story (or, ideally, sentence): the coding of the indicators should not require background knowledge except for information explicitly embedded in the patterns, dictionaries, models or whatever ancillary information is used by the automated system. Ideally, this information should be available in open source files that can be examined by users of the data.

If an indicator satisfies those criteria, I think we usually will find we have the ability to create automated extractors/classifiers for it, and to do so without a lot of customized development: picking a number out of the air, one should be able to develop a coder/extractor using pre-existing code (and models or dictionaries, if needed) for at least 75% of the system.

3. There is a rapidly diminishing return on additional English-language news sources beyond the major international sources

Back in the 1990s, with the beginnings of the expansion of the availability of news sources in aggregators and on the Web, the KEDS project at the University of Kansas was finally able to start using some local English-language sources in addition to Reuters, where we’d done our initial development. We were very surprised to find that while these occasionally contributed new events, they did not do so uniformly, and in most instances, the international sources (Reuters and AFP at the time) actually gave us substantially more events, and event streams more consistent with what we’d expected to see (we were coding conflicts in the former Yugoslavia, eastern Mediterranean, and West Africa). This is probably due to the following:

  1. The best “international” reporters and the best “local” reporters are typically the same people: the international agencies don’t hire some whiskey-soaked character from a Graham Greene novel to sit in the bar of a fleabag hotel near the national palace, but instead hire local “stringers” who are established journalists, often the best in the country and delighted to be paid in hard currency. [19]
  2. Even if they don’t have stringers in place, international sources will reprint salient local stories, and this is probably even more true now that most of those print sources have web pages.
  3. The local media sources are frequently owned by elites who do not like to report bad news (or spin their own alt-fact version of it), and/or are subject to explicit or implicit government censorship.
  4. Wire-service sourcing is usually anonymous, which substantially enhances the life expectancy of reporters in areas where local interests have been known to react violently to coverage they do not like.
  5. The English and reporting style in local papers often differs significantly from international style, so even when these local stories contain nuggets of relevant information, automated systems that have been trained on international sources—or are dependent on components so trained: the Stanford CoreNLP system was trained on a Wall Street Journal corpus—will not extract these correctly.

This is not to say that some selected local sources could not provide useful information, particularly if the automated extractor was explicitly trained to work with them. There is also quite a bit of evidence that in areas where a language other than English predominates, even among elites, non-English local sources may be very important: this is almost certainly true for Latin America and probably also true for parts of the Arabic-speaking world. But generally “more is better” doesn’t work, or at least it doesn’t have the sort of payoff people originally expected.

4. “One-a-day” (OAD) duplicate filtering is a really bad idea, but so is the absence of any duplicate filtering

I’m happy to trash OAD filtering without fear of attack by its inventor because I invented it. To the extent it was ever invented: like most things in this field, it was “in the air” and pretty obvious in the 1990s, when we first started using it.

But for reasons I’ve recently become painfully aware of, and I’ve discussed in an assortment of papers over the past eighteen months (see for the most recent rendition), OAD amplifies, rather than attenuates, the inevitable coding errors found in any system, automated or manual.
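A toy illustration of the amplification, under the simplest possible assumptions: ten reports of one real event arrive on the same day, nine are coded correctly and one is miscoded. The tuple layout, the actor codes and the counts are all hypothetical, and `one_a_day` here is just a set-based filter, not any project's actual implementation.

```python
from collections import Counter

# Hypothetical coded reports of ONE real-world event: ten stories on the
# same day, nine coded correctly as CAMEO 18 and one miscoded as 19.
# The (date, source, target, event_code) layout is an assumption.
reports = [("2014-05-01", "NGAREB", "NGACVL", "18")] * 9 + [
    ("2014-05-01", "NGAREB", "NGACVL", "19")
]

def one_a_day(events):
    """Keep one record per distinct (date, source, target, code) tuple."""
    return sorted(set(events))

def error_share(events, true_code="18"):
    """Fraction of records whose event code differs from the true code."""
    counts = Counter(code for _, _, _, code in events)
    wrong = sum(n for code, n in counts.items() if code != true_code)
    return wrong / len(events)

raw_error = error_share(reports)             # 1 wrong record out of 10
oad_error = error_share(one_a_day(reports))  # 1 wrong record out of 2
print(raw_error, oad_error)
```

In the raw stream the miscoding is one record in ten; after the filter collapses the nine correct duplicates to a single record, the same miscoding is one record in two.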

Unfortunately, the alternative of not filtering duplicates carries a different set of issues. While those unfamiliar with international coverage typically assume that an article which occurs multiple times will be somehow “more important” than an article that appears only once (or a small number of times), my experience is that this is swamped by the effects of

  • The number of competing news stories on a given day: on a slow news day, even a very trivial story will get substantial replications; when there is a major competing story, events which otherwise would get lots of repetition will get limited mentions.
  • Urban and capital city bias. For example, when Boko Haram set off a car bomb in a market in Nigeria’s capital Abuja, the event generated in excess of 400 stories. Events of comparable magnitude in northeastern regional cities such as Maiduguri, Biu or Damaturu would get a dozen or so, if that. Coverage of terrorist attacks over the past year in Paris, Nice, Istanbul and Bangkok—if not Bowling Green—shows similar patterns.
  • Type of event. Official meetings generate a lot of events. Car bombings generate a lot of events, particularly by sources such as Agence France-Presse (AFP) which broadcast frequent updates.[4] Protracted low level conflicts only generate events on slow news days and when a reporter is in the area. Low-level agreements generate very few events compared to their likely true frequency. “Routine” occurrences, by definition, generate no reports—they are not “newsworthy”—or generate these on an almost random basis.
  • Editorial policy: AFP updates very frequently; the New York Times typically summarizes events outside the US and Western Europe in a single story at the end of the day; Reuters and BBC are in between. Local sources generally are published only daily, but there are a lot of them.
  • Media fatigue: Unusual events—notably the outbreak of political instability or violence in a previously quiet area—get lots of repetitions. As the media become accustomed to the new pattern, stories drop off.[18] This probably could be modeled—it likely follows an exponential decay—but I’ve rarely seen this applied systematically.
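The media-fatigue point can be sketched as follows, taking the exponential-decay guess at face value. The half-life, the onset count and both function names are illustrative assumptions; nothing here is estimated from data.

```python
import math

def expected_stories(initial_count, half_life_days, day):
    """Expected story count `day` days after onset under exponential decay."""
    decay_rate = math.log(2) / half_life_days
    return initial_count * math.exp(-decay_rate * day)

def fatigue_weight(half_life_days, day):
    """Multiplier that undoes the assumed decay, so later reports count more."""
    return 1.0 / expected_stories(1.0, half_life_days, day)

# Illustrative values only: 400 stories at onset, coverage halving every 7 days.
for day in (0, 7, 14, 28):
    print(day, round(expected_stories(400, 7, day), 1), round(fatigue_weight(7, day), 2))
```

If coverage really did decay this way, multiplying observed counts by `fatigue_weight` would give a rough correction; whether a single half-life fits any real conflict is an empirical question this sketch does not answer.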

So, what is to be done? IMHO, we need to do de-duplication at the level of the source texts, not at the level of the coded events. In fact, go beyond that and start by clustering stories, ideally run these through multiple coders—as noted earlier, I don’t think any of our existing coders are optimal for everything from a Reuters story written and edited by people trained at Oxford to a BBC radio transcript from a static-filled French radio report out of Goma, DRC and which is then quickly translated by a non-native speaker of either language—then base the coded events on those that occur frequently in that cluster of reports. Document clustering is one of the oldest applications in automated text analysis and there are methods that could be applied here.
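A minimal sketch of that pipeline, with simple stand-ins for each stage: word-shingle Jaccard similarity for near-duplicate detection, a greedy single-pass clusterer, and a majority vote over the events coded from the stories in a cluster. The thresholds, the example texts and the event tuples are all hypothetical, and a production system would use a more robust clustering method.

```python
from collections import Counter

def shingles(text, k=3):
    """Set of k-word shingles from lower-cased text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

def jaccard(a, b):
    """Jaccard similarity of two sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_stories(texts, threshold=0.5):
    """Greedy clustering: a story joins the first cluster whose seed is similar."""
    clusters = []  # list of (seed_shingles, [story indices])
    for idx, text in enumerate(texts):
        sh = shingles(text)
        for seed, members in clusters:
            if jaccard(sh, seed) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append((sh, [idx]))
    return [members for _, members in clusters]

def consensus_events(codings, min_share=0.5):
    """Keep event tuples coded in more than min_share of a cluster's stories."""
    counts = Counter(ev for story_events in codings for ev in set(story_events))
    return sorted(ev for ev, n in counts.items() if n / len(codings) > min_share)

texts = [
    "a car bomb exploded in a market in abuja on monday killing dozens",
    "a car bomb exploded in a market in abuja on monday killing at least thirty",
    "the two presidents met in geneva to discuss trade",
]
print(cluster_stories(texts))  # the two bombing stories share a cluster

# Hypothetical coder output for three stories in one cluster.
codings = [
    [("NGAREB", "NGACVL", "18")],
    [("NGAREB", "NGACVL", "18"), ("NGAGOV", "NGAREB", "11")],
    [("NGAREB", "NGACVL", "18")],
]
print(consensus_events(codings))
```

The design choice that matters is the order of operations: duplicates are resolved among texts first, and a stray coding that appears in only one story of the cluster is voted out rather than promoted to an event of its own.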

5. Human inter-coder reliability is really bad on event data, and actually we don’t even know how bad it is.

We’ve got about fifty years of evidence that the human coding [5] on this material doesn’t have a particularly high correlation when you start, for example, comparing across projects, over time, and in the more ambiguous categories.[6] While the human coding projects typically started with coders at an 80% or 85% agreement at the end of their training (as measured by Cronbach’s alpha, typically) [7], no one realistically believes that was maintained over time (“coding drift”) and across a large group of coders who, as the semester rolled on, were usually on the verge of quitting. [8] And that is just within a single project.

The human-coded WEIS event data project [10] started out being coded by surfers [11] at UC Santa Barbara in the 1960s. During the 1980s WEIS was coded by Johns Hopkins SAIS graduate students working for CACI, and in Rodney Tomlinson’s final rendition of the project in the early 1990s [12], by undergraduate cadets at the U.S. Naval Academy. It defies belief that these disparate coding groups had 80% agreement, particularly when the canonical codebook for WEIS at the Inter-university Consortium for Political and Social Research was only about five (mimeographed) pages in length.

Cross-project correlations are probably more like 60% to 70% (if that) and, for example, a study of reliability on (I think [20]) some of the Uppsala (Sweden) Conflict Data Program conflict data a couple years ago found only 40% agreement on several variables, and 25% on one of them (which, obviously, must have been poorly defined).

The real kicker here is that because there is no commonly shared annotated corpus, we have no idea of what these accuracy rates actually are, nor measures of how widely these vary across event categories. The human-coded projects rarely published any figures beyond a cursory invocation of the 0.8 Cronbach’s alpha for their newly-trained cohorts of human coders; the NSF-funded projects focusing on automated coding were simply not able to afford the huge cost of generating the large-scale samples of human-coded data required to get accurate measures, and various IP and corporate policy constraints have thus far precluded getting verifiable information on these measures on the proprietary coders.

6. Ten possible measures of coder accuracy

This isn’t a conjecture, just a point of reference. These are from

  1. Accuracy of the source actor code
  2. Accuracy of the source agent code
  3. Accuracy of the target actor code: note that this will likely be very different from the accuracy of the source, as the object of a complex verb phrase is more difficult to correctly identify than the subject of a sentence.
  4. Accuracy of the target agent code
  5. Accuracy of the event code
  6. Accuracy of the event quad code: verbal/material cooperation/conflict [13]
  7. Absolute deviation of the “Goldstein score” on the event code [14]
  8. False positives: event is coded when no event is actually present in the sentence
  9. False negatives: no event is coded despite one or more events in the sentence
  10. Global false negatives: an event occurs which is not coded in any of the multiple reports of the event

This list is by no means comprehensive, but it is a start.
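Several of these measures can be computed directly once gold-standard and machine codings are aligned sentence by sentence. The sketch below covers measures 1, 3, 5, 8 and 9 only; the dictionary layout, the function name and the example records are assumptions made for illustration.

```python
def score_sentences(gold, coded):
    """Compare per-sentence codings; None means 'no event in this sentence'.

    Each coding is a dict with keys 'source', 'target' and 'event'
    (a hypothetical layout). Field accuracy is computed only over
    sentences where both gold and coder report an event.
    """
    fields = ("source", "target", "event")
    matches = {f: 0 for f in fields}
    both = false_pos = false_neg = 0
    for g, c in zip(gold, coded):
        if g is None and c is not None:
            false_pos += 1          # measure 8: event coded, none present
        elif g is not None and c is None:
            false_neg += 1          # measure 9: event present, none coded
        elif g is not None and c is not None:
            both += 1
            for f in fields:
                matches[f] += g[f] == c[f]
    result = {f + "_accuracy": (matches[f] / both if both else None) for f in fields}
    result["false_positives"] = false_pos
    result["false_negatives"] = false_neg
    return result

gold = [
    {"source": "USA", "target": "RUS", "event": "04"},
    None,
    {"source": "ISR", "target": "PSE", "event": "19"},
    {"source": "FRA", "target": "DEU", "event": "03"},
]
coded = [
    {"source": "USA", "target": "RUS", "event": "03"},
    {"source": "GBR", "target": "FRA", "event": "01"},
    None,
    {"source": "FRA", "target": "DEU", "event": "03"},
]
scores = score_sentences(gold, coded)
print(scores)
```

Measure 10, global false negatives, cannot be computed this way: it needs a list of events that occurred, independent of any single report.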

7. If event data were a start-up, it would be poised for success

Antonio Garcia Martinez’s highly entertaining, if somewhat misogynistic, Chaos Monkeys: Obscene Fortune and Random Failure in Silicon Valley quotes a Silicon Valley rule-of-thumb that a successful start-up at the “unicorn” level—at least a temporary billion-dollar-plus valuation—can rely on only a single “miracle.” That is, a unicorn needs to solve only a single heretofore unsolved problem. So for Amazon (and Etsy), it was persuading people that nearly unlimited choice was better than being able to examine something before they bought it; for Airbnb, persuading amateurs to rent space to strangers; for DocuSign [21], realizing that signing documents was such a pain that you could attain a $3-billion valuation just by providing a credible alternative [22]. If your idea requires multiple miracles, you are doomed.[15]

In the production of event data, as of 2016, we have open source solutions—or at least can see the necessary technology in open source—to solve all of the following parts for the low-cost near-real-time provision of event data:

  • Near-real-time acquisition and coding of news reports for a global set of sources
  • Automated updating of actor dictionaries through named-entity-recognition/resolution algorithms and global sources such as Wikipedia, the European Commission’s open source JRC-Names database, CIA World Leaders and
  • Geolocation of texts using open gazetteers, famously and resolution systems such as the Open Event Data Alliance’s mordecai.
  • Inexpensive cloud based servers (and processors) and the lingua franca of Linux-based systems and software
  • Multiple automated coders (open source and proprietary) that probably well exceed the inter-coder agreement of multi-institution human coding teams

More generally, in the past ten years an entire open source software ecosystem has developed relevant to this problem (but typically in contexts far removed from event data): general-purpose parsers, named-entity-recognition/resolution systems, geolocation gazetteers and text-to-location algorithms, near-duplicate text detection methods, phrase-proximity models (word2vec etc.) and so forth.

The remaining required miracle:

  • Automated generation of event models, patterns or dictionaries: that is, generating and updating software to handle new event categories and refine the performance on existing categories.

This last would also be far easier if we had an open reference set of annotated texts, and even Garcia Martinez allows that things don’t require exactly one miracle. And we don’t need a unicorn (or a start-up): we just need something that is more robust and flexible than what we’ve got at the moment.

SO…what happened???

The main result of the workshop—which covered a lot of issues beyond those discussed here—was the decision to develop the PLOVER coding and data interchange specification, which basically simplifies CAMEO to focus on the levels of detail people actually use (the CAMEO cue categories with some minor modifications [16]), as well as providing a systematic means—“modes” and “contexts”—for accommodating politically-significant behaviors not incorporated into CAMEO such as natural disasters, legislative and electoral behavior, and cyber events. This is being coordinated by the Open Event Data Alliance and involves an assortment of stakeholders (okay, mostly the usual suspects) from academia, government and the private sector. John Beieler and I are writing a paper on PLOVER that will be presented at the European Political Science Association meetings in Milan in June, but in the meantime you can track various outputs of this project on its GitHub site. A second effort, funded by the National Science Foundation, will be producing a really large—it is aiming for about 20,000 cases, in Spanish and Arabic as well as English—set of PLOVER-coded “gold standard cases” which will both clearly define the coding system [17] and also simplify the task of developing and evaluating coding programs. Exciting times.[23]


1. Unorthodox and opinionated for a workshop memo. Pretty routine for a blog.

2. The blue bar shows the count of codings where PETRARCH-1 and PETRARCH-2 produce the same result; despite the common name, they are essentially two distinct coders with distinct verb phrase dictionaries.

3. Typically with no attention as to whether these were really implemented in the dictionaries: I cringe when I see someone trying to use the “kidnapping” category in our data, as we never paid attention to this in our own work because it wasn’t relevant to our research questions.

4. I read a lot of car bomb stories:

5. When such things existed for event data: There really hasn’t been a major human coded project since Maryland’s GEDS event project shut down about 15 years ago. Also keep in mind that if one is generating on the order of two to four thousand events per day—the frequency of events in the ICEWS and Phoenix systems—human coding is completely out of the picture.

6. In some long-lost slide deck (or paper) from maybe five or ten years ago, I contrasted the requirements of human event data coding to research—this may have been out of Kahneman’s Thinking Fast and Slow—on what the human brain is naturally good at. The upshot is that it would be difficult to design a more difficult and tedious task for humans to do than event data coding.

7. Small specialized groups operating for a shorter period, of course, can sustain a higher agreement, but small groups cannot code large corpora.

8. In our long experience at Kansas, we found that even after the best selection and training we knew how to do, about a third of our coders—actually, people developing coding dictionaries, but that’s a similar set of tasks—would quit in the first few weeks, and another sixth by the end of the semester. A project currently underway at the University of Oklahoma is finding exactly the same thing.

10. The WEIS (World Event/Interaction Survey) ontology, developed in the 1960s by Charles McClelland, formed the basis of CAMEO and was the de facto standard for DARPA projects from about 1965 to 2005.

11. Okay, “students” but at UCSB, particularly in the 1960s, that was the same thing.

12. Tomlinson actually wrote an entirely new, and more extensive, codebook for his implementation of WEIS, as well as adding a few minor categories and otherwise incrementally tweaking the system, much as we’ve been seeing happening to CAMEO. Just as CAMEO was a major re-boot of WEIS, PLOVER is intended to be a major modification of CAMEO, not merely a few incremental changes.

13. More recently, researchers have started pulling out the high-frequency (and hence low information) “Make public statement” and “Appeal” categories out of “verbal cooperation”, leading to a “pentacode” system. PLOVER drops these.

14. The “Goldstein scale” actually applies to WEIS, not CAMEO: the CAMEO scale typically referred to as “Goldstein” was actually an ad hoc effort around 2002 by a University of Kansas political science grad student named Uwe Reising, with some additional little tweaks by his advisor to accommodate later changes in CAMEO. Which is to say, a process about as random as that which was used to develop the original Goldstein scale by an assistant professor and a few buddies on a Friday afternoon in the basement of the political science department at the University of Southern California. Friends don’t let friends use event scales: Event data should be treated as counts.

15. Another of my favorite aphorisms from Garcia Martinez: “If you think your idea needs an NDA, you might as well tattoo ‘LOSER’ on your forehead to save people the trouble of talking to you. Truly original ideas in Silicon Valley aren’t copied: they require absolutely gargantuan efforts to get anyone to pay serious attention to them.” I’m guessing DocuSign went through this experience: it couldn’t possibly be worth billions of dollars.

16. To spare you the suspense, we eliminated the two purely verbal “comment” and “agree” categories, split “yield” into separate verbal and material categories, combined the two categories dealing with potentially lethal violence, and added a new category for various criminal behaviors. Much of the 3- and 4-digit detail is still retained in the “mode” variable, but this is optional. PLOVER also specifies an extensive JSON-based data interchange standard in hopes that we can get a common set of tools that will work across multiple data sets, rather than having to count fields in various tab-delimited formats.

17. CAMEO, in contrast, had only about 350 gold standard cases: these have been used to generate the initial cases for PLOVER and are available at the GitHub site.

18. For example, a recent UN report covering Afghanistan 2016 concluded there had been about 4,000 civilian casualties for the year. I would be very surprised if the major international news sources—which I monitor systematically for this area—got even 20% of these, and those covered were mostly major bombings in Kabul and a couple other major cities.

19. Which they may use to buy exported whiskey, but at least that’s not the only thing they do.

20. Because, of course, the article is paywalled. One can buy 24-hour access for a mere $42 and 30-day access for the bargain rate of $401. Worth every penny since, in my experience, the publisher’s editing probably involved moving three commas in the bibliography, and insisting that the abstract be so abbreviated one needs to buy the article.

21. The original example here was Uber, until I read this. Which you should as well. Then #DeleteUber. This is the same company, of course, where just a couple years ago one of their senior executives was threatening a [coincidentally, of course…] female journalist. #DeleteUber. Really, people, this whole brogrammer culture has gotten totally out of control, on multiple dimensions.

Besides, conventional cabs can be, well, interesting: just last week I took a Yellow Cab from the Charlottesville airport around midnight, and the driver—from a family of twelve in Nelson County, Virginia, and sporting very impressive dreadlocks—was extolling his personal religious philosophy, which happened to coincide almost precisely with the core beliefs of 2nd-century Gnosticism. Which is apparently experiencing a revival in Nelson County: Irenaeus of Lyon would be, like, so unbelievably pissed off at this.

22. Arguably the miracle here was simply this insight, though presumably there is some really clever security technology behind the curtains. Never heard of DocuSign?: right, that’s because they not only had a good idea but they didn’t screw it up. Having purchased houses in distant cities both before and after DocuSign, I am inordinately fond of this company.

23. PLOVER isn’t the required “miracle” alluded to in item 7, but almost certainly will provide a better foundation (and motivation) for the additional work needed in order for that to occur. Like WEIS, CAMEO became a de facto “standard” by more or less accident—it was originally developed largely as an experiment in support of some quantitative studies of mediation—whereas PLOVER is explicitly intended as a sustainable (and extendible) standard. That sort of baseline should make it easier to justify the development of further general tools.


A Numerical Reflection upon the 2015-2016 APSA Placement Statistics

[Okay, this “Seven…” gimmick isn’t working for producing finished blogs—mind you, I’ve got about a dozen 50%-80% finished entries in the pipeline—and [shock!] there are things that can be said with less than “Seven…” witty subcategories, but are still longer than an extended set of Tweets, which no one reads to the end of anyway. [1] So I may be doing some shorter blog posts for a while.]

Brethren and sistern [2], our reading this day is from the newly released American Political Science Association 2015-2016 APSA Graduate Placement Survey. And more specifically, the chapters and verses—which is to say, the entire report—dealing with the continued decline in the proportion of political science PhDs who are placed in tenure-track (TT) positions. Now down to an abysmal 35.4%.

Well, at least that simplifies the task of the Director of Graduate Placement, eh?—he or she can just address the year’s crop of candidates with a straightforward “Look to your left and to your right: only one of you is going to get the TT job you have laboriously trained for.”

Otherwise it’s insane: why, oh why, are you people allowing this to continue? Have you no shame?

Let’s put this in perspective. I’m not sure what placement rates were when I graduated (Indiana, then as now ranked around 20 nationally, so competitive) in 1976—though we were complaining bitterly that they’d dropped from a rate approaching 100% in the previous decade—but I do know that I joined a department (Northwestern) which I’m pretty sure was composed entirely of TT faculty: I didn’t even really know what an “adjunct” was at the time. Pretty much the same was true twelve years later when I moved to Kansas, though I think there we had a couple long-term adjuncts teaching specialized courses in policy and law that we couldn’t otherwise staff. Mandatory retirement at age 65 was still in place at both institutions, so someone hired into a TT slot would on average occupy that for about 35 years. [3]

And that, campers, is the source of our problem. Mandatory retirement was abolished in the US—with a few exceptions such as airline pilots and FBI agents, but not tenured academics—in 1986 [4]. Initial projections were that academics would retire anyway at age 70 or thereabouts but, well, from what I’m hearing, that isn’t happening. In fact I’m hearing quite the opposite: I talked recently with a chair who was extolling the virtues of her multiple faculty who were in their 80s.

Now, although I don’t think I really had the stamina (or cultural/emotional links to the average undergraduate student) to effectively engage a classroom using contemporary active learning methods after the age of 55 or so, these people have every legal right to do what they’re doing and I’m sure they have assured themselves this “aging” thing is just some sort of primordial myth from which they are exempted. Whatever. We’re here to look at comparative numbers.

Let’s assume the average person hired in a TT slot now occupies that position to age 80 rather than 65: 50 years rather than 35 years. In an eye-blink, we have just reduced the availability of TT slots by about 30%. Permanently: this is not a generational thing, it is a permanent structural change.
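The arithmetic behind the 30% figure, as a back-of-the-envelope sketch: with a fixed number of TT lines in steady state, annual openings equal lines divided by years held. The fixed-lines and steady-state assumptions and the function name are illustrative; the 35- and 50-year tenures are the ones given above.

```python
def steady_state_openings(slots, years_held):
    """Openings per year when every slot turns over once per tenure."""
    return slots / years_held

slots = 1000  # arbitrary number of TT lines; it cancels out of the ratio
before = steady_state_openings(slots, 35)   # retire at 65
after = steady_state_openings(slots, 50)    # retire at 80
reduction = 1 - after / before              # equals 1 - 35/50
print(f"reduction in annual TT openings: {reduction:.0%}")
```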

But it gets worse: not only will these people hold those positions an additional 15 years, but they will almost certainly do so at their career-high salary levels and, unless you’ve got deans, associate deans, deanlets and deanlings backed by a really good set of lawyers and an HR department which really likes to make phenomenal amounts of highly public trouble for itself, those individuals will generally get roughly the average departmental salary increase, but do so on the highest tier of the department salaries. Which will pretty much suck all of the financial oxygen out of the room, and at public universities at least, that additional salary probably will not be funded by tuition increases. [5]

Hence the proliferation of adjuncts, and the further decline of TT positions: public universities simply don’t have the money to fund such positions any more.

The effects of average salary increases (and salaries as a function of age) are much harder to approximate than the effects of abolishing mandatory retirement, but I’d guess this factor forces another 10% to 20% reduction in TT slots. And this before various other long-term trends such as student-as-customer, university-as-athletic-franchise, degree-program-as-occupational-training and campus-as-entertainment-park which further divert resources and reduce demand for dedicated lifetime-employed positions, particularly in the social sciences.

Now, there’s an obvious [roughly] market-clearing solution in these circumstances: reduce the production of PhDs—or at least PhDs aimed at TT employment [6]—by about 40% to 50%. Not only is that the obvious solution, it’s the only solution. I’ve seen almost no evidence this is happening, and fundamentally it is a classical tragedy-of-the-commons situation which has had the classical tragedy-of-the-commons outcome.

Good luck with that.


  1. Or at least not my extended tweets. Though eyeballing the Twitter stats, the drop-off is a great natural example of exponential decay.
  2. Yes, “sistern” is a word. In Middle English.
  3. That figure is almost certainly high: between voluntary and involuntary attrition, 30 years would probably be closer. But we’ll stay conservative.
  4. Abolition for tenured faculty in the US was actually delayed for an additional 8 years, to 1994, which is why we start seeing these effects kicking in around the 2000s rather than earlier.
  5. At this same time, public financial support was also dropping precipitously compared to the levels during the post-WWII expansion of those systems, and tuition increases were hard-pressed to deal just with that decline in revenue. Declining public funding accounts for substantial amounts, though probably not all, of the oft-heard “tuition increases faster than inflation” complaint, at least in the 1990s and 2000s.
    Until the Great Chinese Trade War—or the Great Student Visa Crackdown—is in full swing, elite private schools appear to be able to raise tuition indefinitely. Such schools, of course, tend mostly to hire from each other (and a few top publics), though the differences reported by the APSA between the top two NRC tiers (1-19 and 20-37) are surprisingly small, except for the top tier, as expected, being more likely to place in PhD programs. Below that, things get really grim.
  6. Ironically, I think there is a substantial unmet need outside of academia for individuals with advanced training in the social sciences, both quantitative and qualitative, that certainly goes beyond the typical two to four semester M.A., and in many cases would include the independent research experience required by a dissertation. At least on the quantitative side, such opportunities are probably absorbing most if not all of those unable to find TT employment. But there’s a huge waste in the current system: time spent training people on the assumption that they will spend their lives in the classroom and writing unread[able] articles published in paywalled venues following a five-year lag could be put to much better use on more practical topics. I’ve yet to see anyone do this in political science, though one sees some initial efforts in multi-disciplinary programs.

Addenda, 8 Feb 2017

Well, people seem to be reading this, so a few additional thoughts:

1. Once again, my numbers on the impact of the end of mandatory retirement are just approximations, though probably not far off. I’m guessing most people won’t literally delay retirement until their 80s, but delay until mid-70s already seems to be quite common, so we’re still adding 10 years. Meanwhile I probably substantially over-estimated the average time spent prior to age 65 in a TT job, since attrition from the standard full-teaching-load TT position can occur through a variety of tracks, particularly administration (which in my experience is disproportionately drawn from the ranks of political scientists: during part of the period I was a grad student at Indiana, the president, provost and dean of liberal arts were all political scientists) or, as I did, shifting to buying out most courses with research funding. So the average time in a TT position prior to age 65 may well be more like 25 years, though that and retirement at 75 lead you to pretty much the same numbers.

In the second half of my years at Kansas we had a highly effective provost, David Shulenburger, who developed, along with his senior data analyst, Deb Teeter, an absolutely massive database on faculty and departmental costs and [multiple measures of] productivity. So I’m certain that Shulenburger and Teeter knew to the third decimal point the values of the sorts of numbers I’m speculating about here. Most faculty, of course, assumed Shulenburger allocated resources solely on the basis of personal whim, immensely hampered by his inability to recognize their intrinsic genius.

2. Barring various ailments, you can definitely live a “life of the mind” well past the age of 55: in fact with the Internet now instantly augmenting whatever lapses one has in memory, you can have a rather fabulous time doing this. What almost no one can do as well at age 55 as at age 35 is conduct a 3-hour active learning seminar, or, for that matter, pull an all-nighter to write a brand new lecture based on three books you read the previous day. Yeah, yeah, the Boomers reading this—other than the fact they won’t be—are all figuratively jumping up with counter-examples (most of which are urban legends) and yeah, yeah, at age 39 Tom Brady brought the Patriots back from near certain defeat to win the 2017 Super Bowl. But I’m guessing the NFL isn’t planning to shift its recruiting strategy to focus on 39-year-olds.

The other thing you unquestionably lose by age 55 is any sort of intellectual networking with the people who are developing the new cool theories: by 55, most of your peers from graduate school and pre-tenure days are no longer doing any meaningful research at all (or have shifted their focus to something like UFOs…), or have become administrators, or are running focused research projects where the work occurs largely out of the limelight. Want to see what Harry Potter felt like wearing that invisibility cloak?: walk through a conference venue when you are over the age of 50. Yes, this will happen to you (maniacal laughter…and get off my yard…)

[By the way, do not try to make these arguments in an academic department because your HR department will become exceedingly upset. I can say them because I’m no longer employed in academia.]

3. The sort of qualitative work I know of that is tremendously valued outside of academia is the intensely focused language/culture/field experience that is typical of much of comparative politics. Just as with quantitative training, the specific topic isn’t all that important, it’s the fact that you (and your immune system) can do it at all. I’m guessing there are culturally-immersive equivalents in U.S. politics, and there your choice of “shots” will be whiskey vs bourbon rather than typhoid vs typhus.

This doesn’t extend indefinitely: if by “qualitative” you mean simply sitting around in seminar rooms discussing ever more esoteric theories you have pretended to read, there’s a point where that is too, well, “academic” and no one else cares. The equivalents in the quantitative realm are people who spend their entire time studying estimators on simulated data: yawn… In academia there are niches for this sort of thing, just not very many, and your chances of getting placed into one are extremely low.

4. I’m guessing one of the primary motivators for the proliferation of PhD programs which have no realistic chance of placing most students into TT positions is to provide “graduate” TAs for the fabulously lucrative introductory courses, which basically drive most of the finances of large departments (certainly political science). This is actually a terrible model, and universities would be much better off using their best senior undergraduates in these roles, but you rarely see this. Nonetheless, I’ll always remember one of the honors students at Kansas who was working for me noting “I look at people in the honors program here, and they’re going off to graduate school at places like Harvard, Michigan and Berkeley. Whereas the GTAs here are going to graduate school at, well, KU.”


Seven Observations on the 2016 Election

On my day-pack I’ve got a little enamelled pin I bought several years back in a small shop in Juneau run by a guy who has, well, opinions. It shows a typewriter with the words “Write hard, die free.” [1]

So, where we at? On 8 November, a plurality of voters cast their ballots [2] for someone who has probably played a bit fast and loose with ethics [3], probably is a bit too loyal to subordinates, and unquestionably has very cozy ties with Wall Street. But thanks to very straightforward and completely transparent conditions involving the Electoral College which any 7th-grader—but apparently not the strategic geniuses of the Democratic National Committee (DNC)—can calculate, we will instead be inaugurating as President a vindictive, highly insecure misogynistic compulsive liar with authoritarian tendencies who has zero prior political experience and the attention span of a chihuahua.[4] May we live in interesting times.

My takeaways:

  1. This did not need to happen and is primarily the result of mind-boggling incompetence by the professionals of the Democratic Party.
  2. Fundamentally—consistent with pretty much everything everyone is saying—the election was lost by taking Rust Belt whites for granted [5]: without question, flip Wisconsin, Michigan, Ohio and Pennsylvania and Clinton would have been president. Everything else—Wikileaks, Comey, emails—was just gravy.
  3. The existing opinion and likely-voter models have been shown to be woefully inadequate, however much individuals making money off these will protest otherwise.

And a few things we should keep in mind did not happen:

  1. The country did not shift radically to the right: Trump did not even get a majority of the votes cast, much less of eligible voters.
  2. Trump is not a Republican in any conventional sense, though clearly the Republicans benefited from the Trump victory more than the Democrats did. Well, probably benefited more.
  3. There is a whole lot to still play out here.

Seven observations:

This was the victory of a populist third party with little clear ideology

Trump expertly fashioned himself to take advantage of the rising anti-elite, anti-globalization, and anti-immigrant populism we’ve been seeing in the US since the Tea Party successes in 2010, and surging in Europe for the past two or three years.[6] A couple of Republican tropes were tossed in—arch-conservatives on the Supreme Court, repeal of the Affordable Care Act and, of course, tax cuts primarily directed at the wealthy—but compared, say, to Paul Ryan, there was nothing like a coherent ideological package here. Furthermore Trump went through the election with at best tepid support from most of the Republican establishment, and fervent, vocal opposition from many in the highest levels of that establishment. Trump is the ultimate “RINO”: Republican in name only.

But now he has to govern, and here we are going to learn a lot in the next couple of months. Probably one of three scenarios will play out:

  1. Bush-III: With Trump completely adrift ideologically and out of his depth, the GOP establishment under Ryan and McConnell (plus a bunch of Bush administration executive veterans, ideally minus those under suspicion of committing war crimes) control the executive branch appointments and eventually advance quite a bit of the same legislation that, say, Jeb Bush or Marco Rubio would have been able to get through given the GOP control of the House and (barely) the Senate.[7]
  2. Gridlock-III/Chaos-I: There are sufficient Tea Party and Trumpkin votes (and, of course, Democrats) in the House to throw confusion into almost any initiative—the obvious first clash will occur on the deficit-increasing implications of infrastructure spending, defense increases, and ever-more tax cuts followed by clashes on the “replace” part of ACA “repeal and replace”—and things will settle down into either continuation of Obama-era gridlock or more likely a wild melange of initiatives going through what is essentially a three (or four) party legislature.[8] Some GOP priorities will get through, others will not, and some populist Democratic initiatives (infrastructure, definitely) will also.
  3. Hungary-II: An Alt-right circle of the likes of Bannon, Giuliani, Eric Trump, Kris Kobach, and the hordes of sycophants, opportunists and scoundrels descending upon the Trump Tower will actually start to implement the extreme elements promised during the campaign. Key indicator will be whether an attempt is made to prosecute Clinton.

At the time of this writing, 12 November, things are definitely heading towards Bush-III: the two lead headlines in the Washington Post at this moment are “Trump team backs off some sweeping campaign pledges” and “President-elect, aides suggest softer stances on border wall, health-care law.” But beneath that: “Meet the potential Cabinet picks most likely to make liberals squirm” and yes indeed, Cabinet positions for the wing-nuts of the right is a long GOP tradition. As my late father-in-law in Kansas would have put it, right now we’re looking at a hog on ice.

Again, it’s going to take several months to get a sense of how this will play out. The other factor which should become evident fairly soon is whether Trump expects to run for a second term: I very much doubt he will given that he will find the job exceedingly constraining, and he can also use this as a magnanimous example of voluntarily relinquishing the pursuit of power in order to serve the greater good.[9] The certainty of a one-term Trump presidency would substantially complicate the dynamics on the GOP side, at all levels.

The Democrats lost this one more than Trump won it

The breadth of the Democratic Party’s incompetence in this election is absolutely stunning. As noted above, this election was lost on traditional Democratic turf: the Rust Belt. And yet Clinton spent the final weeks trying to run up the score in Georgia (!) and Arizona, ignoring the lethal hemorrhage in Wisconsin, Michigan and Pennsylvania.

By all accounts the die had been cast long before this, first by the arrogance of the establishment elite who presumably had purchased The Party Decides in shipping-container quantities and displayed it in every office on little altars with flowers, incense and candles, and well, we’ve decided, and it will be Hillary. There was no consideration of credible alternatives, and when it was clear from the success of Sanders that Clinton had serious weaknesses, this was ignored. Because, as we all know, nothing is more attractive to younger and working class voters than a grey-haired 75-year-old socialist from Vermont, so Bernie is just a fluke. [10]

This is professional malpractice on the highest order, and I hope Trump at least has sent you folks at the DNC a bundle of passes to his golf courses as an expression of his gratitude.

Almost all of the money given to candidates is squandered, so just stop sending it

Around the middle of October, as I was being bombarded by fund raising appeals (and the occasional phone call [11]) I watched as yet another Doctors Without Borders (MSF) [12] hospital was attacked (variously by allies of Obama and allies of Trump) and thought “This is it: no more money to these campaigns: from this point on it goes to MSF.”

Yeah, right. No, I didn’t follow my own advice (or moral compass) and still gave to candidates who, in the end, didn’t have a snowball’s chance in hell. Like in the blatantly gerrymandered VA-5, my home Congressional district, where I was constantly assured that the Democratic candidate from…duh…ultra-liberal Albemarle County “has a real chance!” She lost by 16 percentage points.[13]

I can, in fact, assert that every single candidate I contributed to—in some cases quite significant amounts—in this election cycle lost. So if you are playing the prediction markets in 2018, be sure to give me a call since I’ve got about as good a negative correlation on these things as you are going to find anywhere. Which, trust me, is the only way anyone is going to make any money off me in Campaign-2018.

But more generally, money doesn’t make that much difference: it’s not just the clowns at the DNC, it’s everywhere, and was every bit as evident in the GOP primary as in the general. We’ve known that for decades: it was probably twenty years ago I saw the first quantitative analyses at the Political Methodology meetings which showed how weak the effect was. Everyone thought there must be an error in the analysis but here we are, in 2016, and it’s the same old thing.

Basically, the money you contribute to a campaign goes two places. Primarily, to a huge class of ignorant hucksters whose sole concern is to use the emotions of the campaign to separate you from your cash, and will lie to your face to do so if that’s what it takes. Second, to the media entertainment complex for advertising created and targeted based on the advice of the hucksters and their massively flawed polls and focus groups. Money doesn’t generate votes when the fundamentals are wrong.

So my suggestion: the next time a candidate asks you for money, take what you were planning to spend, convert it to small bills, buy hot dogs or marshmallows, then pile the remaining money into a grill, spray it with lighter fluid, invite the neighbors over, and talk about common concerns while you enjoy the blaze. I can assure you that will do vastly more to influence votes than handing it to the political consultant class.[14]

Next time, my maximum contribution is $20: if you can’t run a campaign on $20 contributions, don’t ask for my help. The rest goes to MSF. Which I’m still feeling guilty about: very brave people are dying in those places.

All election “news” is now merely for entertainment

I will give Jeff Bezos credit for one thing—beyond running the sort of business that fires people when they get diagnosed with cancer—he managed to get me clicking on those Washington Post stories throughout the day like a rat hitting the bar for more cocaine. And he’s got plenty of imitators. And I got totally suckered all the while knowing I was being suckered and that, folks, starts getting pretty scary.

And for what?: I’d skim these stories in the NYT and WaPo in the morning and basically knew the content of almost every one before I read them. I suppose I should give myself a bit of a break on a few rat-cocaine providers because I truly enjoy their writing—Gail Collins, Jennifer Rubin, Greg Sargent and Ross Douthat [15]. But most of this—and most certainly the “horse race” coverage of the polls—is utterly useless. [16]

Yeah, useless: there was essentially no discussion of actual policy during this entire race. No, it was all personality, gotcha’s, Wikileaks, the latest Trump outrages and the never-ending email story, which probably one in ten thousand voters (if that) understood at any serious level. And horse-race, horse-race, horse-race coverage, all based on countless polls which were all pretty much completely…

Wrong. Yes, the polls have systematic errors

Every methodology presentation on opinion polling I’ve attended for at least ten years has had the same theme: “We knew how to get pretty representative samples in the 1980s and maybe 1990s, but those methods no longer work, and at some point they are going to collapse. And it could happen at any time.”

Well, buckaroo, those chickens have come home to roost…

I’m not a pollster. I don’t even play one on TV.[17] But I’ve spent a great deal of time doing statistical forecasting with noisy time series and even before the catastrophic divergence of the poll projections and the outcome was evident, a couple of things were worrying me:

  1. The individual polls were jumping around far too much to be explained by changes in voter opinions—which are generally fairly static—and certainly far too much to be explained by sampling error (which in a random sample is quite well understood)
  2. The media were treating the confidence intervals as if the true result were uniformly distributed therein: they are in fact following a bell-shaped “normal” (or “Gaussian”) distribution [18]
  3. I was seeing a whole lot of incoherent and inconsistent excuses suggesting that the pollsters themselves were pretty worried that they weren’t getting at the Trump voters, and for multiple reasons.
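Points 1 and 2 can be made concrete with a little arithmetic. The sketch below assumes a simple random sample and the usual normal approximation for a proportion. The sample size of 1,000 and the 50% split are hypothetical, chosen only for illustration, and the function names are likewise made up for this example.

```python
import math
from statistics import NormalDist


def margin_of_error(p: float, n: int, confidence: float = 0.95) -> float:
    """Half-width of the normal-approximation interval for a proportion,
    assuming a simple random sample of size n (an idealization)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(p * (1 - p) / n)


def share_within_half_margin(confidence: float = 0.95) -> float:
    """Share of a normal sampling distribution that falls within half
    a margin of error of its center."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 2 * NormalDist().cdf(z / 2) - 1


# Hypothetical poll: 1,000 respondents, candidate at 50%.
moe = margin_of_error(0.5, 1000)
print(f"margin of error: +/-{moe * 100:.1f} points")  # about 3.1 points

# A uniform reading would put 50% of the probability in the inner half
# of the interval; the normal curve puts noticeably more there.
print(f"normal share in inner half: {share_within_half_margin():.0%}")  # about 67%
```

Under these idealized assumptions, poll-to-poll swings much larger than about three points are hard to attribute to sampling error alone.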

This analysis could go on indefinitely, and there are people who know way more about these things than I do, and they should get a workshop together to discuss this REALLY SOON (which is to say, before people internalize all their “we got this so totally wrong but we actually got it right” excuses—and believe me, that is going to happen unless people in polling are angels [26], and is already happening) and figure out what the systematic lessons-learned should be.

In the meantime, I wholeheartedly support the sentiment expressed by Timothy Egan:

Finally, all of us in the American family should never trust anyone from the pollster industrial complex, including those at my own newspaper. Never. Read your horoscope; it’s far more likely to be accurate.

Trump will find the federal government decidedly difficult to work with

When contemplating General Eisenhower winning the Presidential election, Harry S Truman said, “He’ll sit here, and he’ll say, ‘Do this! Do that!’ And nothing will happen. Poor Ike—it won’t be a bit like the Army. He’ll find it very frustrating.”
[Source: Richard E. Neustadt, Presidential Power, the Politics of Leadership, p. 9 (1960).]

And this in reference to the [Kansan] Dwight Eisenhower who had successfully managed the massive bureaucracy required to pull off the successful D-Day invasion: Trump has no experience even remotely comparable (the Miss Universe pageant doesn’t count).

Let us at least begin to list the ways this will happen:

  • Trump starts totally in enemy territory: an astonishing 96% of the votes in the District of Columbia were against him
  • The Federal bureaucracy is fabulously slow-moving in the best of circumstances,[20] and most of its employees are protected by the Civil Service. He’s not able to just say “You’re fired!”
  • The House GOP will remain divided, possibly now into three parts: orthodox Republicans, the remnants of the Tea Party, and now a few Trumpkin populists

The scary thing, however, is the international system, which is almost certainly going to throw some serious crisis in the first couple of years, even if Trump were simply to maintain the status quo. To the extent that he initiates an isolationist policy, the possibility of this will be substantially magnified as various actors move quickly to take advantage of the emerging vacuum.

Distressed?: Consider the Benedict Option

I’m guessing most of the readers of this blog are liberals and will be unfamiliar with this term, though it will be familiar to at least some conservatives.[21] Well, Google it, but in short, this is named for Benedict of Nursia (ca. 500 CE) whose response to the collapse of the Western Roman Empire was to establish communities where Christians could live true to their own morals rather than involving themselves—as many Christians had since Christianity gained secular power following the time of Constantine—in the ways of the world. The arguments are somewhat more complex than that—a lot more when you start getting into the theological nuances—but worth reading (particularly for those who think conservatism begins and ends with the likes of O’Reilly, Hannity and Limbaugh).

In the past four days, we’re already seeing a lot of this as people make it clear that if need be they intend to protect their Muslim and immigrant neighbors, as well as whoever else the alt-right has in mind (and they make those targets pretty clear as well).[22] This safety-pin thing is wonderful.[23]

Maybe circling the wagons and pulling up the drawbridges won’t be necessary: Again, only about a quarter of eligible votes went to Trump, and by all accounts a substantial number of those were clearly simply to get someone in the White House who wouldn’t veto GOP-passed legislation and who would put Scalia-II on the Supreme Court. Those voters constantly say they didn’t take the rest of Trump’s threats seriously.

They may be right. But they may also be wrong, or more likely, as with the Bush administration, we’ll see a mix of relatively innocuous people but also a few frighteningly cruel ones who will push things as far as they can. And for those, be ready to push back. Big time.

My upshot:

Come 2020 (or even 2018) I want to see candidates who understand the following realities:

  1. Presidential elections are won state-by-state and the Electoral College is not fair. It wasn’t supposed to be fair—it was a concession to slave-owning states—and it isn’t.
  2. Voting suppression is also very real and until you get a Supreme Court that values small-d democracy—which obviously is not happening any time soon—you are going to have to add that into your calculations.
  3. Third parties will get votes: It’s hard to imagine weaker candidates than Johnson and Stein were this year but even they got votes.
  4. You can’t simply buy popularity: Clinton should have seen that from the experiences of Jeb Bush and Carly Fiorina.[24] You can’t get turn-out without popularity and a coherent platform. The consultants, meanwhile, will be playing you like a violin.
    Have the radical idea that there’s more to winning an election than collecting money from people who can write $10,000 checks in the blink of an eye and then figuring people will go out and vote for you simply because the GOP has alienated them. And after you’ve assured them you are virtually certain to win anyway.
  5. The Roosevelt (and Bill Clinton, and classical Progressive) coalition included rural and working class whites: it wasn’t just coastal elites and minorities. If exit polls can be believed, Obama had a critically greater level of support from working class whites than Clinton attracted.
  6. A party controlling only a third of state legislatures, a third of governorships, and slightly less than half of both the House and Senate is in trouble. Even if they control California.
  7. When you hire a pollster, ask them to explain how a confidence interval works. Not to estimate the number of golf balls to fit in a 747.
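For item 7, one hedged way to see what a 95% confidence interval does and does not promise is to simulate the replications that are never available in practice. The sketch below assumes a known true proportion, simple random sampling and the normal-approximation interval. All of the numbers and the function name are hypothetical.

```python
import math
import random
from statistics import NormalDist


def coverage_rate(true_p: float, n: int, replications: int,
                  confidence: float = 0.95, seed: int = 0) -> float:
    """Fraction of simulated polls whose interval contains true_p."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    hits = 0
    for _ in range(replications):
        # One simulated poll: n independent respondents.
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - half_width <= true_p <= p_hat + half_width:
            hits += 1
    return hits / replications


# Hypothetical setup: true support 48%, 1,000 respondents, 2,000 replications.
print(coverage_rate(0.48, 1000, 2000))  # should land close to 0.95
```

The simulation only works because `true_p` is known in advance, which is exactly what a real pollster never has. The 95% describes the long-run behavior of the method, not the odds for any single published interval.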

Wrapping this back to the opening key, I still think that the fundamental question facing the Democratic Party is whether they are willing to offer anything to the white working class, at least in the Rust Belt, and ideally across the country [25]. And no, this does not mean extending a Democratic big tent to David Duke, the KKK and the alt-right: they can stay with the GOP. Please.

What will it take to recreate an effective opposition? Here I think Trump (and certainly Sanders) may have done us a big favor by demonstrating just how little organization and resources are required to run a successful campaign in the 21st century, even at the national level. The fabulously well-heeled fund raisers and fixers, the legions of lavishly compensated consultants and pollsters, the massive media buys, the star-studded galas: all for naught. Trump maybe, just maybe, has pointed the way to another model.

But whether or not that is the right model, we definitely need another model.

Back to work on Docker containerization.

Beyond the Snark

These references are by no means comprehensive, and some of them I have added since I first posted this—again, I think my analysis is fairly close to the consensus view of those outside the DNC bubble—but may nonetheless be useful.

One way forward (Sanders):

Another way forward: (I’d also note this article was at the top of the NYT “most emailed” list for a couple of days. This approach to a new liberalism would also have, shall we say, “interesting” implications for most university curricula in the humanities. Or what remains of the humanities, per George Will )

And still more ideas, Establishment and otherwise:

Pretty much the same arguments I’m making with individual variations:

Debbie Dingell:

Frank Bruni:

Thomas Edsall:

Ross Douthat on the Trump presidency, pretty much a mix between my Bush-III and Gridlock-III. (love the phrase “TrumpWorks”)

This is not good news for the GOP as we knew it:

Polling error was systematic rather than random: (note that “systematic” is not the same as “deliberate”: every pollster who is not a partisan hack is mortified with these results. Besides, if this was deliberate—which I very much doubt—it almost certainly hurt Clinton by reducing Democratic turnout in pivotal states: these are not the conspiratorial manipulations that were being suggested prior to 8 November.)

Another one well worth reading: Am I in the sort of position to get a hospital bed made available? Been there, done that. That would make an interesting future survey question, along with the “Could you get $400 in an emergency” that had surprising results.


1. Though in some places that can translate into “Write hard, die young”: that’s what we’re trying to avoid here in Trump’s U.S., eh?

2. My understanding is that most analysts think that once all of the absentee and mail-in ballots are counted in a week or so, Clinton will have a very substantial lead in the popular vote, not just the tenths of a percentage point reported the night of the election.

3. One of the more disconcerting moments of 2016 was when I heard a mild-mannered mother of two in Nevada, an old friend of my wife’s, say: “They say Hillary Clinton murders people? Well, I certainly hope she’s murdered people: we need someone tough like that in the White House!” She lived in a predominantly Republican area—Michael Milken has a place down the street—and I’m guessing this was a useful ploy for diverting conversations otherwise going in unpleasant directions. But I’m not entirely sure she was kidding. And as numerous people have pointed out, the fact that Anthony Weiner is still alive is proof that the Clintons don’t have people killed.

4. What could possibly go wrong…

5. But why didn’t you say that earlier?? I did: Meanwhile I’ll be curious to see if Thomas Frank’s previously-panned Listen Liberal: Or Whatever Happened to the Party of the People gets some renewed attention.

6. Trump has done a major favor to dominant European parties by alerting them to the possibility of a major challenge within a party rather than just through the traditional parliamentary route of a third party challenge—though European party discipline would make this much more difficult than in the US—and reinforcing the realities of Russian meddling in elections, which clearly the US did not take seriously, and the US media actively assisted.

7. This is also Kansas-II as it will presumably lead to a soaring deficit while we wait for the Supply Side Fairy to sprinkle magic pixie dust on the budget and make it all okay. Just like with every other GOP administration since Reagan.

8. We saw this for many years in Kansas when the GOP legislators were so completely split into moderate and conservative blocs that they literally barely spoke to each other.

9. Also phrased as “When they pass around the plate of shit sandwiches, you can say ‘No thank you, I’ve already had my share.’” A single term also allows Trump to completely ignore his populist promises: those voters sure aren’t going to be showing up at his hotels and golf courses. At this moment—late afternoon on 12 November—we’re certainly watching the “So long, suckers!!” approach; this may or may not last.

But more generally, Trump has himself in quite a fix now: he’s riding the tiger in the spotlight of history and there is no easy way to get off. Whereas a week ago he would have probably been a mere amusing footnote, the black swan that didn’t occur.

10. They pulled the same trick in the primaries at the senatorial level in Pennsylvania, spending millions to defeat retired Admiral Joe Sestak, who had spent six years preparing to run against Toomey but who refused to kowtow to the party elites, only some of whom are convicted felons. The establishment’s pliable political newcomer sock puppet not only paved the way for Toomey’s return to office as part of a Republican majority, but provided none of the coattails to the working class Sestak would have provided.

11. Mind you, I did have the pleasure of absolutely reaming out some poor bastard from the Democratic Senatorial Campaign Committee, the jokers who whacked Sestak: guessing I’m no longer on their list of supporters to harass for contributions.

12. Médecins Sans Frontières (MSF) International:

13. More precisely, “Sixteen fucking percentage points”

14. Better, you could do this in a nearby state park where you might meet someone outside your bubble, though I’m guessing those who would have to work the better part of a week to make as much money as that check you’re blowing off on DNC consultants might be more than a little offended by the bonfire.

15. At the lower frequencies, Dan Drezner and Arthur Brooks; at the “where is this dude going to go next?” David Brooks; in small doses Paul Krugman.

16. An unusually vivid example of this was the persistence of reporting “Two way race” polling results over the last three months. WTF?!?: were you people planning to assassinate Gary Johnson and Jill Stein before the polls opened? Did you have premonitions Johnson would be struck dead by a meteorite and Stein mauled to death by a raccoon? In almost every state, it was a four-way race: your “two-way” race was a ludicrous hypothetical completely irrelevant to the real world.

17. Gratuitous Boomer reference

18. Okay, I’m lying: I have just incorrectly described how a confidence interval works because I have a marvelous description—gratuitous Fermat’s-Last-Theorem reference—but it is too small to fit in the main narrative of this blog. A confidence interval is in fact the interval estimated from your sample which would contain the true value of the parameter [typically] 95% of the time were we able—welcome to the Mad Hatter’s Tea Party of frequentism—to replicate the experiment and estimate that interval from the resulting data a large number of times. Which we won’t, and in many cases, can’t. Which means…oh crap…just look it up… [19]

19. No, can’t help myself!…this is worse than Bezos’s WaPo…but even if the final observed value falls within your confidence interval, you haven’t shown your confidence interval is “correct,” despite the excuses you are seeing now from pollsters. To do that you’d need to know the true parameter (which you don’t) and do some large number of replications of your estimator (which you won’t), and even then you would only have shown your method of estimating the confidence interval was correct in the sense of behaving in a fashion consistent with what you expect. Though whatever these problems, confidence intervals on regression coefficients are far worse… I digress…no, I don’t…FREQUENTISM MAKES NO SENSE! IT NEVER MADE ANY SENSE!! JUST STOP DOING IT!!!…AAAEEEIIII!!!!…

20. I’m guessing that, ever the narcissist, The Donald is going to take a lot of standard operating procedure as a personal affront and this is going to become extremely wearisome for him over time. His more right-wing appointees will fare little better and probably will be facing active foot-dragging, particularly once it is clear—or widely assumed—Trump won’t run for re-election. And that is probably already being widely assumed.

21. Yes, I read your stuff when it is intellectually coherent, as distinct from conspiratorial rants employed to get people to tune in to watch advertisements for gold bars and ersatz tactical equipment purchased by folks who have a very high probability of dying in a comfortable hospital bed paid for by Medicare.

22. A nephew who is gay was in Washington last week, just after the election. “I’m going to visit the Holocaust Museum: want to see what is coming next.” Not sure whether he was joking.

23. Violent protests before Trump has even done anything?: less so. 24-hour drum circles: never.

24. Though one of the brightest bits of news in the past few days is the prospect of Carly Fiorina heading the RNC.

25. Which, by the way, probably also includes a significant number of Latinos and, against a less-overtly racially polarizing candidate, blacks, and all on the same economic issues: It’s class, not race, and remember that playing the race card against class is the oldest trick in the book for anti-progressive forces in the United States.

26. They certainly aren’t angels in my part of the prediction world: everybody now says they flawlessly predicted the collapse of the USSR and the Arab Spring, except for the inconveniently complete absence of evidence that this was true.
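A small aside on notes 18 and 19: the "replicate the experiment a large number of times" definition is easy to act out in a simulation, precisely because in a simulation you get to know the true parameter. What follows is only a sketch under assumptions of my choosing (normally distributed data, a sample of 50, 2,000 replications, and the usual 1.96 normal approximation); the function name and its defaults are made up for illustration and have nothing to do with how any pollster actually computes an interval.

```python
import random
import statistics


def coverage(true_mean=0.0, sigma=1.0, n=50, replications=2000, z=1.96, seed=12345):
    """Share of replicated samples whose interval contains the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(replications):
        # One "replication of the experiment": draw a fresh sample.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        # Normal-approximation half-width computed from the sample itself.
        half_width = z * statistics.stdev(sample) / n ** 0.5
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / replications


rate = coverage()
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

The share should land somewhere near 0.95, and that number is a statement about the procedure across replications, not about any single interval, which is the whole point of the two notes.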


Feral Plus Three

The usual apologies—or something like that—for the absence of entries of late but, well, I’ve been working, a lot, and since the Mission Statement of Parus Analytics—man, I can’t begin to tell you how many three-hour meetings, corporate retreat weekends with trust falls and consultants with really expensive business attire and hair stylists it took to settle on this! [1]—is

We’ve got a radical approach to software development: writing code that works, more or less on time, and for the price quoted.

work takes priority.

Still, this month marks the completed third year I’ve been “feral” and if that transition had been a serious mistake, I’d presumably know by now. Since this hasn’t happened, a “Feral Plus Three” seemed in order. And meanwhile I’m going to be participating in a panel at the Society for Political Methodology meetings on non-academic careers, so I’ll use this occasion to write down the [largely stunningly obvious] advice I’d give to someone who is contemplating following this path.

But first, the takeaway, a caveat, and the context.

The takeaway is that when I published Feral (27,000 views and still counting), I got several nice notes from people saying they had done the same thing and never regretted it. Which has been precisely my experience, along with an increasing sense that if I hadn’t done this, I would have missed out on some of the most interesting times of my life. Moreover, while for very real family reasons I could not have done this earlier, probably the optimal time would have been somewhere in the 50 to 55 age bracket.[2] Carpe diem!

The caveat is that I’ve done this as a “data scientist.” Whatever the heck that is, but it seems to involve the same combination of social science expertise [3], computer programming, statistical analysis and machine learning [4] which previously defined me as an academic “political methodologist.” And at this point in history, the demand for data scientists appears to far exceed the supply, and provided you’ve got the requisite skills [5], one can set up shop as a data scientist with little more than a laptop and an internet connection. [6]

Finally, let’s be very clear that I’m discussing the prospects of establishing an on-going small consulting business intended to be sustained indefinitely, not a flash-and-burn start-up with aspirations to become a unicorn and for the principals to own—or at least lease—a Gulfstream G650 before the age of 30. The latter is for someone else’s essay, probably on LinkedIn.

No, I’m content to be one of the little mice down here in the weeds, not Bill Gates or Steve Jobs or Sergey Brin or Mark Zuckerberg or Peter Thiel, but just part of one of the tens of thousands of largely invisible small shops that are driving the current technological revolution. [7]

So, seven suggestions on what you should consider and do to instantiate this.

1. Assess your resources and risk tolerance

Assuming you are leaving an existing secure career, before anything else, assess your financial situation [8] because compared to a tenured position in academia, you are moving into a much riskier situation.[9]

I’d planned the “feral” move for about two years, and was certain I had in place funds to make it to full retirement age even if those plans went really badly.[10] They didn’t—in fact I never touched a penny of those reserves, and have actually added to them—but the safety net was there. And like all safety nets, that allows you to take more risks, or at least turn down projects that don’t seem to make sense, and go through the inevitable period of experimentation that will be required before you really find the match between your skills, interests and the market.[11]

Yes, risk: that’s what you are moving towards. In the absence of unethical behavior [12] the downside income risk of an academic career is almost negligible. However, the upside risk is also very limited—less so if you can skitter around getting outside offers, though at some point that accumulates issues, to say nothing of bad karma—and more generally, even if you are fabulously successful in generating external funding in academia, most of those marginal benefits will go, for example, to topping up the salaries of your so-called colleagues whose core competence (and life’s ambition) is making your existence miserable. And someone has to pay for all those deans, associate deans, assistant deans, deanlings and deanlets.[13] For political methodologists, escaping this also means your summer salary will no longer depend on whether Senator Jeff Flake decides to pick up the red or blue light saber—and he is adept with both—when he gets up in the morning.

But if you currently have a secure job, be realistic about the risks: those positions aren’t called golden handcuffs for nothing. They really are handcuffs. And they really are golden.

2. Get thee to a tech hub

Contrary to what you are doubtlessly thinking, “tech hub” does not mean you are destined for life in a $4000/month one-bedroom efficiency somewhere within shaking distance of the San Andreas fault: In fact if you are planning to work for yourself or in a small group, that is probably the absolute worst place you can establish yourself due to the cost of living.[14] Those same issues limit the attractiveness of other major metropolitan tech areas such as Boston, New York City and the greater Tysons Corner metroplex.

Instead, you need to find some place which has a thriving tech ecosystem, the sort of place where, say, on a summer evening you might find 50 people showing up for a talk on the Python natural language processing toolkits spaCy and gensim. Said talk held in a cavernous room in a re-purposed Coca Cola bottling plant which now houses a huge German-themed beer garden with a bocce ball court and [of course] a high-end bicycle shop. Which is to say, Charlottesville, Virginia.

Which is almost certainly not on the popular radar screen for tech hubs, but it is one. [14a] And there are certainly dozens, probably scores, of other places just like this, probably most in or near towns with major universities.

So, how do you find these? <joke>Here’s the really bad news:</joke> you look for places where people, particularly young people with substantial amounts of discretionary income, which is to say programmers, really want to live. Paul Graham has described this better than I, but that typically means a lot of nice restaurants, a thriving music scene, and access to outdoor recreation. And brewpubs. And of course ethnically diverse and LGBT-friendly. Yes, this is a really tough set of lifestyle constraints, but such is the contemporary world of data science.

Now, some of my younger data science compatriots—and yes, they are mostly younger—go one step further and say that a place isn’t really a tech hub unless it provides a situation where you can quit your job in the morning and have another by the afternoon. And if you are the sort of person who is likely to be needing to do that on a regular basis, you’ll probably want some place larger, for both the opportunities and, most certainly, the anonymity.[15] But for the more occupationally adjusted, the lower costs and higher quality of life [16] in a smaller urban area probably outweigh the opportunities of a large one.

While you are looking, keep in mind that “business climate”—particularly for small business—is going to affect you, as I discussed earlier here.  Pennsylvania’s small business climate, of course, is uniformly horrible, unless you are fracking. In contrast, as long as you can avoid their massive license raj [17], Virginia is quite small business friendly, and the differences in the costs of compliance are measured in hundreds of dollars (your money) and hours of time and frustration. Charlottesville’s 50% discount on the business taxes for high-tech businesses, software development included, didn’t hurt either.

In all honesty, however, we made the decision to move to Charlottesville—we had checked out a number of cities in driving distance of the greater Tysons Corner metroplex—not on the basis of the number of small software companies or the Virginia small business website, but rather when we walked into a coffee shop the morning after we’d been to a concert by the Hackensaw Boys at the Jefferson Theater and saw they had “cortado” on the list of coffee drinks, this when few places in the U.S. had even heard of that drink. A more pleasant experience than picking a few institutions in various Rust Belt hellholes and isolated 19th century agricultural rail hubs that have posted positions on the APSA jobs site and hoping one will deign to interview you.

3. It’s a business, but…no big deal.

Assuming you’ve already been doing independent consulting (and hence are accustomed to keeping accounts, filing Schedule C and the like), the shift to being a small business full time is modest, and after some initial efforts (incorporating an LLC, getting a bank account, getting Affordable Care Act and business insurance, a logo, coffee mugs) it largely takes care of itself. You’ll hear a thousand arguments from people who have never worked outside a large bureaucracy why this is really, really scary but in fact, people do it all the time: I’ve explored this in boring detail here.

You will, however, also quickly learn that while United States popular culture glorifies small business, the United States business establishment—in particular banks—absolutely hate it, and by inference, hate you.[18] Fortunately you need very little capital to work as a data scientist.

And yes, Trump and Sanders are correct: the system is thoroughly rigged to favor large established institutions: for example it is estimated that it takes as long as ten years to become a prime contractor for the U.S. Dept of Defense. Never forget you are just a little mouse…but my, those dinosaur eggs are tasty.

For guidance on the nature of the contemporary tech business, start by reading everything Paul Graham has written.[19] Classics of computer programming management such as The Mythical Man-Month and The Psychology of Computer Programming will assure you that all of the weirdness you are encountering in projects is the norm. Avoid start-up porn, portrayals of small business from Hollywood, management books displayed in airports, the “networking-is-everything” losers who write for LinkedIn, and, in spades, people who give TEDx talks.

4. Get an office: physical space matters

Well, it does to me: I inadvertently ended up working from home for a month or so when I started, and found there were far too many distractions—grab something from the kitchen, weed the garden, don’t take a shower until noon—and I actively want the home/work distinction. This indulgence has quite consistently cost me about $400 a month (seems to be the magic number for both State College and Charlottesville) but it is worth every penny.

None of these spaces have been in conventional office buildings: As you quickly will learn, lots of residential structures in urban centers have been converted to office space, and your co-occupants are likely to be lawyers, accountants, financial planners, and—particularly—therapists of many varieties.[20] These places are not necessarily advertised: use your social networks and walk around neighborhoods you’d like to be working in looking for “For Rent” signs in windows. In my experience, landlords like programmers: we’re quiet, don’t require parking for clients, and generally pay the rent on time.

Co-working has gotten a lot of press, but as a programmer-introvert, I found co-working space to be the spawn of Satan. I actually tried such an arrangement but realized after six months a lot of other people had looked at the option and no one else had taken it—if you can’t identify the biggest sucker in the room, it’s you—and meanwhile the guy managing the space had the affect of the doll in the Chucky movies, had fired everyone who was there when I’d first looked at the place and my desk was on the other side of some thin wallboard from a lawyer who would periodically go ballistic. I bailed—on the positive side, I was renting by the month—and now have a lovely space with big sunny windows a couple blocks away, the only slight issue being that I periodically find the microwave filled with peppermint-scented rocks.[21]

5. Your team. Or not.

Paul Graham [22] makes the case that the most efficient programming operation—and this would certainly apply to a data science operation—is about a dozen people [23] with a diverse set of skills who completely trust the quality of each other’s work and generally self-manage. He further argues that only in these organizations is your income likely to be pretty close to your true marginal contribution to the enterprise: anything larger and some of that income is lost to management infrastructure, and other parts are lost to equalization policies.[24]

That’s the ideal, rather than the route I’ve gone, which is nominally to work alone, though in reality I’m in almost daily email contact with one or more people in a geographically dispersed group of, well, about a dozen people with a diverse set of skills whose work I completely trust.

Were a suitable opportunity to present itself, I’ve no question that this cluster could work more efficiently were it focused on a single project and quite possibly (but not necessarily) in close physical proximity, but there is a core constraint one has to confront here: $100,000, which is roughly the amount of revenue you need to support every programmer.[25] That is a high hurdle, and while I’m risk acceptant to some degree, I haven’t gotten to that level yet.

6. Keep three or four projects going or in the works

Almost all programming projects are transient—the entire point of the enterprise is to get something running that the client can take over—so you need to keep new ones coming in. It’s too early to say whether “data science” projects will be different but I’m guessing at the levels where this has been out-sourced, that may also be the case, though it may not be. [26]

A skill you will need to develop if you haven’t already is being ruthlessly realistic about estimating the time and effort required to complete a project: this is not just with respect to the total amount of time required but also the point at which you need to start wrapping things up—data science projects tend to be very open-ended—so that you can finish things cleanly with good reports and documentation, and the latter tends to take a lot longer than you think. Most academics have a heck of a lot of unused time on their hands: you won’t, and rosy scenarios are not your friend.[27] And once again, remember that despite everything you read in the start-up porn, “fail early, fail often” only applies to white males from a tiny number of elite schools.

7. Learn PHP and JavaScript

Even if you are primarily working in R and Python, which you presumably will be. You may not need these other languages to build your own tools (though you may) but it will give you wizardly credibility: people—for example, your mother—may not know what data science is but they know what a web page is.

Beyond that, look forward to a constant effort of keeping your skill set up to date and trying to make those critical decisions as to which of the new technologies is worthy of your investment and which in a couple of years will have been relegated to the [unbelievably huge] scrapheap of technological history. You will live not in moldering library stacks and stifling seminar rooms [28], but by GitHub and Stack Overflow, Slack and Google Hangouts, Dropbox and Amazon Web Services. Open source and open access, always [29].

And then enjoy the wonder of a world where you can operate at the cutting edge of your profession using nothing but your wits and an investment in a professional-quality laptop.

To conclude:

Five out of my last seven political methodology students at Penn State have chosen to go into non-academic data science positions. Presumably that means their training at Penn State is either really bad, or really good. Given that two of those placements were at Google and Apple, which have thousands of applicants for every position, I’m inclined to the latter interpretation.

Granting that I am an egotistical sonofabitch embedded in twenty-first century United States culture, the single biggest benefit of going feral is knowing that I’ve now gone for three years supporting myself in the same professional lifestyle I had when supported by a large institution but I’ve done it on my own: no one is hiring me because I’m affiliated with X. Priceless.

May the bridges you burn light your way

Beyond the Snark

Paul Graham’s blog:

Paul Graham’s Hackers and Painters : (hey, give the poor starving author a break: Quora says his net worth is only somewhere between $260M and $1.4B.)

Another recent take on non-academic tracks for political methodologists: Andrew Therriault Finding a Place in Political Data Science (PS, July 2016)

Joel Spolsky on why programmers need offices with walls:

Science (20 May 2016: Vol. 352, Issue 6288, pp. 899-901): Preprints for the life sciences

The Mythical Man-Month:

The Psychology of Computer Programming:

Why it’s better to let cougars kill pets and joggers than to allow deer to kill motorists (and eat hosta): (Bambis, we hates them forever)

Various previous mouse entries relevant to this topic:

Boring mechanics of setting up a business: The Mouse Goes Into Business [2]

Business climate: The Mouse Goes Into Business [1]:

Oligopolies: Seven reflections on Trump, Sanders and the crisis of bozo capitalism

Banks hate independent contractors: Mr. Bernanke’s mortgage:

“Going Feral”: Going Feral! Or “So long, and thanks for all the fish…”


Following up on “Feral”

42-page rant on what’s wrong with the contemporary academic system and why it isn’t going to last:


1. Okay, so I can: none… Like you hadn’t guessed.

2. Even with those constraints, that’s about the point where I shifted from a conventional teaching track to a research-oriented track: I’m pretty sure the last time I taught a full course load was in my late 40s.

3. You will never realize just how much you know about human psychology and organizational behavior—any humans, any organizations—until you start working with computer scientists. To say nothing of research design.

4. The fourth common component is visualization, though I don’t really have skills in that area and thus far they haven’t been required. Though there is also something to be said for the definition cited by Therriault: a data scientist is someone who can’t write software as well as a software developer or do statistics as well as a statistician, but nonetheless can do both.

5. Or, let us be realistic, with the current demand for people who call themselves “data scientists”, even if you don’t…

6. If your core academic competency is writing jargon-laden critiques of why quantitative models cannot possibly work, you will probably need to continue grading blue books indefinitely, and you might want to just stop reading at this point lest you become very, very sad. No, wait, such people don’t get sad, just outraged. Which, of course, is the same thing.

7. So stay away from the start-up porn, or at least don’t take it seriously: Beyond the fact that the vast majority of startups fail, even in the ones that succeed most of the employees of startups don’t get the benefits, and can be left with burdensome tax obligations after those highly valued unicorny—or is it unicornish?—stock options decline in price. But again, that’s someone else’s essay. Just be cautious, okay?: there’s a critical difference between getting out of the box and going out of your mind.

8. And discuss it with your partner, if such an entity is in the picture. Avoid those “Honey, I’ve decided to go into business for myself!” “Oh, so you’ve been fired?” situations. Though my wife was amused to watch the reaction of people when they said “Your husband retired” and she corrected “No, he quit.” Invariably, people—granted, this was in a company town—responded with variations on “But no one quits a tenured job!” Yes, they do.

9. As for non-tenure-track positions in academia: hmmm, are those in the fifth or sixth circles of Hell? <TRIGGER WARNING!!!> To paraphrase Denis Diderot—hey, man, wasn’t he a midfielder on Cameroon’s World Cup team in 1990?—academia will be free when the last associate dean is strangled with the entrails of the last journal publisher, dumped in the ruins of the last student aquatic center and buried by a press-gang of the last post-modernists </TRIGGER WARNING!!!> I digress.

10. If you are a 20-something reading this, you probably don’t need to plan quite that far ahead.

11. As well as dealing with payment delays that can run into months. Though as we’ve entered the era of zero to negative interest rates, I’m finding my invoices are being filled noticeably more quickly. Funny, that.

12. Though, after tenure, without much required in the way of a work ethic.

13. …subsidizing loss-making athletic programs; paying for the insurance, liability claims and golden parachute payouts of disgraced administrators of profit-making athletic programs…I digress…

14. A speaker I recently heard who heads a Charlottesville-based engineering startup quite possibly heading for unicorn territory put the issue succinctly: “Silicon Valley has the best tech ecosystem in the entire world. But unless you are in the top 1% of income, it is a shitty place to live.”

14a. I didn’t actually appreciate Charlottesville’s software situation until I attended one of my first data science meet-ups and casually mentioned to some stranger that Parus Analytics might be hiring a Python programmer at some point. The gentleman, exercising those social skills for which programmers are so famous, gave me a look of disgust and said “Well, good luck with that: no one around here can hire Python programmers because there’s too much demand for them.” Well…I suppose that’s an issue if you are trying to hire a Python programmer, but quite a different situation if you are already a Python programmer.

15. If this is your planned career strategy, taking a few pointers from the Federal witness protection program probably wouldn’t hurt either.

16. We recently returned for an extended vacation in the Bay Area, where my wife had worked for about 15 years in the 1980s and 1990s. Our conclusion: the restaurants in Charlottesville are better. And saving $2000 a month in housing costs buys a lot of restaurant meals.

17. For example the regulation of ginseng dealers. Because as political theorists from Montesquieu through Weber to Skocpol have emphasized, one of the core functions of the modern state is to protect the citizenry from the threat of substandard ginseng.

18. Years ago I saw an advertisement in some airport with a picture of the stereotypical evil besuited banker, cigar in hand, saying “You don’t need a small business loan, you need a job!” Yep, that’s the attitude. I’ve discussed this issue in more detail here: Apply for a mortgage as a sole proprietor and, whatever your assets, you will be politely but firmly told to go to hell. Though this will be blamed on Obama, Clinton, Warren and the Illuminati. Nonetheless, I own a house. With deer.

19. To whom I owe at least half of the ideas in this essay.

20. The building where my office is now located also houses a web developer, at least three psychological therapists, one Rolfer, one hot-stone massage therapist, a small trade journal, and a couple lawyers. Adjacent properties—all converted residences—have a remarkable number of hedge fund managers, a Sotheby’s office catering to those hedge fund managers, still more lawyers (we’re near the county courthouse), and a hospice.

21. Based on having now rented four spaces, two of which I was very happy with and two which didn’t work out, here are the criteria I use:

  1. About 200 square feet with walls and a door: I like to sprawl.
  2. Not too many people around, but not too few
  3. Kitchen (but I don’t use conference rooms nor teleconferencing facilities)
  4. High-speed internet
  5. Windows!
  6. Walking distance from home and walking distance to a coffee shop (or in the case of the Charlottesville pedestrian mall, six coffee shops)
  7. Weekend parking 
  8. When furnishing your new digs, Habitat for Humanity ReStore outlets and Goodwill are your friends: you can get amazing stuff there. It’s all zero-sum on your money now. Put Lowe’s in the mix as well.

22. Did I mention that you should read Paul Graham?

23. Roughly the size of a modern infantry squad. And a Roman infantry squad. And a Mongol cavalry squad. And a baboon foraging party: we’re hard-wired for this number.

24. I’m guessing small groups also do a lot of income equalization but they are able to much more effectively monitor and sanction slackers.

25. The median salary of a programmer in Charlottesville is reportedly $76K—presumably not including benefits—so adding those benefits and even minimal administrative overhead, $125K to $150K annual revenue is probably more realistic. So at the self-managing baboon troop level a project would probably need a minimum of around $1.5M in annual revenue. An organization at the baboon level should also be able to get by with an overhead rate of 10% to 15%, substantially less than that of larger organizations, and this will sometimes, though not always, provide a competitive advantage.

26. This actually gets to what I regard as one of the two most important economic (or political-economy) questions of our time (the other being, of course, the rise in income inequality): what sort of long-term balance will be achieved between the incredible efficiency of baboon-troop-sized small enterprises and the oligopolies of contemporary “bozo capitalism.” The disintermediation made possible by new technologies is reversing the classical Coase transaction cost justification for the corporation, but at the same time corporate power is being concentrated at levels not seen in more than a century. Where does it all end?

27. You will also find that you shift from placing a premium on the multi-tasking required in academia—and pretty much any large organization—where you rarely have large blocks of focused time, to highly focused uni-tasking where your objective is to be at the production-possibility frontier of quality vs time. Observed from the perspective of a small group where your income is entirely dependent on what you can produce, the amount of time wasted in the endless pointless meetings characteristic of large organizations is jaw-dropping.

28. Or, god forbid, reading proprietary social science journals: Trying to do research reading the published social science literature is like trying to drive by looking through the rear window using a telescope. For example, an article I had coauthored was recently linked from an interview in the Washington Post. Which was really cool, except that the original idea had been drafted—I remember this very clearly since it was during PolMeth XXIII at UC-Davis—a full ten years earlier.

29. There is a nice recent discussion in Science on the huge advantage computer scientists have gained by emphasizing open-access pre-prints (and arXiv specifically) over proprietary journals. Science has the sense to leave this open-access rather than pay-walled. At least at the moment. Thus depriving us of an opportunity for deeply ironic snark.
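For anyone who wants to check the back-of-the-envelope numbers in note 25, here is the arithmetic as a few lines of Python. The salary and per-programmer revenue figures are the ones quoted in that note; the team size of 12 is my reading of "about a dozen," and the function name is purely illustrative.

```python
MEDIAN_SALARY = 76_000                    # reported Charlottesville median, from note 25
REVENUE_PER_PROGRAMMER = (125_000, 150_000)  # range suggested in note 25
TEAM_SIZE = 12                            # assumption: "about a dozen" taken literally


def team_revenue(team_size=TEAM_SIZE, per_programmer=REVENUE_PER_PROGRAMMER):
    """Return the (low, high) annual revenue needed to carry the whole team."""
    low, high = per_programmer
    return team_size * low, team_size * high


low, high = team_revenue()
print(f"Annual revenue needed: ${low:,} to ${high:,}")
# prints "Annual revenue needed: $1,500,000 to $1,800,000"

# Implied multiple of salary that revenue has to cover (benefits plus overhead).
multiples = [round(r / MEDIAN_SALARY, 2) for r in REVENUE_PER_PROGRAMMER]
print(f"Revenue as a multiple of median salary: {multiples[0]} to {multiples[1]}")
```

The low end reproduces the "around $1.5M" figure in the note; change the team size or the per-programmer range and the hurdle moves accordingly.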


7 reasons political science “math camp” is a complete waste of your time

This little rant is going to piss off a lot of people in my professional circles but, well, I’m known for doing that sort of thing.

So, today, I announce my support for Donald Trump for president.

KIDDING!!! Though what follows will probably be just about as popular.

I just started tweeting at the beginning of this year, and I’m still trying to figure out—along with, I’m guessing, no small part of those in the twittersphere who both blog and tweet but don’t make a living from either [1]—when to blog and when to tweet. Though I sense that there is some threshold, probably around three or four, where multiple tweets probably mean I should be blogging instead.

And such was the situation this morning where I went on a tweet-rant against the political science “math camp” concept, albeit in response to an innocent posting on the utility of computer programming from John Beieler and in no small part from being stuck at a garage for an extended period of time while it was ascertained whether the tires on my Vespa would pass state inspection. [2]  But in fact, I’d actually had a note to write an essay on this topic back in August—now that would have really been cruel, eh?—so it’s not like this hadn’t occurred to me before.

So I’ll write it now, when it will presumably be forgotten by August, though not before some subset of you, dear readers, decide among competing graduate school offers. Cue maniacal laughter, “Fools, I will destroy them all!…”

I digress.

But first, two clarifications. By “math camp” I mean the hazing exercises conducted on the part of an increasing number of political science graduate programs which, in one or two weeks, purport to impart to students, many of whom have probably had little classroom instruction taught in mathematics departments since their first year in college or even AP courses in high school, a crash course in “mathematics” which, if the material I’ve seen on syllabi and texts is any indication, goes through about the level of the first year of a graduate degree in mathematics. Having completed both an undergraduate and master’s degree in mathematics I am, to put it mildly, skeptical.

Not to be totally negative, I’ll actually just give three reasons why “math camp” is a terrible idea, and then four reasons why the time would be better spent on basic computer programming, the gist of the inspirational tweet of @johnb30. Who bears no responsibility for the rant to follow and for all I know, loved “math camp.” Though somehow I doubt this.

1. The basic “math camp” concept is ludicrous

My mathematics education—effectively in applied mathematics and mathematical statistics—involved a total of about 50 semester credit hours, half at the undergraduate level and half graduate. That’s roughly 750 classroom hours, and once beyond the introductory level, probably at least two hours of homework (often more) for every hour in the classroom, so let’s round the total to 2000 hours, and say fully half of it wouldn’t be considered part of a “math camp” curriculum—very conservative estimate, based on what I’ve seen—so we’re left with material that experienced instructors using a curriculum that has been refined over the past three centuries believe requires at least 1,000 hours to master.

Political science programs claim to be able to teach this same material in 40 to 80 hours. Yeah, right.
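For the skeptical, the back-of-the-envelope arithmetic above can be checked in a few lines of Python. This is only a sketch of the estimate in the text: the figure of 15 classroom hours per credit hour is an assumption implied by 750 / 50, and the variable and function names are made up for illustration.

```python
# Rough reconstruction of the hours estimate in the text.
CREDIT_HOURS = 50
CLASSROOM_HOURS_PER_CREDIT = 15   # assumption: implied by 750 / 50
HOMEWORK_PER_CLASSROOM_HOUR = 2   # "at least two hours of homework" per classroom hour

classroom_hours = CREDIT_HOURS * CLASSROOM_HOURS_PER_CREDIT      # 750
homework_hours = classroom_hours * HOMEWORK_PER_CLASSROOM_HOUR   # 1500
total_hours = classroom_hours + homework_hours                   # 2250

ROUNDED_TOTAL = 2000                            # the text rounds the total down
math_camp_material_hours = ROUNDED_TOTAL // 2   # half is "math camp" material


def compression_ratio(camp_hours):
    """How many times faster a camp of the given length would have to go."""
    return math_camp_material_hours / camp_hours


for camp_hours in (40, 80):
    print(f"{camp_hours}-hour camp: {compression_ratio(camp_hours):.1f}x compression")
```

Even on these deliberately generous numbers, a two-week camp would have to cover the material somewhere between twelve and twenty-five times faster than the conventional curriculum does.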

2. Even if you know the basics, you can’t learn the rest of mathematics on your own because it is a complex culture

No, I’m not going post-modern on you here, since that culture is subject to a highly constrained set of rules, not “Wow, that feels good, hand me another of those candies you brought back from Denver, and wow, isn’t Derrida sooooo like cool!!!” But mathematics is an intricately linked set of rules, idioms and norms which one slowly learns through a progressive sequence of purely mental exercises that has been refined over, literally, centuries. Mathematics can certainly be taught badly—alas in the secondary education system in the US, that’s virtually the only thing one encounters [3]—but teaching it properly is a very gradual process requiring constant feedback, attention, inculturation into professional norms and hundreds of hours of intensive practice involving oftentimes intense concentration which must also be learned. After the basic level, this is done almost entirely through the mastery of mathematical proofs, which while dependent on algebra, are largely extended exercises in formal logic, the sorts of things where—yes, this actually happens, regularly—you spend hours staring at something and running around in mental circles until, finally, the step forward is completely obvious.

3. Most of what passes for “mathematics” in political science is just very bad algebraic notation.

The first couple of years as a naive assistant professor, I actually tried to write articles in a mathematical style for political science journals. Thanks to a very accommodating committee, I’d gotten away with that in my dissertation [4] but it went nowhere in the journals. The grounds for rejection was real subtle: one review from a four-letter journal literally just said “Too much math, no one will understand it.”  So I switched first to statistical analysis (plus some field work) and eventually to mostly doing software and data development, and did okay.

Though, I suppose “no one will understand it” was an accurate appraisal, and perhaps the reviewer was doing the four-letter journal, and maybe even me, a favor. The apparent exception to this rule was the algebraic “proofs” of the “rational choice” school, which I never really warmed to because it looked like really bad social science masquerading as even worse algebra—sort of the formal equivalent of post modernism, which is really bad social science masquerading as even worse exposition—and in subsequent years rational choice has been thoroughly dispatched back to the netherworld by the likes of Kahneman, Tversky, Thaler and now a generation or two of skilled behavioral economists. The four-letter journals would publish long proofs that were nothing more than convoluted algebraic identities, tied together with the cookbook invocation—sort of an extended shamanistic ritual, minus (I presume) the animal sacrifices, though one occasionally wondered—of a couple complex theorems the author almost certainly could not even begin to prove on their own, and I suppose one can still get away with some of that. [5]

[Political science involvement in statistics, on the other hand, has taken a very different route in the decades after political methodologists, initially led by Chris Achen and John Jackson, set up their own organization and established journals that could enforce a high level of standards without penalizing authors for complexity. Full development of this took a couple of decades but it has now reached a point where some of the methodological developments which either originated in or saw extensive practical development by political scientists are at the cutting edge of applied statistical work. A possible downside of this has been that individuals trained to state-of-the-art political science methods can oftentimes find far more attractive employment outside of political science, either in more methodologically-friendly academic departments or in industry. Meanwhile in mainstream political science, “Too much math, no one will understand it” lives on.]

So, from the perspective of actually learning any mathematics, you are completely wasting your time in “math camp.” That said, you will presumably get up to speed with some remedial algebra and learn a bit of new notation [11], though I cannot understand where anyone got the idea that these are more effectively conveyed in isolation than in the context where they are actually used. And, of course, as is the nature of hazing exercises, you will share the first of many, many WTF moments with a group of strangers who over the next decade will almost certainly become some of the most important people in your life, and perhaps this is all that “math camp” is really supposed to accomplish.

However, if you are in a quantitatively-oriented political science program [6] what you should be doing, per @johnb30, is learning more computer programming. For at least four reasons:

4. Contemporary quantitative political science is data science

And data science is now recognized (alas, I forget who first came out with this formulation) as pretty much equal amounts of statistics, machine learning, data wrangling via some toolkit of general purpose programming languages, and data visualization. You need all four, though probably not quite equally: I’d go lighter on the visualization and make sure your statistics training is both frequentist and Bayesian (and to the extent you can get away with it, your statistical practice is mostly Bayesian).

5. The journal referees won’t penalize you for the complexity of computational methods

Or at least there are now a set of journals with the high impact ratings that will get you jobs, tenure, promotions, grants and happy deans and deanlets that will not penalize you. Nowadays complex material can not only be put on the web, but due to replication standards, it will probably be required to be on the web. As long as your code does what you say it does, you are unlikely to be penalized for the fact that your work is complicated. Or involves math. The emergence of R and Python as open source data processing lingua franca also has helped a lot.

6. Once you’ve got the basics, you can—and will—learn more programming on your own.

This is a fundamental difference between mathematics and computer programming: mathematics is a highly formal and complex means of communication between mathematicians, whereas programming languages are a highly formal and complex means of communication between humans and machines. But it is a two-way communication: mess up a program syntactically and the machine will let you know, or (well, when the data fairy is being uncharacteristically kind) this will be evident in the results. Furthermore, unless the pace of development slows dramatically, in ten years, or certainly twenty years, you will be doing most of your formal work in a completely different system than you are using today, and you will have learned those new skills outside of any formal educational context. [7] You will be able to do this easily because of the vast and ever-evolving array of open resources available on the Web: work regularly with the Web as, effectively, your assistant and technical go-fer sufficiently long and you almost start to believe in this “singularity” stuff at least in some sense.  

Programming is learned by writing programs, reading code, and, critically, re-writing (“re-factoring”) your own code as you become more skillful. This is a life-long process. Or should be. 
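A small illustration of that two-way communication, using only Python's built-in compile(): hand the interpreter a snippet with a mismatched bracket and it tells you where it gave up. The helper name check_syntax and both snippets are invented for this sketch, and the exact wording of the complaint varies between Python versions.

```python
def check_syntax(source):
    """Return None if the source compiles, otherwise the interpreter's complaint."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as err:
        return f"line {err.lineno}: {err.msg}"
    return None


good = "total = sum([1, 2, 3])\n"
bad = "total = sum([1, 2, 3)\n"   # closing bracket does not match the opening one

print(check_syntax(good))   # nothing to complain about
print(check_syntax(bad))    # the machine lets you know
```

Note that this only covers the syntactic half of the conversation: a program that compiles but computes the wrong thing will only give itself away in the results, and only if you are looking.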

7. It is worth going beyond the basics

Anyone who has programmed an Excel spreadsheet has done, well, programming at a basic level. I’ve seen political science graduate students with little or no formal programming coursework developing scripts in R or Stata at very high levels of complexity, albeit needlessly high because they could have done the tasks a lot more easily in perl or Python. Web programming used to be sort of a trailer-trash backwater, suitable for the likes of UFO-worshipping suicide cults, but after a couple of decades has evolved to high levels of sophistication that require knowledge of some underlying theoretical concepts to use effectively.

The problem with just focusing on self-taught (and peer-taught) programming is it is easy to by-pass (or only partially learn) some important concepts that go well beyond little rules of thumb like making sure every line ends with a semi-colon and absolutely never write something where white space is syntactically important.[8] First and foremost, data structures beyond arrays, and object-oriented programming concepts.  These do need to be learned, IMHO, because you can program without them—for the first couple of decades of computer programming, the entire community did—but you can be far more efficient if you know how to use them.[9] At the secondary level, learn correct idiomatic programming in your languages of choice, both because script-based systems like R and Python are implemented with common idioms in mind (that is, idiomatic code will be better optimized), and you need to know idioms to read code, and as almost everything you will use will be open source, you will read a lot of code.
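To make the data-structure point concrete, here is a minimal sketch that counts event codes two ways: once using nothing but lists, and once with the hash-based Counter from the standard library. The event codes are invented for illustration, and the example covers only the data-structure half of the argument, not object-oriented design.

```python
from collections import Counter

# Hypothetical two-digit event codes, invented for illustration.
events = ["01", "04", "18", "04", "01", "04", "19", "18"]


def count_with_parallel_lists(codes):
    """Array-only approach: rescan the list of seen codes for every event."""
    seen, counts = [], []
    for code in codes:
        if code in seen:                       # linear scan on every event
            counts[seen.index(code)] += 1      # and a second scan to find the slot
        else:
            seen.append(code)
            counts.append(1)
    return dict(zip(seen, counts))


def count_idiomatic(codes):
    """Idiomatic approach: a hash-based Counter does the bookkeeping."""
    return Counter(codes)


# Both approaches agree on the answer; they differ in effort and speed.
assert count_with_parallel_lists(events) == count_idiomatic(events)
print(count_idiomatic(events).most_common(2))
```

Both versions give the same answer on eight events. The difference shows up at scale, where the list version rescans its growing list of codes for every event while the Counter looks each one up in roughly constant time, and in readability, since anyone who knows the idiom can see at a glance what the one-line version does.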

So Phil, this sounds great—can’t you make it just a little less snarky [10] and put it in the form of a departmental memo to the graduate curriculum committee…oh, too late for that…well, why didn’t you put it in the form of a memo and try to get it implemented at one of those graduate programs that tolerated your lack of faith and notoriously bad attitude for decades?

Been there, did that: tried to get something like this adopted for a good quarter century to no avail, and finally just slunk out into the sunset. Or something.

Leaving you poor bastards to face, alas, “math camp.”


1. If you do make a living from blogging, like Dan Drezner, Ross Douthat or Ezra Klein, the synergy with Twitter is obvious and effective.

2. They didn’t: of the myriad ways one could discover that scooter tires need replacing, an annual inspection is probably the least painful and expensive. Despite being another one of those horrible ways that damn gov-mit intrudes on our private lives! Horrible, horrible. Running over a pedestrian or pancaking into a semi would have accomplished the same thing without all that useless bureaucracy. Damn gov-mit… I digress…

3. I got very lucky to have a junior high school math teacher who pushed beyond this, though I think he only lasted in the system a few years before departing to work in the financial sector, which I somehow suspect paid better than teaching in southern Indiana. The mathematics department at Indiana University, where I did both my undergraduate and graduate work, took pedagogy very seriously, and I also had several very skilled teachers, including Daniel Maki and Maynard Thompson, pioneers in teaching about mathematical models of social processes, Pesi R. Masani, a student of Norbert Wiener who taught a decidedly rigorous year-long probability theory course, and during a one-year visiting gig after he retired from Berkeley, the inimitable statistician Henry Scheffé who, like George E.P. Box, emphasized that in statistical analysis, it is foolish to be concerned about mice when there are tigers about.

4. And kiddos, what was worse than running programs from punched cards?: typing publication-quality equations before LaTeX. Damn whippersnappers don’t know how good they’ve got it…get off my lawn…

5. This is probably an urban legend but one of the mathematicians at Northwestern who helped develop NU’s Mathematical Methods in the Social Sciences program—which when I was involved took around nine quarter-length courses to cover what a typical “math camp” tries to do in two weeks, and this for a highly selective cohort of students—was said to have been sitting watching some rational choice dude, probably a hapless job candidate, going through a “proof” that filled three blackboards (in the era when actual blackboards still existed and would cover three or four walls of mathematics classrooms). When the “proof” was finished, he got up, erased all but the first two and last two lines, wrote two more lines in the middle, and said “That’s all you need for this proof.” Though in a later era, of course, he would have said “That’s not a proof, this is a proof.”

6. If you aren’t going for quantitative training, just spend lots and lots of time in the field, whatever “the field” corresponds to in your subject domain. Get yourself somewhere people really wish you weren’t—I think that advice can apply to pretty much everything worth studying qualitatively in political science—though please don’t get yourself killed, which could happen in places like Egypt and sooner or later probably Trump rallies. But also minimize your time in seminar rooms: they are toxic.

7. In the roughly fifty years I’ve been programming, I’ve transitioned through five primary languages: FORTRAN, Pascal, C/C++, perl and now Python. Plus a cluster of secondary languages like assembler, Algol, SNOBOL, Java and now javascript. The transitions in statistical packages were a bit slower: SPSS to Stata to R to a still evolving suite of Python tools. By contrast, your high school geometry textbook was basically written by “Euclid” around 2300 years ago; the first-year college calculus curriculum has changed little in 200 years.

8. Python joke[s]…we’re just bundles of laughs, programmers…

9. An instructive example: the TABARI automated coder, written largely in C, was exceedingly fast for its time (ca. 2000) because it worked quite close to the machine level, largely doing its own memory management and working almost entirely with nested pointers. When I wrote the initial version of the PETRARCH coder as the successor to TABARI in Python, which is at a higher level of abstraction, the program was quite slow. Then last summer Clayton Norris, a computer science and linguistics student working as a summer intern at Caerus Associates, re-wrote the core of PETRARCH using contemporary data structures appropriate to both Python and the Treebank parsed input it uses, and increased the speed of the program by about a factor of ten.   

10. Moi?…dream on…

11. But as I tweeted, notation alone doesn’t get you very far: knowing the common notation from mathematical statistics is like saying you’ve learned Arabic because you know the alphabet, can recognize a few common words on shop signs and restaurant menus, and will toss in a few instances of “insh’allah” (“p-value”) and “yani” (“significant”) into your conversation.


Seven reflections on Trump, Sanders and the crisis of bozo capitalism

Freedom is never more than one generation away from extinction. We didn’t pass it to our children in the bloodstream. It must be fought for, protected, and handed on for them to do the same.
Ronald Reagan

Blogs are funny things. “Seven Deadly Sins” started as discussant notes on a perfect storm of a bad ISA paper, and the hapless presenter actually enjoyed the rant because a senior scholar was giving the work such close attention. “Going Feral” began as a two-page cri de coeur in my obligatory annual report at Penn State. The two people who should have paid attention to it didn’t, so it gets 27,000 views instead. Go figure.

And this essay started as a 3600-word rant against the inanities of United Airlines discovered during my efforts to get home from Europe following Storm Jonas.[1] But in the process of thrashing about trying to focus that essay, I realized I was onto something much bigger: the problem with United is not merely that they are a greedy incompetent oligopolist, but that greedy incompetent oligopolists dominate the Old Economy generally, and in 2016 this is having profound political effects. Our problem is not capitalism, but bozo capitalism.

So off we go… [2]

Bozo capitalism—you heard it here first!—results from the convergence of six late 20th century phenomena which have combined to produce a system where the management of large sectors of the economy is under the control of clueless dolts even while other parts of the economy are thriving. The causal chain occurred as follows:

  1. As Adam Smith argued extensively in the 18th century and Mancur Olson argued in the 20th, one of the inherent instabilities of market economies is the outsized rewards which come from using political power to restrict competition. Free market systems are not innately stable but need to be maintained…
  2. An idea lost on the followers of Rand and Reagan by the late 20th century, who happily let the system run amok in the naive belief, owing far more to Jean-Jacques Rousseau than to Smith or Burke (or Hayek), that any political interference with markets led to sub-optimal outcomes. Beliefs that were probably actively encouraged by…
  3. The increasing resources and dominance of finance capital, fueled by the [inter-related] combination of increasing concentration of wealth and the proliferation of computer technology allowing for the development of mind-bogglingly complex financial instruments whose [supposed] critical characteristic was removing virtually all links between the returns on the instrument and the performance (and ownership) of the underlying assets. While this had a number of effects—notably financial meltdowns such as the various Third World debt crises of the 1980s, the savings-and-loan crisis of the 1980s and 1990s, some elements of the internet bubble of the late 1990s, and eventually the Great Recession—it encouraged a series of highly lucrative mergers and acquisitions which accelerated the conversion of many sectors of the economy from at least vaguely competitive market systems to thoroughly entrenched oligopolies. While at the same time, generally on the opposite coast of the USA from those financial centers…
  4. The twin technological revolutions of the personal computer and the internet fueled the rise of an entirely new economic ecosystem driven by entrepreneurship which, probably starting about 1990, attracted far and away the best and the brightest of at least two generations to the prospect of creating exciting new companies outside the dull but increasingly politically privileged oligopolies of the old order. A few of these companies succeeded and became huge in their own right—in the 21st century, Amazon, Apple, Facebook, Microsoft and Google—and many more succeeded to the point where they could be purchased (either for their inventions, or—you guessed it—to simply eliminate competition) by the apex predators. But, let us be realistic, most of these start-ups failed, but the individuals who had been involved in them…
  5. Did not go to work for the oligopolies—once one has experienced the freedom of a start-up, becoming a corporate drone has all the attraction of dining on a steaming plate of dog poop mixed with broken glass—but instead embarked on the various routes—again, enabled by the diffusion of computing power and instant communications—to a very comfortable and satisfying life outside of the corporate oligopolies and financial sector (and, for that matter, other large institutions such as, ahem, academia…). Which meant that those institutions were left with…
  6. The losers and psychopaths who were sufficiently privileged from birth and elite (typically legacy) education to function in very large established organizations so long as those were politically protected from the market, but not clever enough to make it in a market-driven new technology start-up. The contemporary poster child would, of course, be Martin Shkreli, but we see this in Jeffrey Skilling of Enron, John Sculley at Apple, James Cayne of Bear Stearns [3] and thousands of others. You know the general type: the guys in Ted Cruz’s fraternity who were in charge of waterboarding pledges and procuring cocaine and date-rape drugs and were tolerated mostly because their Daddy’s millions helped pay for the rent and the legal fees. And the cocaine.

In a nutshell, we’ve got way too many Old Economy executives who think they are John Galt or Steve Jobs when in fact they are Charles Montgomery Burns.  


This, of course, has been the New Economy view of the Old Economy from the beginning, though their ire has far too frequently been diverted into a naive libertarianism which blames all ills on the government. A viewpoint encouraged by vast amounts of funding invested in “Look, a squirrel!” efforts by new right-of-center “think tanks.” [4] Granted, government can certainly be a problem [17] but government discretionary spending is less than 8% of the economy and has generally been declining during the period that the New Economy has developed.[19] So government screw-ups, while doubtlessly irritating, necessarily pale in comparison to the influence of the bozos on the commanding heights of the Old Economy, and while many of those in the New Economy have the luxury of indulging themselves in the fantasy world of Rousseauian libertarianism, there is a very significant segment of the population which instead…

  7. Has to cope with a largely dysfunctional system which is not only beyond their control economically, due to stagnating incomes and exponentially increasing inequality, but also politically, thanks to the likes of Tom “Pay-to-play” DeLay and the institutionalization of K-Street corruption culminating, of course, in Citizens United. But as important, the supposedly liberal champions of the masses, the heirs to the Roosevelt/Johnson coalition, think nothing of accepting five-times the median family income to give a single speech to Goldman Sachs. Yes, earning more in a couple hours than the average family would earn in five years. And thinking absolutely nothing of it.

For these people, and there are a very large number of them—damn! democracy! damn, damn, damn, damn!—the system is not merely inaccessible, but incompetent on a day-to-day basis because those in charge simply don’t have the wherewithal behind their foreheads to make it otherwise. [5]

So we experience the entirely predictable sub-prime mortgage collapse—wow, maybe someone should make a movie about that! And then there is your local cable company: You just love your cable company, right, and every month you get a bill that includes fees for the Snake Channel and the Hitler Channel when frankly, you’ve seen enough of both snakes and Hitler, but you thank your lucky stars that you live in America where the local cable franchise is a protected monopoly, and are even more thankful that your service has a fraction of the speed, and a multiple of the cost, of what you’d have in Europe or South Korea because, well, if it was faster you’d just have to see more videos of snakes and Hitler.

And United Airlines. But that deserves an entire entry itself.

Thus explaining the contemporary landscape of American politics in a single sentence: “The system is rigged.” [6] Not just rigged, but rigged to favor and entrench the incompetent. That, ye of the pundit class, who do finally seem to be “getting it”, is what is driving voters to support Trump and Sanders instead of the establishment.

Quod erat demonstrandum

Where do we go from here?

Let me start by noting that not everything in the Old Economy is done incompetently. True bozo capitalism actually requires considerable economic, political and social effort: you must achieve an oligopolistic position, secure it through the purchase of political favors, and then develop a corporate culture that will drive out anyone who might know what they are doing. All that takes a lot of time, and many firms have chosen not to follow this path: Truly, not every company and corporate executive has what it takes to be a bozo.  And there’s the inconvenient fact that if a firm is truly competent in the market, it has little or no incentive to purchase political protection.

But enterprises who do embark on the Path of the Bozo are nearly impossible to avoid unless you have a lot of money. Like the sort of money the political establishments in both parties have: business class or corporate jets, life in gated communities, accountants, and concierge services. For everyone else, it is day after day of small unavoidable insults—the airline that won’t let you change a ticket because of illness or when a relative has died, the insurance company that loses your payment even as it has already been charged to your credit card [7], the endless sessions with some call center in Bangladesh that end with you on hold for fifteen minutes, then with no warning, “Click…” That’s life for most people outside of the upper political and economic strata.

I outlined in a previous essay a strategy on how the Democrats, by taking the reasonable complaints of the Trump voters seriously—and there are many such complaints (and voters)—could lock up both the presidency and the Senate with ease. They could even drive the final nail into the coffin of the Republican Party except that the Republicans have been so busy at this task that it would be hard to find room for another nail.

But that isn’t happening, and I’d postulate it won’t happen because the Democratic establishment is every bit as beholden to the bozo capitalist class as the GOP. And that won’t change: the wealth of the Democratic donor class is particularly dependent on exploiting the lower middle class. Oh, and have you noticed the folks down at the Elks Club aren’t offering $225,000 for a speech?

So where are we at the moment? Going right to left on the political spectrum:

If he can keep his campaign staff out of jail, Mr. Popularity will pick up the not-insignificant social conservative bloc and those parts of the Tea Party economic conservatives who cannot support Trump’s positions on the welfare state.

All of these offer the third term of George W. Bush, meaning ballooning deficits due to tax cuts for the wealthy, ill-conceived and unfunded wars concocted by wealthy establishment chickenhawks but with the lower middle class doing the fighting and dying, and in the end yet another bubble collapsing with more bailouts going to the bozo capitalists. To the utter horror and surprise of the GOP consultant class, this prospect just isn’t catching on.

The Donald [8] has the advantage that he can go straight to the core issues of his now clearly defined constituency without the constraints of an ideology: his statement in support of Planned Parenthood was brilliant.[9] Insisting on defining the Pope’s job: well, probably less so.[10] Trump continues to triangulate by the day but will almost certainly converge to a welfare state populism which is simply a U.S. variant of contemporary wide-spread and increasingly successful European right-wing populism.

Clinton appropriately promises an Obama-3 administration with tantalizing prospects of the peace and prosperity of the Bill Clinton-3 administration, which is certainly more attractive than Bush-3. Granted, it leaves the financial class and bozo capitalists firmly in control, but since the Clinton and Obama terms saw steady, if gradual, improvements for minorities—look at the data in Case and Deaton—particularly those outside the cohort (apparently now extending down to the age of 12) subject to random extrajudicial executions by police and white vigilantes, there are plenty of votes to be found from that position. But this prospect offers very little to younger voters, and is literally a death sentence for some in the cohort of lower middle class whites identified by Case and Deaton.

He looks, walks and quacks like a democratic socialist, so I’m granting that he is a democratic socialist. And consequently very attractive to the young, who basically have nothing to gain under the current system, who are totally repelled by the racism and misogyny of Trump [11], and who are sufficiently cosmopolitan—either individually or through their social networks—to know that life in the lower quintiles of a European socialist democracy can be pretty darn good. Sanders’s problem is that he is trying to be Roosevelt—both of them—in the 21st century and has yet to formulate an economic plan that is even remotely coherent: “cookies, kittens and bunnies, for everyone!” doesn’t cut it.

Add to this mix the likelihood we will be looking at a three—or conceivably even four—person race, with Bloomberg entering if Trump is the GOP candidate (and certainly if Trump is combined with Sanders on the Democratic side), and Trump (and possibly Bloomberg) as an independent candidate if he is not nominated, unless something goes humiliatingly bad for him in the “SEC primaries” of 1 March and he is legitimately eliminated from the GOP race.

And here at mouseCorp? Well, I’d be really happy if some adult supervision, starting with economics and foreign policy, came to Team Sanders. Certainly not too late for that.[12]

But here’s what I really want…

Klobuchar-Castro 2016!

Yeah, 2016, not 2020 or 2024. Because if Trump is elected, there may not be an election in 2020.[13]

We could well be looking at brokered conventions in both parties, so anything is possible.

At least think about it, eh?

Beyond the Snark

Mancur Olson:

Adam Smith, J-J Rousseau, etc.: if you need these references, none of the rest of this essay will be making any sense. Though please note that when I’m associating a position with Rousseau, that’s not meant as a compliment.

In capitalist economies, political institutions still matter:
Baumol, Litan and Schramm:
Acemoglu and Robinson:

The system is rigged:

Earlier rant on how the system is rigged from the perspective of a [really] small business:

Worth looking at: Or just read pretty much any of Ross Douthat’s recent columns.

Yet another op-ed—this time from that cesspool of the far left, the University of Chicago Booth School of Business—on the anti-market proclivities of the GOP:

Sanders’s economic policy needs some work:

Sanders’s foreign policy needs some work:

Millennials have no interest in joining Old Economy corporations:

And some examples of what they are doing instead (along with an extended paean to general aviation):

Another article on life in the hellhole of Nordic democratic socialism (by one of my former Fulbright colleagues!):

Anne Case and Angus Deaton: For the original:

United Airlines, we hates them forever!:–slowly/2016/01/21/a3ce3478-bb07-11e5-829c-26ffb874a18d_story.html

The Koch think tank network:

What happens to people who expose the Koch think tank network:


1. Further provoked by the unsurprising revelation in the Washington Post that making life miserable for its customers has actually been a core business strategy for United.

Granted, having once been stuck in Khartoum not knowing when I’d get out, being stuck in a major European city not knowing when I’d get out wasn’t a terrible hardship. But in a variety of ways, United—the corporation, not the line employees, who apparently fully realize they are working for a bunch of incompetent losers, and periodically engage in job actions at varying levels of subtlety to emphasize this point—did not handle this particularly well. And how come every time I encounter a United employee doing something nice, or even sensible, they are muttering “I’ll probably get in trouble for this…” These experiences, by the way, after paying an amount equivalent to the cost of a really, really nice Apple laptop for the ticket.

Though United: you’re not off the hook yet. Put me on a long flight following an unsatisfactory corporate experience and…blogs happen. Apple was the previous target.

2. Still waiting for the second half of the advice to DARPA program managers?: haven’t forgotten, just busy. Read Kahneman and Superforecasters, throw out anyone in the room who can’t explain the concepts of endogeneity, selection on the dependent variable and standard error, and you’ll be okay.

3. Cayne is now a mere footnote in the sordid history of the 2008 financial collapse, but he’s the guy who focused on international bridge competitions (the card game, not the physical structures: infrastructure is for, oh, yuck, those people) while the corporate house of cards [sic] he’d built—or rather supervised, sort of, while others built it for him—collapsed.

4. tl;dr alert! There’s another essay in waiting that was originally titled “Seven Things Liberals Can Learn from Classical Conservative Thought” but in light of events in the past few weeks, is being retitled  “Seven Things Conservatives Can Learn from Classical Conservative Thought,” as reasoned conservatism has all but disappeared from political discourse, replaced by a combination of bombast, paranoia and, well, liberalism.

Much of the blame for this lies with Fox News and talk radio, which never met a political fantasy so implausible or loathsome as to not attract an elderly white audience. But I think the conservative think tanks have failed miserably as well, particularly given the number of problems which have emerged which had been thoroughly anticipated by the likes of Smith, Mill, Schumpeter and Hayek. Instead, all we get now is some sort of vaguely Randian libertarianism devoid of social conscience, contract or history.

I’m beginning to wonder if, ironically, this lapse was an indulgence—which is finally instantiating the predictions of Karl Marx for godsakes!—made possible by the decline of the existential threat of Communism by the early 1980s [14], which allowed the conservatives in the liberal democracies to let down their guard on the End of History assumption that what remained was stable and unchallengeable. Yeah, End of History: how’s that working out for you?

But whatever the cause, the skepticism of classical conservatism has given way to a giddy combination of “what, me worry?/I got mine, Jack” libertarianism [15], largely on the West Coast and a few other New Economy enclaves, and a nearly clinical paranoid millenarianism pretty much everywhere else. Leaving this observer to wonder if many of these “think tanks” exist mostly to fill the role similar to that of the President of the Galaxy in Douglas Adams’s Hitchhiker’s Guide to the Galaxy: “The role of the office was not to exercise power, for it had none, but rather to distract attention away from where power was actually being exercised.” [16]

Well, more to follow. But meanwhile perhaps one should pay a little more attention to Epictetus, Machiavelli [18], Burke, Smith, Madison, Hayek and Buckley and a bit less to Limbaugh, Beck, Coulter, Norquist and Gingrich. Not holding my breath.

5. Oh, United Airlines [humming] “can’t get you off of my mind…” so I’m sitting over there in snow-imposed if not unduly unpleasant exile trying—quite unsuccessfully—to get anything coherent out of United’s 1980’s era computer system, and I’m thinking “Why is it the case that Google and Amazon can track my every whim—and those of a billion or so people throughout the industrialized world—even when I don’t want them to, and United can’t do so for a few tens of thousands of stranded customers even though we desperately want to be tracked!” Bozos.

In the midst of this, following a Skype conversation while wearing some distinctive glasses, my wife started getting ads for similar glasses on Google sites. Coincidence?—that’s what they want you to think.

Another case in point: The astonishingly successful Amazon Web Services was developed pretty much at the same time and using pretty much the same technology as the original disastrous public/private partnership known as Bozos.

6. Following Greg Sargent’s original exposition, “The system is rigged” is the focus of a jaw-dropping op-ed by no less than Charles Koch. Though for Koch, whose efforts have sucked the oxygen out of legitimately conservative political thought for the past quarter century, to complain about the current state of affairs shows the audacity of a man on trial for murdering his parents appealing to the court for mercy because he is an orphan.

To their credit—a phrase you probably did not expect to see in the same sentence as “Koch”—at least the Kochs use their own money, not that of shareholders. Perhaps a backhanded compliment to the culture of Kansas, the state they’ve done so much to destroy.

7. Welcome to our household’s recent experience…

8. Who has at least succeeded at some things and not always by manipulating the political system, and consequently is not a pure bozo capitalist. Furthermore a great deal of his appeal comes from his widely asserted contention—as with all things Trump, it’s hard to say whether the word “fact” would be appropriate here—that he buys politicians rather than being a politician subservient to the likes of himself.

By the way, Steve Inskeep’s recent NYT op-ed identifying Trump with Andrew Jackson is totally on the mark! Much better than the comparisons with Huey Long or George Wallace, who were also populists but had far more political experience. Though Trump only claims he could get away with killing people: Jackson actually did so.

9. Do you think Chelsea Clinton or Jenna Bush ever crossed the threshold of a Planned Parenthood clinic for an appointment? But a great number of Trump’s supporters certainly have.

10. What?!?: those damn popes need to learn their place! And it’s not just Francis: look at these losers:

Pius XI on “social order”:

Leo XIII on conditions of the working class:

Comfort the afflicted and afflict the comfortable, my ass…

11. This went completely under the radar in subsequent commentary on SuperBowl ads, but the intensely multi-ethnic appeal by PayPal—not exactly a lefty loony company and most certainly a quintessential player in the New Economy—should strike fear given the nearly universal nativism of the GOP frontrunners.

12. Biden?—well, if Biden were a movie, would he be “Terminator 3: Rise of the Machines”, “Transformers: Revenge of the Fallen”, “Hangover III” or “Star Wars: Episode I”? Your pick, but in all cases he’s a really bad sequel. Stop even thinking about it.

13. Joke, since you couldn’t pull that off without support of the military, and if you’ve been on a military base any time in the past 35 years—that is, since the stabilization of the All Volunteer Force—and of course if you are like most Americans, you haven’t—you will instantly see why Trump is not going to be popular with the military. This ain’t the Weimar Republic and on that dimension in particular, Trump could not be further from Hitler.

And by the way, no soldier is going to drag the mother of some guy he shared a foxhole with in Afghanistan off to some godforsaken detention center in the Arizona desert. Not sure Trump and his supporters have quite assimilated that: keep in mind only 5% of the population has even a family connection with the military. And if orders for doing that go out…well, now maybe the 2020 election might be in question. And not for the reasons Trump had in mind.

14. Communism (as distinct from the Cold War military stockpile of nuclear weapons) arguably ceased to be an existential threat sometime in the early 1980s due to the combination of

  1. Western economies recovering from the oil shocks of the 1970s
  2. China adopting a capitalist economic model under Deng Xiaoping
  3. The Solidarity movement in Poland
  4. Gerontocracy in the Soviet Union
  5. The absence of any significant “domino effect” following the 1975 Communist victory in Vietnam. Ironically, that primarily triggered conflicts between Communist states, with China attempting to invade Vietnam and Vietnam invading Cambodia/Kampuchea.

15. That is, the attitude of: Got a problem with United?—Well, all you economy air travelers are miserable little farts who are getting no more or less than what you deserve from the unfettered marketplace, and you should be happy you don’t just get shoved out the exit door mid-flight. As we learned from Ms. Rand, the only deserving people are those who own airlines—well, actually those airlines are all public corporations so these Giants of Industry, these Masters of the Universe, merely manage those companies, and only that if we aren’t terribly picky about the definition of “manage” but details, details…—and anyone who can’t at least afford to lease NetJets, but preferably have unlimited use of a corporate—or personal—executive jet, why such people don’t really deserve to even be considered human. (And by the way, wait until you discover what’s really in that “Soylent Green” we keep in our offices! Economy class scum, that’s what!) Why the only reason economy class exists at all is to trim the balance of the aircraft and we should probably just be using our many stacks of gold bars for that!

16. That’s not the exact quote, which I’ve not been able to locate on the web, but close enough. But while we are on the topic of the Hitchhiker’s Guide, it also contains a cosmic origin story for our planet not dissimilar to the bozo capitalism hypothesis. Hanc marginis exiguitas non caperet.

17. tl;dr alert! The highlight of my past two weeks was receiving a survey from the Pennsylvania Department of Economic Development—I haven’t lived in the state for nearly two years—asking what plans I had for improving my business. I gleefully responded “Moving it out of Pennsylvania.” Which allows me to get away from this: 49 separate tax forms that might apply to an LLC, with no practical guidance to the small business owner on which are needed. I digress…

At the small business level, the most insidious efforts of state legislatures over the past couple of decades—particularly those who claim to be conservative—have been to create an ever-expanding India-style license raj of requirements for utterly bogus and never-before-regulated “professions” that have been created for the joint benefit of restricting competition—“going medieval” in precisely the manner that guilds held back the development of modern markets for centuries—and lining the pockets of for-profit “schools” which extract tens of thousands of dollars for astonishingly low quality training in tasks that have traditionally been learned at the side of an experienced and successful practitioner. George Will has recently taken on this issue and The Economist has been going at it for the last couple of years, Florida in particular in its crosshairs.

To see just how absurd this has become in the somewhat business-friendly Commonwealth of Virginia, consult this list. Or even better, this, which provides the disaggregated list of 141 generally working-class “professions”—note that this does not include traditional professions such as medicine, education and law—that the Commonwealth feels necessary to regulate. And it doesn’t include my favorite: Virginia’s regulation of ginseng dealers. I cannot begin to tell you how much better I sleep at night knowing that the Commonwealth protects the innocent citizenry from the scourge of unlicensed ginseng dealers.

But what about those guys who took down the 100-foot tree in front of my house last month, a task that improperly done could have crushed my house or my neighbor’s car or any number of cats?: Nah, they don’t need a license. But they sure the heck had insurance! A long-established market solution to occupations which might do harm…wow, imagine that. Maybe that crafty old bird Fred Hayek thought of that solution as well? But thank heavens those guys don’t sell ginseng!

In all likelihood, the reason we don’t have an obligatory for-profit “Acme School of Tree Trimming” is while anyone who can’t actually make a living doing tree trimming can buy a few state politicians—at the rate bozo capitalism is developing, you’ll soon be able to get these on EBay—and get a law passed to require tree trimmers to first accumulate 2000 hours at their school, in this domain they could quickly end up dead or disabled after some hapless demonstration of tree trimming went awry. Whereas failed realtors, unemployable art history majors, or people without the GPA to get into dental school can open obligatory realtor, interior decorator and tooth-whitening schools with little risk to life or limb.

That’s probably the actual story of why tree trimming has escaped licensing requirements. But the story I’d like to imagine is that every time a state legislature considers the regulation of tree trimming, they are visited by a group of large sweaty individuals with extensive tattoos, carrying chain saws and trailing sawdust on the carpet, individuals who think nothing of tossing 150-lb objects around, and make tasteful little jokes about how easily they could snap the forearms of the legislative assistants. In the wake of these visits, the legislators decide that perhaps the more prudent course of action is allowing the market to continue regulating tree trimming. Before returning to the pressing problem of ginseng dealers, who also generally have tattoos but, alas, don’t have quite the same capacity for snapping forearms.

18. On republicanism, not The Prince. The Prince, like J.S. Bach’s Brandenburg Concertos, was just a failed job application.

19. Nor has it been uniformly obstructionist: notice how much credit conservatives have given to the Obama administration’s EPA for their permissive approach to the expansion of hydraulic fracking, a development with stunning global political and economic implications generally favoring the U.S. Yeah, neither have I.


Seven Concepts a Dept. of Defense Program Manager Needs to Successfully Develop Social Science Models: Part 1

There you go again. Ronald Reagan [1]

It was with a mix of deja vu, amusement and resignation that I saw the latest Dept. of Defense (DoD) pronouncements—try here and here—about their intentions to take a very important innovation in machine learning, recurrent neural networks [2], and use this as the centerpiece of a major new machine-human interaction initiative.

It’s that word “human” that’s setting me off, as when it comes to technical applications, DoD can’t ever seem to do “human.” Wow, creative new initiative but…been there, done that, and I’ve seen so many similar things come and go over the years—decades in fact—always with the same result [3]: a big heap of money spent that might as well have been stuffed into [Chinese] fireworks and sent skyward on the Fourth of July.

It’s probably getting worse for me now that I’m just a couple hours south of the Beltway and can attend meetings on short notice. There’s a pretty consistent script: You start with something ambitious though awkwardly defined—the sort of thing that in academia I would have sent back to a grad student for a re-write—but generally plausible. You’ve got a bunch of people in a room [4], and some of these are absolutely top in their field and are sincerely trying to be helpful and want to get to a feasible project definition, figure out some appropriate technology, and move everything forward. For an hour or two, things are going pretty well.

But invariably, after a promising start, we head down a very predictable rabbit hole and end up—yet again—at the Mad Hatter’s Tea Party. And typically stay there. Curiously, in my experience, this is endemic just to DoD, making it all that more puzzling.

Or not, since the proximate cause of the descent into madness can always be traced to the inevitable presence of a cluster of pallid, over-weight men (they’re always white men) of late-middle age—the Pillsbury Doughboy look—representing all of the usual suspects of the permanent civilian defense contractor class. Whenever things start looking promising, these dudes start asking the stupidest of questions, exhibiting unbounded cluelessness concerning the topic at hand, and going into long discourses on the impossibility of doing the sorts of research that other parts of the US government have been doing with great success for decades [5], often on precisely the same topic under discussion, and that the likes of Amazon, Google and Facebook have as a gadzillion-dollar business model.

So, methinks, what gives? Are these guys really that stupid and sent by their bosses to get them out of the building? If we were ISIS, of course, these folks would be placed at the top of the roster list for suicide missions, but we’re not ISIS. So why are they here?

With the intensified exposure to this phenomenon in the past couple of years, I’ve finally figured it out: the Doughboys are the equivalent of the Communist era minders in Soviet puppet states, and, consistent with the tactics of the Old Left, their entire purpose is to make sure that these meetings remain completely pointless [6] and avoid the disastrous possibility that DoD might, say, spend $10-million on some social science [7] research that would prevent a $100-million mistake or even worse, spend $100-million on research that would prevent one or more $1-trillion mistakes, or, worst of all, develop a sophisticated social science research culture within DoD comparable to that found in numerous other parts of the government, to say nothing of academia and the private sector. No, discretionary DoD money needs to be kept where it has been all along, funding mind-boggling levels of contractor fraud, weapons systems that don’t work, and 12-figure cost over-runs.

Well, gotta give the Doughboys credit, they’ve done one heck of a good job for their corporate masters! But that doesn’t mean we have to like it.

So in the spirit of the Yule season and as a public service, MouseCorp—which, full disclosure, is not entirely uninterested in having DoD learn how to do research appropriate to the 21st century—will provide guidelines to a small number of critical concepts which individuals trying to manage these new programs might, just might, be able to use to shut up these parasitic bastards [8]. There are more than seven—though the list is still fairly small—so for convenience this will be done in two segments, the first focusing on fairly specific technical concepts, the second, in a week or so, on some more general literatures.

For the sake of exposition, let’s assume somehow one or more of these new proposed projects makes it through the preliminary efforts to kill it, and you are managing it, and you quickly figure out that you could get a big boost if you’d incorporate some state-of-the-art social science modeling methods into the project. You’ve got the Doughboys with their corporate overlords and Gucci-clad lobbyists hamstringing you at every step, trying to make sure the project fails, but you’ve got some of that anachronistic Greco-Roman Stoic civic virtue thing going, and you’d really like the project to succeed. And since this is DoD, you’ve got a budget at least an order of magnitude greater than what the National Science Foundation or “the part of the national security community that shall not be named” would have available—granted, fully half of the funding will go to program reviews and PowerPoint slides [9]—so resources are not the issue. Deciding on workable approaches is the issue. So here are some key concepts, with more to follow:

1. The Forecaster’s Quartet

Become familiar with the following:

  • Daniel Kahneman. Thinking, Fast and Slow: 30 years of research which won a Nobel Prize and is a great read, long residing on the business best-seller list [10]
  • Nassim Nicholas Taleb. The Black Swan.
  • Philip Tetlock. Expert Political Judgment [11]
  • Nate Silver. The Signal and the Noise: popular-level antidote to the contention that human behavior is not predictable

And finally at the article length and a more challenging technical level, but in terms of political prediction using formal models, easily the most important work in the past quarter century:

Michael D. Ward, Brian D. Greenhill, Kristin M. Bakke. The perils of policy by p-value: Predicting civil conflicts. Journal of Peace Research 47(4) 363–375 [12]

Like most such paradigm-smashing contributions, they had a very difficult time getting it published.

2. Model specification and the centrality of theory

This comes first in the list of technical terms, since if you don’t get the model specified right, everything else is doomed, and the best weapon in your arsenal here is a thorough review of existing theory. The Doughboys hate that—their motto is “A week in the lab saves an hour in the library” since that attitude keeps the meter ticking and the money flowing. DoD projects [13] tend to approach every problem like they were the very first people to ever think about it, whereas in more cases than you’d expect, someone was thinking about it 2,500 years ago and helpfully wrote those ideas down. If not 2,500 years ago, then almost certainly in the past 50 years. A lot of it is garbage but you will discover that the slop dished out by people utterly unfamiliar with existing theory generally isn’t very helpful either.

A good theory tells you which haystack you need to look in to find the needle. Once you’ve got that, you don’t want to just pile on more hay. Conversely, specify the model incorrectly, and you’re just doing the wrong thing better. [14]

3. Latent dimensions and collinearity

A “latent dimension” is the technical term for what would commonly be called a “generalization,” and in statistical terms involves a set of indicators which co-vary. “Economic development” is the standard—and appropriate—example: we know from experience that advanced industrialized economies differ in a large number of ways from developing economies, and while GDP/capita is the most common way of measuring these, plenty of others would work just as well: famously, in the conflict forecasting realm, infant mortality rate. “Democracy”, “quality of governance,” “globalized economy” and “political instability” are other common relevant latent dimensions in the conflict forecasting literature.

The key point about latent dimensions is once you’ve got a couple of measures for a dimension—or even a single really reliable measure—adding more variables gives you very little information. In fact, in the linear models commonly used in statistical studies—discussed in the next entry—this becomes counter-productive because of a problem called collinearity, which inflates the variance of your coefficient estimates.

Latent dimensions are also the reason Achen’s “Rule of Three”—discussed next time—is so successful. The Doughboys hate this: their interest is in piling on redundant indicators to drive up the costs and delay the project, and hairball models are a solution, not a problem. Resist.
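To make the collinearity point concrete, here is a minimal simulation sketch, not anything from an actual forecasting project. Everything in it is an assumption for illustration: the function names, the sample sizes, and the setup in which the outcome depends only on the first indicator while the second is either unrelated or a near-copy of the first. It uses only the Python standard library.

```python
import random
import statistics

def fit_two_predictors(x1, x2, y):
    """Least-squares slopes for y ~ b1*x1 + b2*x2 (no intercept),
    solved directly from the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

def spread_of_b1(redundancy, trials=300, n=100, seed=1):
    """Standard deviation of the estimated b1 across simulated samples.
    redundancy is the (hypothetical) population correlation between the
    two indicators: 0 means unrelated, near 1 means a near-duplicate."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        x2 = [redundancy * a + (1 - redundancy ** 2) ** 0.5 * rng.gauss(0, 1)
              for a in x1]
        # The outcome depends on x1 only; x2 adds no real information.
        y = [a + rng.gauss(0, 1) for a in x1]
        b1, _ = fit_two_predictors(x1, x2, y)
        estimates.append(b1)
    return statistics.stdev(estimates)

independent = spread_of_b1(0.0)
redundant = spread_of_b1(0.99)
print("spread of b1, unrelated second indicator:", round(independent, 3))
print("spread of b1, redundant second indicator:", round(redundant, 3))
```

Under these assumptions the estimate of the first coefficient should bounce around several times more when the second indicator is a near-copy, even though the true relationship never changed. That is the cost of piling on redundant measures of the same latent dimension.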

4. “Error” is part of the process

For starters, in social science models, it’s not really “error” in the same sense that we think of “error” in tightly controlled physical processes. You are better off thinking about “error” as “things not included in the model” because we’re dealing with open complex processes, not the engine cylinder clearances on a BMW 760. They’ve not been included because the information is not reliably available, or is not cost-effective [15], or the indicators will actually introduce more error than they reduce, or the process is intrinsically random. [16]

So, it’s not “error”, it’s “everything else”. But keeping with convention, we’ll keep calling it “error.”

5. Accuracy, Sensitivity, Precision and ROC curves

“You can’t manage what you can’t measure,” right? In the bad-old-days, social scientists tended to measure errors using a single linear correlation-based measure called R-squared but contemporary models for predicting whether or not something will happen are generally evaluated on the aforementioned series of measures that the machine-learning folks have been using for quite some time. Look’em up: Wikipedia has vast resources here. The “ROC curve”—and the most common statistic based on it, the AUC [area-under-curve: Google it]—is particularly challenging to grok because it has an invisible dimension—the change in the threshold at which the model predicts the event will happen—but this is now nearly universally used, so put in the effort.
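For those who would rather see these measures than look them up, here is a small sketch using only the Python standard library. The eight cases and their scores are invented for illustration, and the function names are my own rather than those of any particular package. The AUC is computed in its rank form (the share of event/non-event pairs the model orders correctly, ties counting half), which is a standard equivalent of the area under the ROC curve.

```python
def confusion_counts(y_true, scores, threshold):
    """Count true/false positives and negatives at a given threshold."""
    tp = fp = tn = fn = 0
    for actual, score in zip(y_true, scores):
        predicted = score >= threshold
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def summary(y_true, scores, threshold):
    """Accuracy, sensitivity (recall) and precision at one threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }

def auc(y_true, scores):
    """Chance that a random event case outscores a random non-event case."""
    pos = [s for y, s in zip(y_true, scores) if y]
    neg = [s for y, s in zip(y_true, scores) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical data: 1 = the event happened, scores = model probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.9]

strict = summary(y_true, scores, threshold=0.5)
loose = summary(y_true, scores, threshold=0.42)
area = auc(y_true, scores)
print("threshold 0.50:", strict)
print("threshold 0.42:", loose)
print("AUC:", round(area, 3))
```

Moving the threshold changes accuracy, sensitivity and precision, while the AUC stays put because it summarizes the model across all thresholds. That is the “invisible dimension.”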

Most prediction problems relevant to national security concern “rare events” and these present a number of measurement challenges, not all of them resolved. Again, get up to speed on this and know the “gotchas”: e.g., it is trivial to generate very high accuracy on any rare events problem without producing anything useful for policy purposes. [17]
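The rare-events gotcha takes about a dozen lines to demonstrate. The numbers below are invented: 1,000 cases with 10 events, a “model” that never predicts an event, and a hypothetical alternative that catches 8 of the 10 at the cost of 50 false alarms.

```python
def accuracy(y_true, y_pred):
    """Share of cases where the prediction matches what happened."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def sensitivity(y_true, y_pred):
    """Share of actual events that the model flagged."""
    flagged = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return flagged / sum(y_true)

# Hypothetical rare-events data: 10 events followed by 990 non-events.
y_true = [1] * 10 + [0] * 990

# Model A: never predict the event.
never = [0] * 1000

# Model B: catches 8 of the 10 events, raises 50 false alarms.
useful = [1] * 8 + [0] * 2 + [1] * 50 + [0] * 940

print("never:  accuracy", accuracy(y_true, never),
      "sensitivity", sensitivity(y_true, never))
print("useful: accuracy", accuracy(y_true, useful),
      "sensitivity", sensitivity(y_true, useful))
```

The do-nothing model scores 99% accuracy and finds none of the events; the alternative scores lower on accuracy and finds 80% of them. Which of those you would want briefed to a policy-maker is not a hard call.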

6. Estimation

All non-trivial models have coefficients [18] and these must be estimated from the data. All estimates have—that word again—errors, or more accurately, variation, and this can also be estimated, though the veracity of that estimate is dependent on the extent to which the characteristics of the data—nowadays the term is usually “data generating process”—correspond to the assumptions used to derive the estimation method, and as anyone with any experience in the field knows, the Data Fairy is frequently not very kind. All estimation methods can exhibit pathological behaviors when confronted with sufficiently weird data but fortunately for the widely used modeling methods—discussed in the next entry—these are extensively studied and understood.[19] New methods?: you’ll be the test case, and these may go very badly. [20]

Historically, social science statistical work was done “in-sample”, where the data used to estimate the model and the data used to test it were the same. This leads to “over-fitting” or “fitting the error” and often did not generalize. Contemporary work—and virtually all machine learning work—uses any of a number of more robust “split sample” designs where the “training” and “test” data are separate.
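Here is a deliberately extreme sketch of “fitting the error,” again in standard-library Python. The outcome is pure coin-flip noise, so there is nothing to learn, and the “model” is a one-nearest-neighbor rule that simply memorizes its training cases. Both choices are mine, picked to make the point starkly; they are not a recommendation of any method.

```python
import random

def nearest_neighbor_predict(train_x, train_y, x):
    """Predict the label of the closest training case (1-nearest-neighbor)."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]

def accuracy(train_x, train_y, xs, ys):
    """Share of cases in (xs, ys) the memorizing model gets right."""
    hits = sum(nearest_neighbor_predict(train_x, train_y, x) == y
               for x, y in zip(xs, ys))
    return hits / len(xs)

rng = random.Random(0)
xs = [rng.random() for _ in range(400)]
ys = [rng.randint(0, 1) for _ in range(400)]  # labels are pure noise

# Split sample: first half for training, second half held out for testing.
train_x, train_y = xs[:200], ys[:200]
test_x, test_y = xs[200:], ys[200:]

in_sample = accuracy(train_x, train_y, train_x, train_y)
out_of_sample = accuracy(train_x, train_y, test_x, test_y)
print("in-sample accuracy:    ", in_sample)
print("out-of-sample accuracy:", out_of_sample)
```

In-sample, the model looks perfect because every case is its own nearest neighbor. On the held-out half it should hover around a coin flip, which is all that noise deserves.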

7. Significance testing versus Bayesian approaches

Historically, virtually all social science statistics were done using the “null hypothesis significance testing” approach [21], which is both highly counter-intuitive and very often misinterpreted even by people who should know better, but was the only practical method prior to the availability of large amounts of computing power and some innovations in estimation methods that only occurred a couple decades ago. Significance testing is gradually being replaced by Bayesian methods [22], which are more likely to provide the information you are actually looking for, and in principle should integrate well with qualitative approaches: the buzzword here is “informed priors.” The important thing: understand what each approach does and does not do.
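The simplest possible illustration of an informed prior is the textbook beta-binomial update, sketched below. The scenario and every number in it are hypothetical: an analyst’s qualitative judgment that an event occurs in roughly one case in ten is encoded as a Beta(2, 18) prior, and then 3 events are observed in 10 new cases.

```python
def beta_posterior(prior_a, prior_b, events, trials):
    """Update a Beta(prior_a, prior_b) prior with binomial data."""
    return prior_a + events, prior_b + (trials - events)

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

events, trials = 3, 10  # hypothetical new observations

# Flat prior: no prior judgment at all about the event rate.
flat = beta_posterior(1, 1, events, trials)

# Informed prior: expert judgment of roughly 10%, worth about 20 prior cases.
informed = beta_posterior(2, 18, events, trials)

print("raw rate in the data:     ", events / trials)
print("posterior mean, flat:     ", round(beta_mean(*flat), 3))
print("posterior mean, informed: ", round(beta_mean(*informed), 3))
```

With the flat prior the estimate stays close to what the ten cases say; with the informed prior it is pulled toward the analyst’s one-in-ten judgment. How strongly to weight that judgment (here, about 20 cases’ worth) is itself a modeling decision you should be able to defend.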


Enough for now, and I’m a bit over the word limit already. There’s more to come, and I’ve already assigned too much homework. Still, just with this material alone, you can start the push-back and keep your project from going off the rails: just look those Doughboys straight in the eye and growl “So, feeling lucky, punk?” [23]

Beyond the Snark

Not a lot this time: Wikipedia is generally pretty good on the subject of statistics—well, it is amazingly comprehensive and technically accurate; sometimes the exposition leaves a bit to be desired. Quite a number of people who teach methodology, where a huge amount of labor goes into producing a good set of lecture notes, have posted these on the Web. As has MIT. I’ve generally used standard terminology here, though in a few areas things get a little confusing because the statistics and machine-learning communities, having developed more or less independently (this is actually quite remarkable but hey, academic silos are built to withstand a lot of pressure) not infrequently use different terms for the same concepts. But in general with all of these terms “Google it” will get you lots of information.

My original and somewhat technical exposition on what is wrong with most of the existing approaches: By the way, a few people have interpreted this article as my rejecting quantitative approaches. Far from it, starting with the fact that doing quantitative work is how I keep food on the table. I’m merely saying if you are going to use these increasingly effective methods, do it correctly.

GSA excess: And the practical consequences:

Defense contractor-sponsored equivalents to GSA: funny, you don’t see any stories about those events. An absence of curiosity which I’m sure is entirely unconnected with the presumably rather costly defense contractor advertisements that keep popping up on my—perhaps not your—Web versions of the New York Times and Washington Post. “Yes, an F-35…why, just in time for the holidays, a perfect gift for my nephews! I’ll take three, my good man, and can you have them wrapped and delivered? Jolly good of you to remind me with that expensive animated advertisement!”

How Rome became the hegemonic successor to the Hellenistic empires: haven’t read it but Mary Beard’s new popular history, SPQR, has gotten a lot of good reviews. There’s more to the story than gladiators. Really. And with a level of wealth inequality approaching that of imperial Rome, there’s stuff we can learn.


1. With Reagan now subjected to vicious character assassination by the Fox News crowd, I’m going to open the next few entries with Reagan quotes. Judge people by their enemies, eh? Besides, in the current environment, he’d be considered a bit of a lefty, what with all that arms control and raising taxes.

2. I’m not providing many links this time: With every technical phrase in this entry, if I can Google it, you can Google it.

3. And whenever I’m told this, people outside of DoD—never from the inside—suggest I’m cynical because all of the highly successful projects are secret. Well, if they are I’m one important guy, because a huge pile of money has been spent on unclassified foolishness over the years just to distract me from learning about the good stuff. Really, I don’t think I’m that important.

4. In the old days, these came with a nice spread of donuts and sandwiches—typically we’re doing these things for little or no compensation beyond expenses. But with the combination of the Tea Party trash-and-burn budgetary tactics and that wonderful tax-payer-funded Las Vegas bacchanalia of the GSA, those days are gone. Launder that same tax money through a defense contractor, of course, and the bacchanalia are still absolutely fine: been to a few of those as well, and other than the experience of seeing the reckless extravagance making me want to throw up, they are splendid exercises, and most of the attendees, staggering about under the burden of unlimited quantities of cheap booze, consider them a professional entitlement. Though I wouldn’t be surprised if those things have pushed more than a few folks—at least the caterers—into the Tea Party, albeit with no effect.

Get yourself to one of these defense industry affairs and it’s night after night of lobster, champagne and live entertainment in lavishly decorated resort hotel ballrooms, all provided by—smirk, smirk, wink, wink—“company sponsorship.” Going to a government-funded conference on finding a cure for Alzheimer’s, or teaching kids to read, or preventing rusting bridges from collapsing?: it’s gonna be Subway and a diet cola, maybe followed by a pitcher of Miller Lite shared with your buddies at an anonymous sports bar in a strip mall, and remember to bring your wallet, because all this comes out of your own pocket.

Some dumb hick from Abilene, Kansas pointed out the problems with this system back in 1961. Lotta good that did.

5. Seriously, do you think the Federal Reserve Board and the Centers for Disease Control spend their days listening to a gaggle of demented bozos braying about how human behavior is unpredictable?

6. There’s a little ritual in these meetings where the program manager earnestly intones a little mantra—I’m not sure whether the original was in Latin or Sanskrit—about their deep responsibility of not wasting public money. Given they know the Doughboys are in the room precisely to ensure that public money is wasted, it must be hard to do this with a straight face. Though I suppose that’s why program managers get paid the big bucks. Joke.

7. For the Doughboys, “social science” is an oxymoron, an observation they will share endlessly. In fact they probably have “‘social science’ is an oxymoron” tattooed somewhere on their bodies, probably somewhere I’d certainly prefer not to look.

8. Technical term of art: actually, one of my several working titles for this essay was “Seven Under-stated Reflections on When the Hell are You People Going to Learn How to Keep Those Parasitic Bastards from Making Off with My Tax Dollars?” But that doesn’t scan particularly well.

9. When the history of the decline of United States hegemony is written in a century or two, the role of the Doughboys will probably deserve at least a footnote. Though the role of the PowerPoint virus—no, not a virus carried by PowerPoint; PowerPoint is a virus—will get a chapter.

That history may be written in English—in New Delhi—rather than Chinese, just as the successor to the Hellenistic empires was not the obvious Mediterranean candidate, the wealthy and commercially savvy Phoenicians from their base in Carthage—but instead a theretofore marginal group of Italians living along the Tiber. The second mouse, as it were.

10. I will assume you can find these on Amazon or, if you prefer not to support a soul-destroying mega-corporation whose business model involves removing every last vestige of humanity from the workplace, a local bookshop if you have one. Probably run either by some balding old guy who is likely to engage you in an extended conversation when you really just wanted to buy a book, or someone with cats. Though I’m also actually becoming rather fond of the Barnes and Noble chain, particularly when after seeing the grey hair, they let this old guy to the front of the line for a seat at an overflowing author’s event, and at Union Square in New York City no less! I digress…damn old guys who don’t know when to stop…well, “communicating”, if that’s even the relevant concept…

11. We’ll deal with his more recent work on superforecasters in the next entry.

12. General link is Since I still have an adjunct academic appointment with library access, the version I’m seeing isn’t paywalled; your results may differ. If you dig a bit, you can usually locate non-paywalled versions of widely-cited academic papers, and this would qualify. Or you can pay: none of that payment goes to the authors, of course, as academic publishing doesn’t work that way; instead it is a monk-like humble offering to the cause of restricting the flow of knowledge generated through public funding and to further increasing the level of inequality through the support of a tiny oligopoly of rapaciously profitable publishers. Yet again, I digress.

13. And GOP presidential candidates…

14. For some fairly technical reasons, “specification error” in some of the most commonly used models is even worse, since if a variable you’ve incorrectly put in the model is correlated with variables which are actually causal, the variable will appear to have a stronger effect than it actually has. Though if you are only interested in prediction, this isn’t that big a deal. Still, a model that is consistent with a correct theory will almost certainly have better properties than a model that isn’t. Which is also why the “data will replace theory” arguments are overly optimistic: a good theory is vastly more useful than any undifferentiated mess of data.
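For anyone who wants to see that inflation rather than take it on faith, here is a minimal simulation sketch. Everything in it is an assumption made for illustration (the variable names, the coefficients, the sample size): z is the truly causal variable, x has no effect at all on y but is correlated with z, and the regression is ordinary least squares done by hand.

```python
import random

def ols_slope(x, y):
    """Slope from a one-variable least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(y, x):
    """What is left of y after removing its linear dependence on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = ols_slope(x, y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]

# Illustrative (made-up) data-generating process:
#   z causes y; x does NOT cause y but is correlated with z.
rng = random.Random(1)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + 0.6 * rng.gauss(0, 1) for zi in z]
y = [2.0 * zi + rng.gauss(0, 1) for zi in z]

# Mis-specified model: y regressed on x alone. x soaks up z's effect.
slope_x_alone = ols_slope(x, y)

# Controlling for z (partialling it out of both x and y) removes the effect.
slope_x_given_z = ols_slope(residuals(x, z), residuals(y, z))

print(f"x alone:           {slope_x_alone:.2f}")
print(f"x controlling for z: {slope_x_given_z:.2f}")
```

The first slope comes out far from zero even though x does nothing in this made-up world; the second, with z accounted for, sits close to zero.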

15. Hey, give us poor proles who still actually have to pay taxes a break here, will you?: even in DoD research there should be a concept of “too expensive.”

16. Or appears to be, as in chaotic processes such as weather. I’ll say a bit about chaotic processes in the next entry; for now suffice it to say these are not the semi-mystical phenomena some would have you believe, just an unexpected but completely deterministic aspect of a very simple dynamic equation you can easily experiment with in one column of a spreadsheet. Intrinsic randomness could be an essay in its own right…maybe later…
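If a spreadsheet isn’t handy, the same experiment is a few lines of Python. A sketch, assuming the equation in question is the logistic map x[t+1] = r * x[t] * (1 - x[t]), which is the usual textbook example; the parameter r = 4 and the starting values are illustrative choices, not anything specified above.

```python
def logistic_map(x0, r=4.0, steps=60):
    """Iterate x[t+1] = r * x[t] * (1 - x[t]); fully deterministic."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

# Two trajectories whose starting points differ by one part in a billion.
a = logistic_map(0.2)
b = logistic_map(0.2 + 1e-9)
gaps = [abs(p - q) for p, q in zip(a, b)]

for t in (0, 10, 20, 30, 40, 50, 60):
    print(f"t={t:2d}  gap={gaps[t]:.3e}")
```

There is no random number anywhere in that code, yet the gap between the two runs grows from invisible to as large as the values themselves: deterministic, and still practically unpredictable past a short horizon.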

17. But caution, particularly if you’ve only skimmed Taleb: rare events are not the same thing as black swans.

  • Black swan: an event that has a low probability even conditional on other variables
  • Rare event: an event that occurs infrequently, but conditional on an appropriate set of variables, does not have a low probability

Using a medical analogy, certain rare forms of cancer appear to be highly correlated with specific rare genetic mutations. Conditioned on those mutations, they are not black swans.
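The arithmetic behind that distinction is short enough to write out. A sketch with entirely hypothetical probabilities (none of these numbers come from any actual medical study): the event is rare in the population as a whole, but not at all improbable once you condition on the right variable.

```python
# Hypothetical, made-up probabilities for illustration only.
p_mutation = 0.001                   # share of the population with the mutation
p_cancer_given_mutation = 0.6        # probability of the cancer with the mutation
p_cancer_given_no_mutation = 0.0001  # probability without it

# Law of total probability: the unconditional rate across the population.
p_cancer = (p_mutation * p_cancer_given_mutation
            + (1 - p_mutation) * p_cancer_given_no_mutation)

print(f"unconditional:         {p_cancer:.5f}")
print(f"conditional on mutation: {p_cancer_given_mutation:.5f}")
```

With these invented numbers the event is rare overall (well under one in a thousand) but more likely than not given the mutation, so it is a rare event, not a black swan. It would be a black swan only if the probability stayed low no matter what you conditioned on.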

Taleb definitely gets these distinctions, but many of the popularizations of Taleb (who in turn is quite consciously—he profusely acknowledges their work—a popularizer of Kahneman and his collaborators, and Tetlock) miss it.

Also worth noting here—since I ran out of my seven allocated categories—are events which are too predictable: these are called “auto-regressive” or “auto-correlated,” which simply means that the value of a variable at time t is highly correlated with the value at t-1. Most human activities have this characteristic—humans, and particularly human institutions, are fairly boring and predictable, except when they aren’t. The sorts of sequences one tends to look at in political conflict forecasting are highly autocorrelated except for a small number of highly consequential exceptions which are…rare events. From the perspective of a methodologist, it makes the whole problem rather interesting.
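To make “highly correlated with the value at t-1” concrete, here is a sketch that computes the lag-1 autocorrelation of two simulated series. The simulation is an assumption for illustration: a first-order autoregressive process x[t] = phi * x[t-1] + noise, with phi = 0.9 standing in for a persistent human institution and phi = 0 for pure noise.

```python
import random

def lag1_autocorrelation(series):
    """Correlation between the series and itself shifted by one step."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in series)
    return num / den

def ar1(phi, n=2000, seed=7):
    """Simulate x[t] = phi * x[t-1] + Gaussian noise, starting at zero."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0, 1))
    return x

persistent = lag1_autocorrelation(ar1(0.9))
noise = lag1_autocorrelation(ar1(0.0))

print(f"phi=0.9: {persistent:.2f}")
print(f"phi=0.0: {noise:.2f}")
```

For the persistent series, “tomorrow looks like today” is a very good forecast, which is exactly why the handful of cases where it fails are the ones worth the effort.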

And in a final, really techy, aside, there’s a tendency to confuse autocorrelated variables and autocorrelated errors. The presence of the latter considerably complicates estimation, all the more so when you get both at once. Errors are autocorrelated for the same reason variables are autocorrelated: human behaviors tend not to change much over time, and “errors” are just the factors not included in the model, and quite a few of those involve humans.

18. Google “coefficient” if you aren’t familiar with the term: I can’t begin to count the number of meetings I’ve sat through where we appeared to be operating under an assumption that functioning models would be delivered by, well, maybe Elminster Aumar, High Wizard of the Forgotten Realms?…I dunno, sure the heck wasn’t going to be through any systematic estimation method worthy of discussion.

19. Newer machine learning methods also have a “hyperparameter” issue—the estimators can be configured in a wide variety of different ways, some better than others, and optimizing these using vast amounts of machine cycles is another important new research field. Older methods were derived algebraically and generally had a very small number of free parameters.

20. It is remarkably difficult to find new methods that consistently outperform the “obnoxiously effective” old standbys I will discuss in the next entry—conventional and logistic regression, support vector machines, conventional neural networks, and clustering methods. That’s why they are old standbys. That’s also why a genuinely effective new entrant like recurrent neural networks is such a big deal.

21. Usually referred to as “frequentism”, particularly by its detractors. Its supporters call it “statistics.” In some circumstances, frequentism is completely appropriate. But it isn’t universally appropriate and until about 20 years ago, it was treated as such.

22. Look at the research interests of the faculty in almost any university statistics department and you’ll find that most of the younger people are working on Bayesian methods: this is not an instance of random selection.

23. Like so many memorable quotes, that’s not the actual line, which had a rather rambling preamble before finally getting around to: “You’ve gotta ask yourself one question: ‘Do I feel lucky?’ Well, do ya, punk?” Rather as the oft-quoted “Play it again Sam” condensed seven lines of dialog, none containing the word “again.” For simplicity, and cognizant of the date, 17-Dec-15, stick with “Han shot first.”
