
Using data to guess authorship of the Federalist Papers

I was recently reminded of how statistics settled a question about the authorship of The Federalist Papers. These 85 essays, published under a pseudonym, argued in support of the U.S. Constitution. It is now known that they were written by Alexander Hamilton, James Madison and John Jay, but for a long time the authorship of twelve of them was uncertain. In the early 1960s, two statisticians, Frederick Mosteller and David Wallace, published a celebrated paper that solved the riddle. The key components of their solution were:

- Noticing that people’s writing styles differ in word preference: certain writers habitually use certain words more often than others.
- Recognizing that “common words” such as prepositions are better differentiators than less common words. Because common words appear often, there is more data with which to establish each author’s base rate. For example, Madison almost never wrote the word “upon”, while Hamilton used it quite often; thus, a document that contains no “upon”s is much more likely to have been written by Madison.
- Treating each differentiating word as a signal, and building a model that combines the signals from many such words. Together, they provide far stronger evidence than any single word does on its own.
- Using a Bayesian model that produces the probability that a specific author wrote a specific paper.

The back story is also interesting. Success was not assured: Mosteller and others had earlier tried other methods, such as comparing average sentence lengths, and failed. Mosteller himself expressed the self-doubt characteristic of most competent statisticians: “the odds can never be greater than the odds against an outrageous event”. In other words, stranger things can happen. No matter how high the probability, we have still only established a correlation. For example, one respondent argued that Hamilton could have written a first draft that Madison then edited. The data could not rule out such an event.
(Nevertheless, anyone advancing this possibility ought to produce evidence to support it.)
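To make the idea of combining word signals concrete, here is a minimal Python sketch. It is a simplified stand-in, not Mosteller and Wallace’s actual model: it assumes each word’s count follows a Poisson distribution at an author-specific rate, and that words occur independently of one another (a naive Bayes assumption). Apart from “upon”, the word list is hypothetical, all rates in `RATES` are invented for illustration, and the function names are illustrative only.

```python
import math

# Hypothetical usage rates per 1,000 words (invented for illustration;
# NOT the values Mosteller and Wallace estimated).
RATES = {
    "Hamilton": {"upon": 3.0, "whilst": 0.1, "enough": 0.6},
    "Madison": {"upon": 0.2, "whilst": 0.5, "enough": 0.1},
}


def poisson_log_pmf(k, lam):
    """Log-probability of seeing k occurrences under a Poisson(lam) model."""
    # Guard the k == 0 case so an expected count of 0 does not hit log(0).
    log_term = k * math.log(lam) if k else 0.0
    return log_term - lam - math.lgamma(k + 1)


def log_likelihood(counts, n_words, author):
    """Sum per-word Poisson log-likelihoods, treating words as independent."""
    total = 0.0
    for word, rate_per_1000 in RATES[author].items():
        expected = rate_per_1000 * n_words / 1000.0
        total += poisson_log_pmf(counts.get(word, 0), expected)
    return total


def prob_hamilton(counts, n_words, prior_hamilton=0.5):
    """Posterior probability that Hamilton (rather than Madison) wrote the text."""
    # Start from the prior log-odds, then add each author's evidence.
    log_odds = math.log(prior_hamilton / (1 - prior_hamilton))
    log_odds += log_likelihood(counts, n_words, "Hamilton")
    log_odds -= log_likelihood(counts, n_words, "Madison")
    return 1 / (1 + math.exp(-log_odds))


if __name__ == "__main__":
    # A 2,000-word paper with no "upon" and one "whilst".
    print(prob_hamilton({"upon": 0, "whilst": 1}, 2000))
```

With these made-up rates, the example paper comes out strongly in Madison’s favor. Working in log-odds shows why several weak signals add up: each word contributes its own log-likelihood ratio, and under the independence assumption those contributions simply sum.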

from Big Data, Plainly Spoken (aka Numbers Rule Your World) http://bit.ly/2wtaEWN