Monday, November 21, 2016

The Data Never Lies


But it does fib sometimes. At least that's what I've been learning by processing transcripts of our President-elect and the previous Democratic frontrunner. It turns out they both use many of the same words (imagine that). I was hoping that by analyzing the transcripts of the presidential debates I would find some deeper meaning in the election results.

Well... maybe not, but I think some of the data is interesting in itself.

Let's start with the First Presidential Debate:

Here's what Secretary Clinton said in the form of a word cloud:



Here's Mr. Trump's side in the form of a word cloud:



What became immediately apparent to me is that common words are heavily overrepresented. It's *interesting* that Clinton used the word "We" more overall than Trump used the word "You". But that could have more to do with the nature of the message each candidate is trying to impart.

Looking around at the other words, many of them are very low signal (syntactically useful, but not very meaningful out of context).

So, given that, I reran this with the most commonly used words in the English language filtered out.
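For anyone who wants to try the same kind of filtering, here is a minimal sketch in Python of counting words with and without a list of common words removed. It is not the exact process behind these word clouds: the `STOP_WORDS` list, the `word_counts` helper, and the sample sentence are all made up for illustration, and a real run would use a much longer list of common English words and the actual debate transcripts.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list of common words (an assumption for
# this sketch). A real run would use a much longer list.
STOP_WORDS = {"the", "and", "to", "of", "a", "in", "that", "is", "it",
              "we", "you", "i", "have"}


def word_counts(text, stop_words=frozenset()):
    """Count lowercase word occurrences, skipping anything in stop_words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)


# Made-up sentence standing in for a transcript; not a real quote.
sample = "We have to rebuild the economy, and we have to create jobs."

raw = word_counts(sample)                   # common words dominate
filtered = word_counts(sample, STOP_WORDS)  # common words removed

print(raw.most_common(3))
print(filtered.most_common(3))
```

The resulting counts are the kind of word-to-frequency data a word cloud tool draws from, so filtering at this step changes which words end up largest in the picture.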

The First Presidential Debate - Take Two:




There we go, almost to the meat of the debate. We can make it even better, but for now let's process the second and third debates with the same rules.

The Second Presidential Debate:




The Third Presidential Debate:







