A look at the vocabulary of Mark Twain, part I
Thursday, July 10th, 2008
While I’m on the subject of Mark Twain, I thought I’d do some simple analysis of his vocabulary. I grabbed his complete works from Project Gutenberg and started with a rough look at his word use.
I estimate that he used about 45,000 distinct words, although that count includes proper nouns, many eye-dialect spellings, and words like Personaleinkommensteuerschatzungskommissionsmitgliedsreisekostenrechnungserganzungsrevisionsfund.
Below are the top 10 words by count.
| word | count |
|---|---|
| the | 155353 |
| and | 122641 |
| of | 79506 |
| a | 73607 |
| to | 71719 |
| it | 49754 |
| in | 48167 |
| I | 45584 |
| that | 39901 |
| was | 39603 |
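
For anyone who wants to try this at home, a count like the one in the table can be sketched in a few lines of Python. This is only an illustration and not necessarily how the numbers above were produced: the tokenizer pattern, the lowercasing step, and the `count_words` name are all my assumptions, and different choices there will shift the totals.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of ASCII letters, optionally joined by apostrophes
# (so "don't" stays one word). Other rules would give different counts.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

def count_words(text):
    """Return a Counter mapping each lowercased word to its number of occurrences."""
    return Counter(match.group(0).lower() for match in WORD_RE.finditer(text))

# Tiny stand-in text; in practice this would be the full downloaded corpus.
sample = "The cat and the dog. The end."
counts = count_words(sample)

# Most frequent words first, as in the table.
for word, count in counts.most_common(3):
    print(word, count)

# The number of distinct words is just the number of keys.
print("distinct words:", len(counts))
```

Note that lowercasing everything folds "The" and "the" together, which is a judgment call; it would also turn "I" into "i".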
There are over 15,000 words that Twain used only once. Sorry, but I won’t be listing those here.
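Finding the words used only once is a simple filter over the same kind of tally. Here is a minimal sketch, assuming the counts are already in a `Counter`; the `words_used_once` helper is a name I made up for illustration, and splitting on whitespace is a stand-in for whatever tokenizer is really used.

```python
from collections import Counter

def words_used_once(counts):
    """Return the sorted list of words whose count is exactly 1."""
    return sorted(word for word, n in counts.items() if n == 1)

# Toy example: whitespace splitting stands in for a real tokenizer.
counts = Counter("the cat and the dog saw the cat".split())
once = words_used_once(counts)
print(len(once), once)
```

On a real corpus this list will be full of typos, names, and dialect spellings, so the raw number depends heavily on how the text was cleaned up first.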
There’s nothing too interesting yet, but tomorrow I hope to dig deeper into the data and possibly compare it to other authors.