Archive for the ‘Uncategorized’ Category

Hunting for Latin roots

Monday, July 28th, 2008

In a previous post, I showed some word roots that, although they sound the same, are completely unrelated.

I found another example today with the word venery. Venery has two definitions, each with a different origin.

1. venery - hunting for wild animals
2. venery - the pursuit of sexual pleasure or indulgence

I can imagine an awkward conversation based around the misinterpretation of this word.

Bob:  Hey John, how was your weekend?
John:  It was great, the wife was out of town, so I had an entire weekend of venery.
Bob:  I didn’t realize anything was in season.  What’d you get?
John:  A blonde and a brunette.

Sorry if that was lame.

The first definition comes from the Latin word venor, meaning to hunt (think venison).  The second definition comes from the Latin venus or vener-, meaning desire or love (think venereal).

Don’t confuse this with the similar-sounding Latin root venera, from which we get venerable.

Halcyon days

Thursday, July 24th, 2008

There’s a term in language processing called an n-gram, which is a sequence of n words. A bi-gram, for example, is a sequence of two words.

I’ve been wondering if there are common bi-grams of words in GRE and SAT vocabularies that could make it easier to determine answers to fill-in-the-blank questions.

One example might be the word halcyon.

halcyon - Calm, undisturbed, peaceful, serene.

The phrase halcyon days (also a bi-gram) is used to refer to a period of calmness during the winter.  I thought it’d be interesting to search through texts to see if most uses of halcyon follow that pattern.

Some examples I’ve found:

When young, sweet love, with her luring smile,
The mystic charm-light of halcyon hours,

Bird of the sea rocks, of the bursting spray,
O halcyon bird,
That wheelest crying, crying, on thy way;

Weavers, weaving at break of day,
Why do you weave a garment so gay? . . .
Blue as the wing of a halcyon wild,
We weave the robes of a new-born child.

The halcyon days at length draw to a close, and sorrows “in
battalions” compel them to emigrate and bid

“Assign’d am I to be the English scourge–
This night the siege assuredly I’ll raise:
Expect St. Martin’s Summer, halcyon days,
Since I have entered into these wars.”

So in total, halcyon was followed by days twice, and by hours, wild, and bird once each. My theory unfortunately doesn’t pan out for halcyon, but I’ll keep looking.
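
A tally like this could be automated with a few lines of Python. This is only a minimal sketch: the function name following_words and the tokenizing rule are assumptions made for illustration, and the sample text is just the lines quoted above, lightly trimmed. Because punctuation is ignored, a word that follows halcyon across a sentence boundary would also be counted.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of letters, optionally joined by an apostrophe.
WORD = re.compile(r"[a-z]+(?:['’][a-z]+)*")

def following_words(text, target):
    """Count the words that appear immediately after `target` in `text`."""
    words = WORD.findall(text.lower())
    counts = Counter()
    # Walk over every adjacent pair of words (every bi-gram).
    for current, nxt in zip(words, words[1:]):
        if current == target:
            counts[nxt] += 1
    return counts

sample = (
    "The mystic charm-light of halcyon hours. "
    "O halcyon bird, that wheelest crying. "
    "Blue as the wing of a halcyon wild. "
    "The halcyon days at length draw to a close. "
    "Expect St. Martin's Summer, halcyon days."
)

for word, count in following_words(sample, "halcyon").most_common():
    print(word, count)
```

Run over a larger collection of texts, the same function would show whether days really dominates the words that follow halcyon.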

Contemporary Comparisons

Friday, July 11th, 2008

Janet Evanovich is to the New York Times Best Seller list what Cliff Clavin is to Cheers. If the NYT Best Seller list were a bar, the number-one stool would be perfectly molded to the shape of Evanovich’s buttocks.
Wow, that reference made me feel old.

I don’t want to exclude contemporary writers from analysis, even though it is slightly more difficult to obtain the texts for their works.

I took a look at the first ten books of Janet Evanovich’s Plum series (the one where all the titles start with a number).  I made a table comparing the total number of words and the total number of unique words.

Book Total-words Unique-words
1 72000 7200
2 81000 7200
3 86000 7300
4 81000 6700
5 80000 6300
6 78000 6300
7 78000 6400
8 79000 6300
9 80000 6500
10 79000 6200

Overall she seems fairly consistent. The books are typically around 80,000 words and have a vocabulary of about 6,500 words. However, there is a noticeable drop-off in vocabulary between the third and fourth books. Maybe she lost her thesaurus.
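
For anyone who wants to try this on their own texts, here is a rough sketch of how the two columns could be computed. The function name word_stats and the tokenizing rule are assumptions for illustration; a different rule for hyphens, apostrophes, or capitalization would shift the numbers somewhat.

```python
import re

# Assumed tokenizer: runs of letters, optionally joined by an apostrophe.
WORD = re.compile(r"[a-z]+(?:['’][a-z]+)*")

def word_stats(text):
    """Return (total words, unique words) for a piece of text."""
    words = WORD.findall(text.lower())
    # A set keeps one copy of each distinct word, so its size is the vocabulary.
    return len(words), len(set(words))

total, unique = word_stats("The cat saw the other cat.")
print(total, unique)
```

To process a whole book, read the file into a string first and pass that string to word_stats.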

Since I still have some Mark Twain handy, I’ll first compare her books to one of his works. The Adventures of Tom Sawyer is similar in length and weighs in at 74,000 words, 7600 of which are unique. That vocabulary isn’t much larger than that of one of Evanovich’s books.

Next I’ll try a British author, like Charles Dickens.  It was difficult to find a book of his of comparable size.  I finally settled upon one of his non-fiction books, Pictures from Italy.  Even at only 75,000 words its vocabulary weighs in at 9100!  Just for fun I’ll look at another British book, Emily Brontë’s Wuthering Heights which packs in a vocabulary of 9500 words.

I don’t like to jump to conclusions with such a small sample size, but it does seem that the British know English best.

A look at the vocabulary of Edgar Allan Poe, part I

Thursday, July 10th, 2008

I performed a similar exercise on the complete poetic works of Edgar Allan Poe.

Given that it’s much smaller than Twain’s complete works, the vocabulary list is noticeably shorter–8,000 compared to Twain’s 45,000–and much easier to work with.

Given Poe’s reputation as the Master of the Macabre, I thought I’d take a look at some of the words he used to see if that title is justified.

night day
90 45

Off to a good start . . .

death life
56 74

That’s disappointing.

dark light
21 81

Same here.

OK, so maybe it’s not fair to compare words like dark and light. Light is an overloaded word. Webster’s lists 8 definitions for dark and 15 definitions for light.

It might be more interesting to look at the rate at which Poe used certain words in comparison with other authors.

The word death made up 0.035% of Poe’s words, compared to 0.086% for Twain. Wait, that means that Twain used death at more than twice the rate of Poe.
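
A rate like this is just the word’s count divided by the total number of words, times 100. Here is a small sketch of that arithmetic, assuming a simple letters-only tokenizer; the function name usage_rate and the sample sentence are made up for illustration.

```python
import re

# Assumed tokenizer: runs of letters, optionally joined by an apostrophe.
WORD = re.compile(r"[a-z]+(?:['’][a-z]+)*")

def usage_rate(text, word):
    """Return how often `word` occurs, as a percentage of all words in `text`."""
    words = WORD.findall(text.lower())
    if not words:
        return 0.0  # avoid dividing by zero on empty input
    return 100.0 * words.count(word.lower()) / len(words)

# Made-up sample: "death" is 2 of the 5 words.
print(usage_rate("Death and life and death", "death"))
```

Comparing two authors then comes down to calling usage_rate on each author’s text and dividing one result by the other.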

Let’s try another one. Poe used blood 0.012% of the time, compared to 0.015% for Twain. I’ll just have to reserve judgment until I actually read more of Poe. It’s not the number of words you use, or even the number of times you use them, that matters; it’s how you use them.

A look at the vocabulary of Mark Twain, part I

Thursday, July 10th, 2008

While I’m on the subject of Mark Twain, I thought I’d do some simple analysis on his vocabulary. I grabbed his complete works from Project Gutenberg and started doing a rough analysis of his word use.

I’ll estimate that there are about 45,000 distinct words, although that includes proper nouns, many eye-dialect spellings, and words like Personaleinkommensteuerschatzungskommissionsmitgliedsreisekostenrechnungserganzungsrevisionsfund.

Below are the top 10 words by count.

word count
the 155353
and 122641
of 79506
a 73607
to 71719
it 49754
in 48167
I 45584
that 39901
was 39603

There are over 15,000 words that Twain used only once.  Sorry, but I won’t be listing those here.
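
Both the top-10 list and the used-only-once count fall out of a single frequency table. The sketch below shows one way to build it. The function name vocabulary_summary and the tokenizer are assumptions for illustration, and everything is lowercased, so I would come out as i.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of letters, optionally joined by an apostrophe.
WORD = re.compile(r"[a-z]+(?:['’][a-z]+)*")

def vocabulary_summary(text, n=10):
    """Return the n most common words and the sorted list of words used once."""
    counts = Counter(WORD.findall(text.lower()))
    top = counts.most_common(n)
    used_once = sorted(word for word, count in counts.items() if count == 1)
    return top, used_once

top, used_once = vocabulary_summary("the cat and the dog and the bird", n=2)
print(top)
print(len(used_once), used_once)
```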

There’s nothing too interesting yet, but tomorrow I hope to dig deeper into the data and possibly compare it to other authors.