Ryan Compton finally finished reading Infinite Jest and decided to perform some neat analyses using NLTK (Natural Language Toolkit).
[...]
“But and so and but so” is the longest uninterrupted chain of conjunctions
In Infinite Jest conjunctions often appear in chains of length three or greater. There is a length-six chain on page 379 of the .pdf. It’s due to a minor character, “old Mikey”, standing at the Boston AA podium and speaking to a crowd:
[...]
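As a rough illustration of how such a chain could be found, here is a minimal sketch in plain Python. It is not Ryan's code: the conjunction list is a small hand-picked assumption, the function name is hypothetical, and the tokenizer ignores punctuation, so a chain may span a sentence break. A part-of-speech tagger would be the more careful way to decide what counts as a conjunction.

```python
import re

# Assumption: a small hand-picked set of conjunctions, not a tagger's output.
CONJUNCTIONS = {"and", "but", "or", "so", "nor", "yet"}


def longest_conjunction_chain(text, conjunctions=CONJUNCTIONS):
    """Return the longest run of consecutive conjunctions as a list of words."""
    best, current = [], []
    # Lowercase the text and keep only alphabetic tokens (plus apostrophes).
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in conjunctions:
            current.append(word)
            if len(current) > len(best):
                best = list(current)
        else:
            # Any non-conjunction word ends the current chain.
            current = []
    return best


sample = "I was sober. But and so and but so I kept coming back."
chain = longest_conjunction_chain(sample)
print(len(chain), " ".join(chain))
```

Run over the full text of a novel, the same loop returns the first chain of maximal length it encounters.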
Wallace used a vocabulary of 20,584 words to write Infinite Jest
By comparison, the Brown Corpus, which is roughly three times longer than Infinite Jest, contains only 26,126 unique words. To be precise, the Brown Corpus contains 9,964,284 characters and 2,074,513 (not necessarily unique) words, while Infinite Jest contains 3,204,159 characters and 577,608 words. If we restrict the Brown Corpus to its first 3,204,159 characters we find a vocabulary of only 15,771 unique words.
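The length-matched comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `vocabulary_size` is a hypothetical helper, and the tokenization rule (lowercased runs of letters) is a guess, so it would not necessarily reproduce the counts quoted above.

```python
import re


def vocabulary_size(text, max_chars=None):
    """Count unique lowercased alphabetic tokens, optionally in a text prefix."""
    if max_chars is not None:
        # Restrict to the first max_chars characters, so that two texts
        # of different lengths can be compared on equal footing.
        text = text[:max_chars]
    return len(set(re.findall(r"[a-z]+", text.lower())))


sample = "The cat and the dog"
print(vocabulary_size(sample))               # whole sample
print(vocabulary_size(sample, max_chars=7))  # only "The cat"
```

Truncating the longer text matters because vocabulary grows with length: a longer corpus has more chances to introduce new words, so raw unique-word counts are not comparable.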
[...]
...continue reading Infinite Jest by the Numbers for the code and methodology behind these results. Interesting stuff.
Ryan's post also links to Exploring Traditional Literature Electronically, which looks at word-cloud and word-trend analysis using Infinite Jest. Worth a look.