Word Frequency Counter

Counting the number of occurrences of words in text files

Word frequency and language learning

One of the biggest hurdles that we face when learning a new language is acquiring a large enough vocabulary to be able to express ourselves.

Estimates of the number of words used by an educated adult vary widely, ranging from 20,000 to 80,000. The article "How many?" by Michael Quinion considers this in more detail and also makes the point that no one really knows how many words an educated adult uses.

Whether the answer is 20,000 or 80,000 or somewhere in between, the language learner faces the daunting task of learning a large active vocabulary, and an even larger passive one, in order to communicate effectively.

This is where word counting software can help. WordFrequencyCounter was originally designed to count the frequency of Latin words found in the Vulgate (an early translation of the Bible into Latin by St Jerome), as an aid to learning to read the text.

The book of Genesis starts "in principio creavit Deus caelum et terram" and ends "mortuus est expletis centum decem vitae suae annis et conditus aromatibus repositus est in loculo in Aegypto".

In between are some 28,000 words, of which about 6,000 are distinct (without, for the time being, worrying about inflections or plurals). Approximately half of these 6,000 distinct words occur only once.

The most efficient way to learn to read the text is to start by learning the words which occur most frequently. For example, the top three words from the book of Genesis (et, in, est) account for almost ten percent of the total text.
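
The figures quoted above (total words, distinct words, words that occur only once, and the share of the text covered by the most frequent words) can be computed with a few lines of code. The following is a minimal sketch in Python, not the WordFrequencyCounter implementation itself. The function names word_frequencies and summarize are hypothetical, and the sketch assumes that a "word" is simply a run of letters compared without regard to case. The sample text is only the opening and closing lines of Genesis quoted above, so its numbers are far smaller than those for the whole book.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word occurs (assumption: a word is a run of letters)."""
    # Lower-case first so that "Deus" and "deus" are counted as the same word.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words)


def summarize(counts, top_n=3):
    """Return total, distinct and once-only word counts plus top-word coverage."""
    total = sum(counts.values())
    distinct = len(counts)
    once_only = sum(1 for c in counts.values() if c == 1)
    top = counts.most_common(top_n)
    # Fraction of the whole text made up of the top_n most frequent words.
    coverage = sum(c for _, c in top) / total if total else 0.0
    return {
        "total": total,
        "distinct": distinct,
        "once_only": once_only,
        "top": top,
        "coverage": coverage,
    }


# Sample: the first and last lines of Genesis as quoted in the text.
sample = (
    "in principio creavit Deus caelum et terram "
    "mortuus est expletis centum decem vitae suae annis et conditus "
    "aromatibus repositus est in loculo in Aegypto"
)

stats = summarize(word_frequencies(sample))
print(stats["total"], stats["distinct"], stats["once_only"])
for word, count in stats["top"]:
    print(word, count)
print(round(stats["coverage"], 3))
```

Sorting by frequency in this way gives a learner an ordered list to work through, starting with the words that will be met most often.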

Word frequency analysis can be very useful in foreign language learning because the most commonly encountered words (beyond the usual everyday ones) depend very much on the subject area. A word frequency search over a large enough body of source text will reveal these. If you are learning the language of history or sport, the vocabulary you encounter will be very different from that of nuclear physics, and so on.

The same is true for other languages.
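
One way to find the words that characterise a subject area is to count word frequencies while skipping the everyday words found in any text. The sketch below illustrates the idea in Python. The function name subject_vocabulary and the small common_words set are assumptions made for the example. In practice the list of everyday words would be much longer and the source text much larger.

```python
import re
from collections import Counter


def subject_vocabulary(text, common_words, top_n=10):
    """Return the most frequent words in text that are not in common_words."""
    # Assumption: a word is a run of letters, compared without regard to case.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(w for w in words if w not in common_words)
    return counts.most_common(top_n)


# Hypothetical, deliberately tiny list of everyday words.
common_words = {"the", "a", "of", "and", "in", "is"}

# Made-up sample sentence standing in for a body of physics text.
text = "The nucleus of the atom is small and the neutron is in the nucleus"

for word, count in subject_vocabulary(text, common_words, top_n=2):
    print(word, count)
```

Running the same function over texts from different subject areas, with the same list of everyday words, gives a separate vocabulary list for each area.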