Word Frequency Counter

Count the frequency of words in Unicode and ASCII text files.

Define a word

Decide whether characters like &, - and _ should be part of a word.

When counting the frequency of words, the starting point is to decide exactly what constitutes a word.

Words in the English language primarily consist of the letters a-z in both upper and lower case. However, other characters are commonly found within text. An obvious example is the @ character.

This character (and any other) can be treated in one of three ways, and each gives a different result for the purposes of frequency analysis.

Firstly, you can define it as a word separator (like the space character). This results in myname@myurl becoming two words, myname and myurl.

Secondly, you can define it as part of the alphabet used to make words. This results in myname@myurl being considered as one word.

Thirdly, you can ignore it when scanning the text. This results in the single word mynamemyurl.
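The three treatments above can be sketched in a few lines of Python. This is only an illustration, not how WordFrequencyCounter itself is implemented; the function name split_words and the mode names "separator", "letter" and "ignore" are hypothetical.

```python
def split_words(text, char, mode):
    """Split text into words, treating `char` according to `mode`.

    mode is one of (hypothetical names):
      "separator" - char ends the current word
      "letter"    - char is part of the word
      "ignore"    - char is dropped before scanning
    """
    if mode == "ignore":
        text = text.replace(char, "")
    words, current = [], []
    for ch in text:
        if ch.isalpha() or (mode == "letter" and ch == char):
            current.append(ch)
        elif current:
            # Any other character (including char in "separator" mode)
            # closes the word collected so far.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


for mode in ("separator", "letter", "ignore"):
    print(mode, split_words("myname@myurl", "@", mode))
```

Run on myname@myurl, the three modes give two words, one word containing the @, and one word without it, matching the three cases described above.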

Other characters also need to be considered when scanning English text, and they will not necessarily be word separators. The hyphen (-) is one example.

WordFrequencyCounter allows you to include a character, ignore it, or define it as a separator, and so define a word precisely.
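As a rough sketch of how per-character settings like these lead to a frequency count, the Python below applies a table of rules while scanning. The function name count_words, the rules dictionary and its values are assumptions made for illustration and do not reflect WordFrequencyCounter's actual settings. Letters default to "include" and everything else to "separator"; upper and lower case are kept distinct here, which is also an assumption.

```python
from collections import Counter


def count_words(text, rules):
    """Count words in text.

    rules maps a character to "include", "ignore" or "separator"
    (hypothetical names). Characters not listed default to "include"
    if they are letters and "separator" otherwise.
    """
    counts, current = Counter(), []
    for ch in text:
        action = rules.get(ch, "include" if ch.isalpha() else "separator")
        if action == "include":
            current.append(ch)
        elif action == "separator":
            if current:
                counts["".join(current)] += 1
                current = []
        # action == "ignore": skip the character and keep the word open
    if current:
        counts["".join(current)] += 1
    return counts


rules = {"-": "include", "&": "separator", "_": "ignore"}
print(count_words("well-known AT&T well-known my_name", rules))
```

With these rules, well-known is counted as a single word, AT&T is split into AT and T, and my_name is counted as myname.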

Word definitions in languages other than English

Our word counting software will count words consisting of ASCII and Unicode text characters. ASCII contains characters such as a-z, A-Z, -, $, @ and so on; the extended 8-bit character sets built on ASCII add various accented characters. Unicode is a way of defining characters that also covers alphabets other than the Roman alphabet, e.g. Cyrillic or Greek, among others.
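For text in other alphabets, the same idea carries over as long as letters are recognised by their Unicode properties rather than by a fixed a-z range. The sketch below uses Python's re module, whose \w class is Unicode-aware for str patterns. The function name count_unicode_words is hypothetical, and folding case with casefold() is an assumption made for this example, not a documented behaviour of the software.

```python
import re
from collections import Counter

# [^\W\d_] means "a word character that is not a digit or underscore",
# i.e. a letter in any alphabet Unicode knows about.
LETTERS = re.compile(r"[^\W\d_]+")


def count_unicode_words(text):
    """Count runs of letters, ignoring case (an assumption of this sketch)."""
    return Counter(word.casefold() for word in LETTERS.findall(text))


counts = count_unicode_words("Привет мир, привет! αλφα beta")
print(counts.most_common())
```

When reading from a file, passing an explicit encoding, for example open(path, encoding="utf-8"), avoids depending on the platform default.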