Word Frequency Counter

Count word frequency of text files

Help

Case sensitivity

By default, scanning is case sensitive, so "the" and "The" are counted as different words. To change this, choose Scan Options and tick the check-case tick box.

This option is only relevant when scanning English language text. It is not used for other alphabets such as Greek or Cyrillic.
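
The effect of this setting can be illustrated with a short Python sketch. This is not WordFrequencyCounter's own code: the function name count_words and its case_sensitive parameter are hypothetical, and words are simply split on whitespace.

```python
from collections import Counter

def count_words(text, case_sensitive=True):
    """Count whitespace-separated words, optionally folding case first."""
    words = text.split()
    if not case_sensitive:
        # Fold case so that "The" and "the" share a single entry.
        words = [w.lower() for w in words]
    return Counter(words)

sample = "The cat saw the dog"
sensitive = count_words(sample)                          # "The" and "the" kept apart
insensitive = count_words(sample, case_sensitive=False)  # both counted under "the"
print(sensitive["the"], insensitive["the"])
```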

Choosing an input file type

There are three options: Ascii (text) files only, Unicode (text) files only, and Unicode and Ascii text files.

If you choose either of the first two options, WordFrequencyCounter will assume that every input file is of that type, and the output will be incorrect if the files are of a different type.

If you choose Unicode and Ascii text files, WordFrequencyCounter will attempt to open each file in the correct format. This is based on frequency analysis of the content, and the longer the files, the greater the chance of success.
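
The exact analysis that WordFrequencyCounter performs is not described here. As a rough illustration of why longer files are easier to classify, the Python sketch below guesses the file type from the proportion of zero bytes. The function name guess_file_type, the byte order mark check and the 0.2 threshold are all assumptions made for this example.

```python
def guess_file_type(data: bytes) -> str:
    """Guess whether raw bytes hold Ascii or Unicode (UTF-16) text."""
    # A byte order mark at the start is a strong hint of UTF-16.
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "Unicode"
    if not data:
        return "Ascii"
    # UTF-16 text in a Latin alphabet has a zero byte in roughly every
    # second position, whereas plain Ascii text has none.
    zero_ratio = data.count(0) / len(data)
    return "Unicode" if zero_ratio > 0.2 else "Ascii"

print(guess_file_type("hello world".encode("ascii")))
print(guess_file_type("hello world".encode("utf-16-le")))
```

With only a few bytes of input, a ratio like this is unreliable, which is consistent with longer files giving a better chance of success.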

Copying text into the clipboard

The clipboard is an area of memory that is used to share data between programs. Every time that you cut and paste a part of a file, you are using the clipboard and saving information to it.

Most text editors allow you to copy text from the Edit menu using: Select All, followed by Copy. This text will then be in the clipboard.

It is also possible to copy text by highlighting it and pressing Ctrl+C. You can copy an entire file to the clipboard by pressing Ctrl+A (to highlight the whole file) and then Ctrl+C (to copy it).

Ignore characters

The Ignore characters setting under Scan options lets you specify characters that should be ignored when scanning words. For example, if you enter 1-9 in the Ignore characters edit box, words such as page1, page99, number1 and one1one are converted into page, page, number and oneone.

There are also tick boxes for some of the more common characters that you might want to ignore, such as @,-() etc. These settings are part of defining what counts as a word.
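
The page1 example can be reproduced with a few lines of Python. This is only a sketch: the function name strip_ignored is hypothetical, and it assumes that an entry of 1-9 means the digits 1 through 9.

```python
def strip_ignored(word, ignored):
    """Remove every character in `ignored` from `word`."""
    return "".join(ch for ch in word if ch not in ignored)

# Assumption: "1-9" in the edit box stands for the digits 1 to 9.
ignored = set("123456789")
words = ["page1", "page99", "number1", "one1one"]
print([strip_ignored(w, ignored) for w in words])
# ['page', 'page', 'number', 'oneone']
```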

Ignoring files when scanning multiple files

When you are scanning files in a directory, there may be some files that you want to leave out of the scan. You can do this in two ways:

Firstly, enter the extensions of the files that you do not want to scan, separated by spaces, in the edit box (Ignore files with these extensions), e.g. js html doc.

Secondly, enter a list of the file names (including extensions) that you do not want to scan, separated by spaces.
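
The two kinds of exclusion can be sketched in Python as follows. The function name should_scan and the file name notes.txt are made up for the example, and the case-insensitive comparison of extensions is an assumption rather than documented behaviour.

```python
from pathlib import Path

def should_scan(path, ignored_extensions, ignored_names):
    """Return False if the file is excluded by extension or by name."""
    p = Path(path)
    # Compare the extension without its leading dot (assumed case-insensitive).
    if p.suffix.lstrip(".").lower() in ignored_extensions:
        return False
    if p.name in ignored_names:
        return False
    return True

# Both edit boxes hold space-separated entries.
ignored_extensions = set("js html doc".split())
ignored_names = set("notes.txt".split())

for name in ["index.html", "report.txt", "notes.txt"]:
    print(name, should_scan(name, ignored_extensions, ignored_names))
```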

Ignore Latin numerals

Select the tick box Ignore Latin numerals on Scan options and the Latin numerals listed will be ignored.

Ignoring lines

Words in lines shorter than a set length can be ignored. Select the tick box on Scan Options and enter the minimum length that a line of text must have before its words are counted.
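
As a minimal Python sketch of this filter (the function name lines_to_count is hypothetical, and line length is assumed to be measured in characters):

```python
def lines_to_count(text, min_length):
    """Keep only the lines that are at least `min_length` characters long."""
    return [line for line in text.splitlines() if len(line) >= min_length]

text = "Title\nThis line is long enough to be counted"
print(lines_to_count(text, 10))
# ['This line is long enough to be counted']
```

A setting like this is useful for skipping short headings or page labels.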

Ignoring numbers

Select Scan options and tick the box titled any group comprised solely of numbers. Numbers on their own (e.g. 1, 999) will not be scanned, but numbers contained within words (e.g. page1, page99) will still be included. (These numbers can be omitted by using the Ignore characters option and entering the numerals 1-9 in the edit box.)
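
The difference between a number on its own and a number inside a word can be shown with a short Python sketch. The function name is_number_only is hypothetical, and the use of str.isdigit is one possible test, not necessarily the one the program uses.

```python
def is_number_only(token):
    """True when the token consists solely of digits."""
    return token.isdigit()

tokens = ["1", "999", "page1", "page99", "word"]
kept = [t for t in tokens if not is_number_only(t)]
print(kept)
# ['page1', 'page99', 'word']
```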

Ignoring individual words

Using Exclude words, you can create a list of words which will not be counted. These can either be added individually or loaded from a pre-existing list.

If you are loading a list of words, each word should be on a separate line (with no spaces or other separators). The list should be a text file saved as Unicode or Ascii.
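
The expected list layout, one word per line, can be illustrated in Python. The function names load_exclude_list and remove_excluded are hypothetical, and an in-memory text stream stands in for a real list file.

```python
import io

def load_exclude_list(lines):
    """Build a set of excluded words from an iterable of lines, one word per line."""
    return {line.strip() for line in lines if line.strip()}

def remove_excluded(words, excluded):
    """Drop every word that appears in the exclude list."""
    return [w for w in words if w not in excluded]

# Stand-in for a text file holding three excluded words.
word_file = io.StringIO("the\nand\nof\n")
excluded = load_exclude_list(word_file)
print(remove_excluded(["the", "cat", "and", "dog"], excluded))
# ['cat', 'dog']
```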

Ignoring words containing less than X characters

You can ignore words of less than a set length by selecting Exclude words and marking the tick box Ignore Words. Enter the minimum number of characters that a word must contain before it will be counted.

Ignoring words containing more than X characters

You can ignore words of more than a set length by selecting Exclude words and marking the appropriate tick box. The default maximum length is 99.
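
Taken together, the minimum and maximum settings amount to a simple length check, sketched here in Python. The function name within_length is hypothetical. The default of 99 for the maximum comes from the text above, while the default of 1 for the minimum is an assumption.

```python
def within_length(word, min_chars=1, max_chars=99):
    """True when the word length lies between the two limits, inclusive."""
    return min_chars <= len(word) <= max_chars

words = ["a", "of", "word", "frequency"]
print([w for w in words if within_length(w, min_chars=3)])
# ['word', 'frequency']
```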

Ignoring words comprising repeating characters only

Words such as aa, aaa, aaaa and so on can be ignored by marking the tick box on Exclude words called Ignore words made of repeating characters only.
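
One way to recognise such words is to check that they contain a single distinct character, as in this Python sketch. The function name is_repeated_char is hypothetical, and treating a one-letter word as not repeating is an assumption.

```python
def is_repeated_char(word):
    """True for words such as "aa" or "zzzz" that repeat one character."""
    # Assumption: a single character on its own does not count as repeating.
    return len(word) > 1 and len(set(word)) == 1

words = ["aa", "aaa", "aaaa", "a", "banana"]
print([w for w in words if not is_repeated_char(w)])
# ['a', 'banana']
```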

Opening the output file

The scanned output containing the list of word frequencies is stored as a Unicode text file. This will not be opened properly by Notepad (Windows XP).

Use Wordpad to open the output file.

Saving the output to a file

Left click on the edit box titled Save output to file and navigate to or enter the name of the output file that you wish to use. If the file already exists, it will be overwritten.

Saving the output as an html file or as an html table

You can save the output as an html file by selecting save output as html page from the list box.

Alternatively, you can save the output as an html table in a text file. This wraps the word and frequency data in the appropriate html codes so that you can copy and paste the table into an html file.
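
The idea of wrapping word and frequency data in html codes can be sketched in Python as shown below. The exact markup that WordFrequencyCounter writes is not documented here, so the function name to_html_table, the column headings and the row layout are all assumptions.

```python
from html import escape

def to_html_table(frequencies):
    """Wrap (word, count) pairs in a simple html table."""
    rows = ["<table>", "<tr><th>Word</th><th>Frequency</th></tr>"]
    for word, count in frequencies:
        # Escape the word so that characters such as < or & stay valid html.
        rows.append(f"<tr><td>{escape(word)}</td><td>{count}</td></tr>")
    rows.append("</table>")
    return "\n".join(rows)

table = to_html_table([("the", 2), ("cat", 1)])
print(table)
```

The resulting text can be pasted between the body tags of an existing html file.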

Scanning a single file

Tick the box marked Scan a single file and left click on the edit box to the right. Navigate to the file to be scanned.

Using the clipboard to scan text

Copy text into the clipboard, then select scan from clipboard. WordFrequencyCounter will count the frequency of the words in the clipboard in exactly the same way as it does for a file.

Viewing the contents of the clipboard

Select view clipboard and click on view clipboard data. This will show you the contents of the clipboard and the text which will be counted when the scan clipboard option is chosen.

If you wish to save the contents of the clipboard to the beginning of the output file (containing the list of counted words and their frequencies), select the tick box marked save clipboard data.