Improving The English Dictionary

When compiling dictionaries for spelling checkers, you need to strike a balance between including all the words likely to be used and including so many that a typo is more likely to produce a valid but incorrect word. Additionally, the more words in the dictionary, the larger the download and the longer it takes to actually check the spelling. The default dictionaries that come with EditLive! are generally fairly small and contain the most common English words. You can extend them with words from your specific vocabulary by creating a custom dictionary jar file (also see what the spelling checker defines as a word).

Some people, particularly those in the publishing and technology industries, have especially expansive vocabularies and are frustrated by the limited number of words in the default dictionary. Compiling a comprehensive word list is a major undertaking, so these people often just struggle through with the default. To help with that, we've found some freely available word lists that you can add to your dictionary jars to significantly expand the set of recognized words.

To help pick the words that are most appropriate to your users, we've split these word lists up into a few separate files:

  • en_us.clx - a combination of words from the enable2k word list and 12dicts (specifically the 6of12 list) compiled by Alan Beale.
  • jargon.clx - a combination of common words from the jargon file (Ver 4.2.0-1) and words we've added ourselves to bring it up to date, including company names and words that fall in between the technical and business worlds.
  • names.clx - the 500 most common male names and 500 most common female names from the US census data, excluding those that are likely to be misspellings of English words.
  • ephox.clx - a few Ephox product names.

We suggest you add whichever of these dictionaries are relevant to your users alongside the existing dictionary files in the en_us jar file. Simply unzip the jar file with your favorite zip utility, then create a new zip file that contains the original dictionary files plus the extra files. Make sure you include just the dictionary files and not any directories.
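If you would rather script the repackaging than do it by hand, the steps above can be sketched in Python as follows. This is a minimal sketch under assumptions, not an official tool: the function name and the file names in the usage note are hypothetical, and it assumes that "just the dictionary files" means the files at the top level of the jar, so anything stored inside a directory is left out.

```python
import os
import zipfile


def rebuild_dictionary_jar(original_jar, extra_files, output_jar):
    """Write output_jar with the top-level files of original_jar plus extra_files.

    Every entry is stored at the root of the archive, with no directories.
    Assumption: entries inside directories in the original jar are not
    dictionary files, so they are skipped.
    """
    extra_names = {os.path.basename(path) for path in extra_files}
    with zipfile.ZipFile(original_jar) as src, \
            zipfile.ZipFile(output_jar, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if "/" in info.filename:
                continue  # directory entry, or a file inside a directory
            if info.filename in extra_names:
                continue  # an extra file with the same name replaces it
            dst.writestr(info.filename, src.read(info.filename))
        for path in extra_files:
            # arcname drops the local folder, so the file lands at the jar root
            dst.write(path, arcname=os.path.basename(path))
```

For example, rebuild_dictionary_jar("en_us.jar", ["downloads/jargon.clx", "downloads/names.clx"], "en_us_extended.jar") would produce a new jar with the original top-level files plus the two word lists, all at the root. The jar and folder names here are placeholders for your own paths.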

If you find these dictionaries useful or if you have suggested improvements or problems, please take a moment to let us know on the LiveWorks! mailing list (you can also use the web interface to the list provided by Nabble).

How does EditLive! decide which words to check for spelling errors?

We've seen a few customers recently creating custom dictionaries with words that simply won't be checked by our spelling checker. We use a third party spelling component, so we don't have much control over this process.  Here are the rules that the component uses to determine which words will be checked for spelling errors. Keep these in mind when constructing a custom dictionary.

  • A word is an alphanumeric character followed by any sequence of alphanumerics or apostrophes.
  • Hyphens are word delimiters, so each part of a hyphenated word is checked separately.
  • Periods surrounded by alphanumerics are considered part of the word, and trailing periods are considered part of the word if the word contains embedded periods interspersed among no more than two consecutive alphanumerics (e.g., the period at the end of U.S.A. is considered part of the word, but the periods at the end of USA. and ephox.com. are not).
  • Apostrophes at the end of a word are considered part of the word if they are preceded by the letter "s".
  • An "at sign" (@) is considered part of the word if it is surrounded by alphanumerics and the following word contains embedded periods (i.e., appears to be an e-mail address).
  • The string "://" is considered part of the word if it is surrounded by alphanumerics (i.e., appears to be part of a URL).
  • A slash (/) is considered part of the word if it is surrounded by alphanumerics and the preceding part contains embedded periods (i.e., appears to be part of a URL).
  • Characters &, %, +, =, ?, and _ are considered part of the word if the word contains embedded periods (i.e., appears to be part of a URL).
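To get a feel for how these rules affect a custom word list, here is a rough Python sketch of three of them: the basic word definition, the trailing-apostrophe rule, and the trailing-period rule. It is only an approximation written from the descriptions above, not the third party component's actual code. The function names are hypothetical, only ASCII letters and digits are handled, and split_words deliberately ignores the period, e-mail and URL rules.

```python
import re

# A word is an alphanumeric followed by any run of alphanumerics or
# apostrophes. Hyphens are not in the pattern, so they act as delimiters.
# Assumption: ASCII only, to keep the sketch short.
WORD_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9']*")


def split_words(text):
    """Split text into words using the basic word and apostrophe rules."""
    words = []
    for match in WORD_RE.finditer(text):
        word = match.group()
        stripped = word.rstrip("'")
        # A trailing apostrophe only stays on the word after the letter "s".
        if stripped != word and not stripped.lower().endswith("s"):
            word = stripped
        words.append(word)
    return words


def keeps_trailing_period(token):
    """Return True if the final period of token would belong to the word.

    Interpretation of the rule: the token must contain embedded periods,
    and every run of alphanumerics between periods must be one or two
    characters long.
    """
    if not token.endswith("."):
        return False
    parts = token[:-1].split(".")
    return len(parts) > 1 and all(
        1 <= len(part) <= 2 and part.isalnum() for part in parts
    )
```

With this sketch, split_words("well-known authors' don't") gives the words well, known, authors' and don't, which shows why a dictionary entry such as "well-known" would never be matched under the default settings. Likewise, keeps_trailing_period accepts "U.S.A." but rejects "USA." and "ephox.com.", matching the examples in the list.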

There are two special cases that we support.  Both can be changed by creating a file called "Spelling.properties" in your dictionary jar file and adding the text as specified below.  If you specify both options, each must be on a separate line.  For an example of this, look at the contents of our French and Italian dictionary jar files.

  • Hyphenated words can be checked as a single word with the option SPLIT_HYPHENATED_WORDS_OPT=false.  With this turned off, hyphens surrounded by alphanumerics are considered part of the word.
  • Apostrophes can be turned into word delimiters with the option SPLIT_CONTRACTED_WORDS_OPT=true.  This is turned on by default in our French and Italian dictionaries.

Note that both of these options are global. If changed, they will apply to the entire editor, not just your custom dictionary.
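As an illustration, the Python sketch below builds the text for "Spelling.properties" with one option per line and adds it to a dictionary jar. The two option names come from the list above; the function names, and the choice to place the file at the top level of the jar, are assumptions made for this example.

```python
import zipfile


def spelling_properties(split_hyphenated=None, split_contracted=None):
    """Return the Spelling.properties text, one option per line.

    Pass True or False to set an option; leave it as None to omit it.
    """
    lines = []
    if split_hyphenated is not None:
        lines.append("SPLIT_HYPHENATED_WORDS_OPT=" + str(split_hyphenated).lower())
    if split_contracted is not None:
        lines.append("SPLIT_CONTRACTED_WORDS_OPT=" + str(split_contracted).lower())
    return "".join(line + "\n" for line in lines)


def add_spelling_properties(jar_path, text):
    """Add Spelling.properties to the jar (assumed to belong at the top level)."""
    with zipfile.ZipFile(jar_path, "a") as jar:
        if "Spelling.properties" in jar.namelist():
            raise ValueError("jar already contains Spelling.properties")
        jar.writestr("Spelling.properties", text)
```

Calling spelling_properties(split_hyphenated=False, split_contracted=True) returns two lines, SPLIT_HYPHENATED_WORDS_OPT=false followed by SPLIT_CONTRACTED_WORDS_OPT=true, which is the shape of file described above when both options are specified.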

Adding Custom Words To The Spelling Dictionary

One of the lesser-known but very useful functions of EditLive! is the ability to customize the spelling dictionaries. This allows you to add domain-specific words or specific company titles that your authors regularly use. We've had an article in our documentation for quite some time on how to do this, but with all the options to configure EditLive!, it often gets missed.

So, if you're sick of that red squiggly line, take a look at Creating, Modifying and Adding to Dictionaries, one of the hidden gems in our documentation. It's actually quite simple.