Source Code can be found here
Spell checking is an extremely helpful utility. I have come to rely on it heavily, which does nothing to help me actually learn to spell the words correctly. Still, there is no turning back, and I like to know that everything is spelled correctly. A misspelled word, easy or hard, reflects poorly on the writer.
I think every text input area should have some form of help. Here are some common examples:
- IntelliSense – No one can, or should try to, memorize the Framework Class Library, but I would be in trouble if I didn’t have something to help me find what I need quickly.
- Outlook email – When you start to type a person’s name, it brings up a list of contacts that match the current text. You then select the person you want without having to type the whole name or remember the email address, since the name is more intuitive.
- Spell Checking / Grammar Checking
The list goes on, but the point is that it is easier on everyone if the text area is monitored and suggestions are made depending on the current text and the desired content.
I am not going to cover the creation of a custom control, but if you want to know, just look at the code, which is in C#. I will be covering the Natural Language classes provided by Longhorn, though.
There are two objects that one needs to process text, both of which live in the System.NaturalLanguageServices namespace and DLL:
1. Context - Essentially the reference to the main engine, and analogous to the ItemContext in WinFS. Here is what a Context initialization looks like:
Context context = new Context();
context.SpellingSuggestions = true;
context.ComputeLemmas = false;
context.CharacterNormalization = false;
context.NamedEntities = false;
context.WordNormalization = false;
context.Compounds = false;
context.SpellingAlwaysSuggest = true;
context.SpellingIgnoreAllUpperCase = true;
context.SpellingIgnoreWordsWithNumbers = true;
context.SpellingStrict = true;
There are a number of properties which modify the engine, and allow for different usage scenarios.
2. TextChunk – The text collection and localization add-on to Context. You need an instance of TextChunk for each language you intend to process.
TextChunk textChunk = new TextChunk(context);
textChunk.Locale = new System.Globalization.CultureInfo(1033);
In case you didn’t know, 1033 is the locale identifier (LCID) for English in the United States.
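If the bare number feels opaque, the mapping between an LCID and a culture name can be checked with the standard CultureInfo class. The following is only a small sketch; the class and method names (LcidDemo, NameFromLcid, LcidFromName) are made up for illustration and are not part of the sample code.

```csharp
using System;
using System.Globalization;

// Illustrative helper (hypothetical names), not part of the original sample.
static class LcidDemo
{
    // Returns the culture name for a numeric LCID.
    public static string NameFromLcid(int lcid)
    {
        return new CultureInfo(lcid).Name;
    }

    // Returns the numeric LCID for a culture name such as "en-US".
    public static int LcidFromName(string name)
    {
        return new CultureInfo(name).LCID;
    }

    static void Main()
    {
        // Look up the culture behind the 1033 used above.
        Console.WriteLine(NameFromLcid(1033));

        // Go the other way: from a readable name back to the number.
        Console.WriteLine(LcidFromName("en-US"));
    }
}
```

Passing a name like "en-US" to the CultureInfo constructor instead of the number is arguably easier to read, and either form should work when setting the Locale.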
The core of the code is very small. There are more options, of course, but in essence it is not a hard concept to program against. Here is what the processing looks like:
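textChunk.InputText = theTextToProcess;
foreach (Sentence sentence in textChunk)
    foreach (Token token in sentence)
    {
        IList list = token.Suggestions;   // suggestions for this word, if any
        if (list != null && list.Count > 0)
            foreach (object o in list)
                { string suggestion = o.ToString(); }   // each object is one suggestion
    }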
· First, set the InputText of the TextChunk to the text we want processed (theTextToProcess).
· Iterate through every Sentence that the engine recognizes.
· Iterate through each Token in the Sentence (that is, look at every word).
· Get the list of Suggestions for the word. You will only get suggestions for misspelled words. The dictionary is very up-to-date and contains many acronyms, so think twice before concluding that the engine is not working correctly; the word is probably just an unusual one. For example, I first tested the word Fram, pretending I had misspelled From. I fired up Microsoft Word to see if the engine was correct, and to my surprise, Word said it was not a word. So I then checked Dictionary.com, which revealed that Fram is an acronym for ferroelectric random access memory.
· Each object in the list is a suggestion, so you can call its ToString() method to get the string representation.