Word Finding Algorithm

Pick out Words from Text

This class is intended to pick out important sub-strings of data (tokens) from a string.
It was designed for a spell-checker and so currently returns words from sentences.
The set of characters that can make up a "word" comes from a lookup table.
This table is currently loaded with the characters found in dictionaries downloaded from the Internet.
Apostrophe (') is considered part of a word to allow "don't" and "o'clock" etc.
Single letters are ignored.
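
The rules above can be sketched as a table-driven scan. This is only an illustration, not the class's actual source: the names WordCharTable and FindWords are hypothetical, and the table here is filled with the ASCII letters plus the apostrophe rather than with characters harvested from dictionaries.

```cpp
#include <cassert>   // only needed for the usage checks shown below
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical lookup table: one flag per byte value, true if the byte may
// appear inside a word. Assumption: ASCII letters and the apostrophe only.
struct WordCharTable {
  bool IsWordChar[256] = {};
  WordCharTable() {
    for (int c = 'a'; c <= 'z'; ++c) IsWordChar[c] = true;
    for (int c = 'A'; c <= 'Z'; ++c) IsWordChar[c] = true;
    IsWordChar[static_cast<unsigned char>('\'')] = true; // allows "don't", "o'clock"
  }
};

// Returns every run of table characters that is longer than one character.
std::vector<std::string> FindWords(const std::string& Text) {
  static const WordCharTable Table;
  std::vector<std::string> Words;
  std::size_t i = 0;
  while (i < Text.size()) {
    // Skip characters that cannot be part of a word.
    while (i < Text.size() && !Table.IsWordChar[static_cast<unsigned char>(Text[i])]) ++i;
    std::size_t Start = i;
    // Consume the run of word characters.
    while (i < Text.size() && Table.IsWordChar[static_cast<unsigned char>(Text[i])]) ++i;
    // Single letters are ignored; an empty run at the end of the text is too.
    if (i - Start > 1) Words.push_back(Text.substr(Start, i - Start));
  }
  return Words;
}
```

With this sketch, FindWords("S=WordFinder.GetFirstWord();") yields "WordFinder" and "GetFirstWord"; the lone "S" is dropped because it is a single letter.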

There are two versions of the class:
one uses the Standard Template Library string class,
the other uses the MFC CString class.
Usage:

std::string Version:
CSTLWordFinder WordFinder("S=WordFinder.GetFirstWord();");
MFC CString Version:
CWordFinder WordFinder("S=WordFinder.GetFirstWord();");

  ASSERT(  WordFinder.GetFirstWord()      =="WordFinder");   // "S" is length 1, therefore, not a "word" but a letter.
  ASSERT(  WordFinder.GetNextWord()       =="GetFirstWord");
  ASSERT(  WordFinder.GetNextWord()       =="");             // (no next word)
  ASSERT(  WordFinder.GetFirstWord()      =="WordFinder");
  ASSERT(  WordFinder.GetWord()           =="WordFinder");
  ASSERT(  WordFinder.GetWordCapitalised()=="Wordfinder");
  ASSERT( *WordFinder                     =="WordFinder");   // You can use the class as if it were the data
  ASSERT(  WordFinder++                   =="WordFinder");   // Post-increment returns the current word, then advances to the next
  ASSERT(  WordFinder++                   =="GetFirstWord");
  ASSERT(  WordFinder++                   =="");             // (no next word)
  ASSERT(  WordFinder.GetFirstWord()      =="WordFinder");
  ASSERT( *WordFinder                     =="WordFinder");
  ASSERT(++WordFinder                     =="GetFirstWord");
  ASSERT( *WordFinder                     =="GetFirstWord");
  ASSERT(++WordFinder                     =="");             // (no next word)
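
The cursor behaviour implied by the assertions above (GetFirstWord restarts, GetNextWord and prefix ++ advance and return the new word, postfix ++ returns the current word and then advances) could be modelled roughly as follows. SketchWordFinder is a hypothetical stand-in written for this page, not the real class: it uses isalpha plus the apostrophe instead of the dictionary table, and it assumes the constructor positions the cursor on the first word.

```cpp
#include <cassert>   // only needed for the usage checks
#include <cctype>
#include <cstddef>
#include <string>

// Stand-in for illustration only; the real class may be implemented differently.
class SketchWordFinder {
public:
  explicit SketchWordFinder(const std::string& Text) : m_Text(Text), m_Pos(0) {
    GetFirstWord(); // assumption: start on the first word
  }

  // Restart from the beginning of the text and return the first word.
  std::string GetFirstWord() { m_Pos = 0; return GetNextWord(); }

  // Advance to the next word and return it ("" when there is none).
  std::string GetNextWord() {
    do {
      while (m_Pos < m_Text.size() && !IsWordChar(m_Text[m_Pos])) ++m_Pos;
      std::size_t Start = m_Pos;
      while (m_Pos < m_Text.size() && IsWordChar(m_Text[m_Pos])) ++m_Pos;
      m_Word = m_Text.substr(Start, m_Pos - Start);
    } while (m_Word.size() == 1); // single letters are not words
    return m_Word;
  }

  std::string GetWord() const { return m_Word; }
  std::string operator*() const { return m_Word; }      // current word
  std::string operator++() { return GetNextWord(); }    // prefix: advance, return new word
  std::string operator++(int) {                         // postfix: return old word, then advance
    std::string Old = m_Word;
    GetNextWord();
    return Old;
  }

private:
  static bool IsWordChar(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) || c == '\'';
  }
  std::string m_Text, m_Word;
  std::size_t m_Pos;
};
```

Substituting SketchWordFinder for the class name in the assertions above (and assert for ASSERT) gives the same results for the calls it implements; GetWordCapitalised is left out of the sketch.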
If all you want to do is capitalise the words in a string (i.e. make the first letter of each word upper case and the rest lower case), the algorithm is as simple as this:
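#include <cctype>   // isalpha, toupper, tolower

void Capitalise(char* ptr) {
  bool FirstLetter=true;
  for(; *ptr; ++ptr) {           // Walk the string up to the terminating '\0'
    unsigned char c=static_cast<unsigned char>(*ptr); // <cctype> functions need a non-negative value
    if(isalpha(c) || c=='\'') {  // The apostrophe counts as a letter, so "I'll" does not become "I'Ll"
      if(FirstLetter) {
        FirstLetter=false;
        *ptr=static_cast<char>(toupper(c));
      }else *ptr=static_cast<char>(tolower(c));
    }else FirstLetter=true;      // Any other character ends the current word
  }
}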
Here's a version for MFC CString users:
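#include <cctype>   // isalpha, toupper, tolower

void Capitalise(CString& S) {    // Assumes an ANSI (non-Unicode) build, where CString holds char
  char* ptr=S.GetBuffer(0);      // Writable access to the string's own buffer
  bool FirstLetter=true;
  for(; *ptr; ++ptr) {           // Walk the buffer up to the terminating '\0'
    unsigned char c=static_cast<unsigned char>(*ptr); // <cctype> functions need a non-negative value
    if(isalpha(c) || c=='\'') {  // The apostrophe counts as a letter, so "I'll" does not become "I'Ll"
      if(FirstLetter) {
        FirstLetter=false;
        *ptr=static_cast<char>(toupper(c));
      }else *ptr=static_cast<char>(tolower(c));
    }else FirstLetter=true;      // Any other character ends the current word
  }
  S.ReleaseBuffer();             // Hand the buffer back to the CString
}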
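
Since the class also comes in a std::string flavour, the same loop can be written for std::string. This is a sketch along the same lines as the two functions above, not code from the download; it assumes a C++11 compiler for the range-based for loop.

```cpp
#include <cassert>   // only needed for the usage check
#include <cctype>
#include <string>

// Capitalises each word in place: first letter upper case, the rest lower case.
void Capitalise(std::string& S) {
  bool FirstLetter = true;
  for (char& ch : S) {
    unsigned char c = static_cast<unsigned char>(ch); // <cctype> needs a non-negative value
    if (std::isalpha(c) || c == '\'') {               // apostrophe counts as part of the word
      ch = static_cast<char>(FirstLetter ? std::toupper(c) : std::tolower(c));
      FirstLetter = false;
    } else {
      FirstLetter = true;                             // any other character ends the word
    }
  }
}
```

For example, a string holding "i'll see YOU at o'clock" becomes "I'll See You At O'clock".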

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


Last modified: Tue, 03 Feb 2009 18:26:40 +0000