How to find english words from a list of rubbish (VBA help)

Associate
Joined
26 Feb 2004
Posts
971
Location
China (Qinhuangdao)
Not sure whether I should have posted this in 'Windows & other software", but I thought the solution would be some kind of code, so...

I have a list in Excel of series of 6 letter words. Almost all are not English words, i.e. they do not appear in an English dictionary, so if spellchecking, the dialogue box would appear.

There are 231,868 words in my list. I want to remove all the entries which are not actually words.

Here is an example :

crises
crisia
crisib
crisie
crisii
crisil
crisio
crisis
crisla
crislb
crisle

In this small selection, crises and crisis are words, the rest are not. So I would just delete all the entries which are not.

I though this could be done in VBA, with something simple like using cells.checkspelling to return if the word doesn't appear in the dictionary. However, this just brings up the spelling dialogue box.

Do you think there would be a way of doing this?
 
How elegant do you want your solution to be? Do you just want the result, or is how you got there important?

programmatically you are going to need to.

Locate a dictionary file
Load the contents of the dictionary file into some structure
Perform some searching algorithm on each word in your list on the dictionary for matches.

For the dictionary structure i recommend:



Uploaded with ImageShack.us

Which is to say we start off with an n=26 array of the letters a through z. Each letter, and all subsequent letters may contain an array of up to 26 letters a through z. You also need a termination flag to show a letter is a valid end point.

Then to validate a particular word exists we iterate through the tree. If the path for that word exists and ends on a valid termination point the word is a valid word :)
 
Last edited:
hmm interesting problem

i would personally just import them into a DB then import a dictionary into a DB then try to doa join on them (but that may be horrible)

alternitively write a console app to load the words into memory then run a foreach loop over them and try to use a dictionary API to see if it a correct word .i would hazard a guess dictionary.com has one although there is probably a limit on API calls
 
Well this clears them rather than deleting them as deleting changes the range so some don't get checked and I can't be bothered to fix that :)

Code:
Sub Macro1()
'
' Macro1 Macro
' Macro recorded 08/10/2010 by Simon
'

'
    
 For Each cel In ActiveSheet.UsedRange
        If Not Application.CheckSpelling(Word:=cel.Text) Then _
        cel.Clear
    Next cel

End Sub
 
That's excellent! Worked perfectly! I slightly modified your code to :

Code:
Sub SimonCHere()

Dim WrdCount As Integer

    For t = 2 To 231869
    
        Cells(t, 2).Select
        If Not Application.CheckSpelling(word:=Cells(t, 2)) Then
            Cells(t, 2).Clear
        Else
            Debug.Print Cells(t, 2) '   to monitor, out of interest
            WrdCount = WrdCount + 1
            Cells(WrdCount, 4) = Cells(t, 2)
        End If
    Next t

End Sub

Just for your information, of my selection of 231,868, only 88 are English words!
 
Last edited:
Back
Top Bottom