a) get a text editor capable of stripping HTML out (EditPlus does).
b) merge all your files into one big file. There may be some utility you could find to do this. I've seen file combiners before, but none come to mind.
c) run the editor's "strip html" command.
d) replace every run of white space (tabs, spaces) with a single return character, so each word ends up on its own line. Often you can do that with a regular expression search and replace that uses "\n" as the replacement.
After that, it would take some hand cleaning to get rid of any leftover junk. Maybe sort the file in the editor to push the junk to the top and bottom, and delete any blank lines.
When done, the number of lines should equal the number of words on your site.
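If you'd rather script it than do it by hand in an editor, here is a rough sketch of the same steps in Python, using only the standard library. It is not what EditPlus does internally, just one way to get a comparable number. The names `TextExtractor`, `count_words`, and `count_site_words` are made up for this example, and it assumes your pages are `.html` files under one folder, readable as UTF-8.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects text between tags, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def count_words(html):
    """Strip the markup, then count whitespace-separated words."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with a space keeps adjacent blocks apart, at the cost of
    # splitting a word that has a tag in the middle of it.
    # split() with no arguments splits on any run of whitespace.
    return len(" ".join(parser.chunks).split())


def count_site_words(root, pattern="*.html"):
    """Add up the word counts of every matching file under root (assumed layout)."""
    total = 0
    for path in Path(root).rglob(pattern):
        total += count_words(path.read_text(encoding="utf-8", errors="replace"))
    return total
```

Calling `count_site_words("path/to/your/site")` gives the total. Like the editor method, it counts navigation text and other junk along with real content, so treat the result as a ballpark figure.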
I posted this same question to a mailing list I subscribe to, and someone mentioned that the [url=atomz.com]Atomz[/url] search engine will return this information when it spiders your site. Luckily this particular site already uses the Atomz search, so I just set the parameters I wanted in the members area and had them reindex my site. The Index report tells you how many pages, words, and bytes of info it found. It also reports the number of word-endings, synonyms, and sound-alike words that were included in the index. I don't know how accurate the results are, but they look good enough for my purposes. I just needed a general idea.