In order to get a quotation for a translation of one of my sites I was asked to get a word count for the entire site (150+ pages). Short of going through page by page, copying all of this text into a Word document and running a word count tool, are there any online tools that will run through my site and give me a word count of visible text? It doesn't have to be exact, just a general ballpark figure.
Whoa, that would be a pretty specialized tool. The only way I could think of doing it would be this way:
a) Get a text editor capable of stripping out HTML (EditPlus does this).
b) Merge all your files into one big file. There may be a utility that does this; I've seen file combiners before, but none come to mind.
c) Run the editor's "strip html" command.
d) Replace every run of white space (tabs, spaces) with a single return character, so each word ends up on its own line. Often you can do that with a regular expression search, replacing the white space with "\n".
After that, it would take some hand cleaning to get rid of any leftover junk. Maybe sort the file in the editor to push the junk to the top and bottom, and get rid of any blank lines.
When done, the number of lines should equal the number of words on your site.
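If you'd rather script it than do it by hand in an editor, the steps above can be roughly sketched in Python using only the standard library. This is a minimal sketch, not a polished tool: it assumes you have a local copy of the site's files, that they are saved as .htm or .html, and that they are UTF-8 (undecodable bytes are replaced rather than raising an error). The names VisibleTextParser, count_words and count_site_words are made up for this example.

```python
from html.parser import HTMLParser
from pathlib import Path


class VisibleTextParser(HTMLParser):
    """Collects text data, ignoring anything inside script/style-type tags."""

    # Tags whose contents are never shown as page text.
    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def count_words(html):
    """Strip the markup, then count whitespace-separated words."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    # Joining with a space keeps words in adjacent tags apart; it can
    # slightly overcount when a tag sits in the middle of a word.
    return len(" ".join(parser.chunks).split())


def count_site_words(root):
    """Add up the word counts of every .htm/.html file under root."""
    total = 0
    for path in Path(root).rglob("*.htm*"):
        text = path.read_text(encoding="utf-8", errors="replace")
        total += count_words(text)
    return total
```

Calling count_site_words("path/to/site") on a local copy of the site should give a ballpark total. Note that it also counts the text in each page's title tag, and it knows nothing about text hidden with CSS, so treat the number as an estimate.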
Eek! That doesn't sound fun. I was trying something similar with NoteTab but it was getting messy.
I posted this same question to a mailing list I subscribe to, and someone mentioned that the [url=atomz.com]Atomz[/url] search engine will return this information when it spiders your site. Luckily this particular site already uses the Atomz search, so I just set the parameters I wanted in the members area and had them reindex my site. The Index report tells you how many pages, words, and bytes of info it found. It also reports the number of word-endings, synonyms, and sound-alike words that were included in the index. I don't know how accurate the results are, but they look good enough for my purposes. I just needed a general idea.