The update was a basic code cleanup that also addresses a couple of known bugs:
- adjusted the html stripping routines to work with more styles of meta tags.
- updated the stop word list.
- referral string identifying url as [SearchEngineWorld.com...] in order to stem abuse. (may have to remove optional agent name and go to full id)
- fixed 2 known show stoppers with certain urls.
- did not change the algo.
There was a problem with people who didn't use quotes in their meta tags: the html stripper wouldn't properly recognize the rest of the head section. I still don't support meta tags without quotes around them, but at least the rest of the page is ok now.
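To illustrate the kind of tolerance involved, here is a minimal sketch (not the analyzer's actual code) that uses Python's standard `html.parser` module. It assumes the goal is simply to keep reading the head section when a meta tag has unquoted values. The names `MetaCollector` and `parse_head` are made up for this example.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects <meta name=... content=...> pairs and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title_parts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; quoted and unquoted
        # values both arrive here as plain strings.
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def parse_head(html):
    """Hypothetical helper: returns (meta dict, title) for a page."""
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    return parser.meta, "".join(parser.title_parts).strip()
```

Note that an unquoted value ends at the first space, so a multi-word keyword list without quotes would still be cut short after the first word. The point of the sketch is only that the tags following it in the head are still read.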
There were some other problems with embedded scripts not being recognized when a page had multiple, different styles of scripts (eg: vbs).
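As a rough sketch of one way to drop script blocks regardless of their language (an assumption for illustration, not how the analyzer itself does it), a single case-insensitive pattern can match any `<script ...>` opening tag through its closing tag. `SCRIPT_BLOCK` and `strip_scripts` are hypothetical names.

```python
import re

# Matches any <script ...> ... </script> block, whatever attributes the
# opening tag carries (language="VBScript", type="text/javascript", etc.).
# DOTALL lets the body span multiple lines; the lazy .*? stops at the
# first closing tag so separate blocks are removed one at a time.
SCRIPT_BLOCK = re.compile(
    r"<script\b[^>]*>.*?</script\s*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_scripts(html):
    """Return html with every script block removed."""
    return SCRIPT_BLOCK.sub("", html)
```

Because the pattern never looks at the attribute values, a page that mixes vbs and javascript blocks is handled the same way as a page with only one style.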
Additionally, the analyzer would hang on urls when the domain name was longer than 60 characters.
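The cause of the hang isn't described here, so the following is only a sketch of a regression check for long domain names, assuming host extraction is the step in question. It uses Python's standard `urllib.parse`. `hostname_of` and the sample domain are invented for the example.

```python
from urllib.parse import urlsplit


def hostname_of(url):
    """Hypothetical helper: return the host part of a url, or '' if none."""
    return urlsplit(url).hostname or ""


# A made-up domain longer than 60 characters. The single long label is
# 61 characters, which is still within the 63-character DNS label limit.
long_host = "www." + "a" * 61 + ".com"
long_url = "http://" + long_host + "/page.html"
```

A check like `hostname_of(long_url) == long_host` confirms the full name comes back intact rather than being truncated at some fixed length.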