38.98.19.## - - [21/Jan/2007:03:54:12 -0500] "GET / HTTP/1.1" 200 3816 "http://www.apassion4jazz.net/" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7 SnapPreviewBot"
It took every file referenced from index.html, including images, scripts, and CSS. There was no request for robots.txt. The last octet of the IP address (the "D class") changed with each request.
[webmasterworld.com...]
[webmasterworld.com...]
I have the whole 38.* range denied in .htaccess. Nothing good ever comes from there.
[edited by: Mokita at 11:06 pm (utc) on Jan. 22, 2007]
I could see in my logs hits coming from 38.* IP addresses with the user agent "Mozilla/5.0 (compatible; SnapPreviewBot; en-US; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9". The page and all graphics and stylesheets were downloaded, including favicon.ico, but robots.txt wasn't requested.
So it doesn't seem to be a standard spider, in that it doesn't actively crawl a page until someone tries to view the preview image.
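If you want to check your own logs for the same pattern, here is a rough Python sketch. It assumes the Apache "combined" log format shown in the first post. The function names and the sample lines are made up for illustration (the IPs are not taken from real logs), and the regex may need adjusting for other log formats.

```python
import re

# Apache "combined" format: ip, ident, user, [time], "request", status, size, "referer", "agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it doesn't match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def summarize_snap_hits(lines):
    """Summarize the requests whose user agent mentions SnapPreviewBot."""
    hits = [h for h in map(parse_line, lines)
            if h and "SnapPreviewBot" in h["agent"]]
    return {
        "requests": len(hits),
        "distinct_ips": len({h["ip"] for h in hits}),
        # Plain prefix test, so it also works on masked addresses like 38.98.19.##
        "all_from_38": bool(hits) and all(h["ip"].startswith("38.") for h in hits),
        "asked_for_robots": any(h["path"] == "/robots.txt" for h in hits),
    }


# Hypothetical sample lines, modelled on the entry quoted in the first post.
SAMPLE = [
    '38.98.19.10 - - [21/Jan/2007:03:54:12 -0500] "GET / HTTP/1.1" 200 3816 '
    '"http://www.apassion4jazz.net/" "Mozilla/5.0 (X11; U; Linux i686; en-US; '
    'rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7 SnapPreviewBot"',
    '38.98.19.11 - - [21/Jan/2007:03:54:13 -0500] "GET /favicon.ico HTTP/1.1" 200 318 '
    '"-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.7) Gecko/20060909 '
    'Firefox/1.5.0.7 SnapPreviewBot"',
    '192.0.2.7 - - [21/Jan/2007:04:00:00 -0500] "GET /robots.txt HTTP/1.1" 200 68 '
    '"-" "ExampleBot/1.0"',
]

summary = summarize_snap_hits(SAMPLE)
print(summary)
```

To run it against a real log, pass the open file instead of SAMPLE, e.g. `summarize_snap_hits(open("access.log"))`, where the file name is whatever your server uses. Several distinct IPs with no robots.txt request would match what was reported above.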