ALL of the bots read robots.txt and then immediately requested the same folder and page.
There must be some correlation or connection either between the bots or perhaps on the page itself.
I went over the page's HTML again and again and could not find any weakness.
68.178.242.xxx - - [22/May/2006:03:23:29 -0700] "GET /robots.txt HTTP/1.1" 200 3727 "-" "-"
68.178.242.xxx - - [22/May/2006:03:23:36 -0700] "GET /myfolder/mypage.html HTTP/1.1" 403 - "-" "-"
66.154.103.150 - - [07/May/2006:00:20:10 -0700] "GET /robots.txt HTTP/1.0" 403 - "-" "Gigabot/2.0/gigablast.com/spider.html"
66.154.102.96 - - [07/May/2006:00:20:11 -0700] "GET /Same Folder/Same Page.html HTTP/1.0" 403 - "-" "Gigabot/2.0/gigablast.com/spider.html"
64.62.228.xx - - [18/May/2006:07:41:53 -0700] "GET /robots.txt HTTP/1.1" 200 3727 "-" "-"
64.62.228.xx - - [18/May/2006:07:41:53 -0700] "GET /Same Folder/Same Page.html HTTP/1.1" 403 - "-" "-"
216.195.47.xxx - - [23/May/2006:07:13:41 -0700] "GET /robots.txt HTTP/1.1" 200 3727 "-" "-"
216.195.47.xxx - - [23/May/2006:07:13:46 -0700] "GET /Same Folder/Same Page.html HTTP/1.1" 403 - "-" "-"
64.71.167.xx - - [24/May/2006:09:19:40 -0700] "GET /robots.txt HTTP/1.1" 200 3727 "-" "-"
64.71.167.xx - - [24/May/2006:09:19:43 -0700] "GET /Same Folder/Same Page.html HTTP/1.1" 403 - "-" "-"
Most of these bots visited the same pages over and over in the exact same order.
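For anyone who wants to check their own logs for the same pattern, here is a rough Python sketch. It assumes the Apache combined log format shown above. The names (`parse_line`, `network_prefix`, `robots_then_page`), the 10-second window, and the idea of grouping by the leading octets of the IP are my own assumptions for illustration, not anything the server or the bots define.

```python
import re
from datetime import datetime, timedelta

# Combined log format: ip - - [time] "METHOD path PROTO" status size "referer" "agent"
# The path group is lazy so that paths containing spaces still parse.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>.+?) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Return a dict for one log line, or None if it does not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


def network_prefix(ip, octets=3):
    """Keep only the leading octets, e.g. '64.62.228.xx' -> '64.62.228'."""
    return ".".join(ip.split(".")[:octets])


def robots_then_page(lines, window_seconds=10, octets=3):
    """Find page requests that follow a robots.txt fetch from the same
    network prefix within window_seconds.
    Returns a list of (prefix, path, seconds_after_robots)."""
    last_robots = {}
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        key = network_prefix(entry["ip"], octets)
        if entry["path"] == "/robots.txt":
            last_robots[key] = entry["time"]
        elif key in last_robots:
            # Only the first request after robots.txt is compared.
            delta = entry["time"] - last_robots.pop(key)
            if timedelta(0) <= delta <= timedelta(seconds=window_seconds):
                hits.append((key, entry["path"], delta.total_seconds()))
    return hits
```

Call it with something like `robots_then_page(open("access.log"))`. Note that the two Gigabot requests above come from different /24 blocks, so they only pair up if you pass `octets=2`, which is a much looser grouping and will produce more false matches on a busy site.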
Any thoughts?
Other than it's time for my medication ;)
Don
only log entries.
Interesting in that the backbone of this IP range has the same name as the HE pest.
The actual IP is registered to a Moscow address.
I've added the backbone range to my denies.
Don
What you may be seeing is what I predicted months ago: scrapers/crawlers are building distributed networks so they can't be caught downloading too many pages from a single source.
AdSense and other monetization programs are a very good incentive for this type of activity.
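If that is what is going on, one way to spot it is to stop counting hits per IP and instead count how many unrelated networks ask for the same page. A minimal Python sketch, assuming you have already pulled `(ip, path)` pairs out of your access log; the function names and the threshold of three sources are hypothetical, picked only for illustration:

```python
from collections import defaultdict


def sources_per_path(requests, octets=3):
    """Map each requested path to the set of network prefixes that asked for it.
    requests is an iterable of (ip, path) pairs; robots.txt is skipped
    because every crawler fetches it."""
    seen = defaultdict(set)
    for ip, path in requests:
        if path == "/robots.txt":
            continue
        seen[path].add(".".join(ip.split(".")[:octets]))
    return seen


def shared_targets(requests, min_sources=3, octets=3):
    """Return (path, source_count) for paths requested from at least
    min_sources distinct prefixes, most widely requested first."""
    seen = sources_per_path(requests, octets)
    flagged = [
        (path, len(prefixes))
        for path, prefixes in seen.items()
        if len(prefixes) >= min_sources
    ]
    return sorted(flagged, key=lambda item: (-item[1], item[0]))
```

On a normal site popular pages will of course show many sources too, so this only narrows things down. It is most telling when the flagged page is an obscure one and the requests arrive with a blank user agent, as in the log entries earlier in the thread.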