


Strange?

Just a heads up

     

wilderness

1:00 am on May 25, 2006 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month




Recently, two ranges that I knew I had previously denied were being allowed access because of a syntax error I had made.
(Funny thing about syntax errors: some result in 500s and take your entire site(s) down, while others may linger unnoticed for months or longer until we see something eye-opening.)
Finding and correcting the syntax error made me more careful about verifying that the correction had actually solved the problem. In the process I stumbled across this strange correlation.

The following may just be coincidence; however, I don't believe so.

ALL of the bots read robots.txt and then immediately requested the same page in the same folder.
There must be some correlation or connection, either between the bots or perhaps on the page itself.

I went over and over the page's HTML and could not find any weakness.

68.178.242.xxx - - [22/May/2006:03:23:29 -0700] "GET /robots.txt HTTP/1.1" 200 3727 "-" "-"
68.178.242.xxx - - [22/May/2006:03:23:36 -0700] "GET /myfolder/mypage.html HTTP/1.1" 403 - "-" "-"

66.154.103.150 - - [07/May/2006:00:20:10 -0700] "GET /robots.txt HTTP/1.0" 403 - "-" "Gigabot/2.0/gigablast.com/spider.html"
66.154.102.96 - - [07/May/2006:00:20:11 -0700] "GET /Same Folder/Same Page.html HTTP/1.0" 403 - "-" "Gigabot/2.0/gigablast.com/spider.html"

64.62.228.xx - - [18/May/2006:07:41:53 -0700] "GET /robots.txt HTTP/1.1" 200 3727 "-" "-"
64.62.228.xx - - [18/May/2006:07:41:53 -0700] "GET /Same Folder/Same Page.html HTTP/1.1" 403 - "-" "-"

216.195.47.xxx - - [23/May/2006:07:13:41 -0700] "GET /robots.txt HTTP/1.1" 200 3727 "-" "-"
216.195.47.xxx - - [23/May/2006:07:13:46 -0700] "GET /Same Folder/Same Page.html HTTP/1.1" 403 - "-" "-"

64.71.167.xx - - [24/May/2006:09:19:40 -0700] "GET /robots.txt HTTP/1.1" 200 3727 "-" "-"
64.71.167.xx - - [24/May/2006:09:19:43 -0700] "GET /Same Folder/Same Page.html HTTP/1.1" 403 - "-" "-"

Most of these bots visited the same pages over and over in the exact same order.
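
For anyone who wants to hunt for the same pattern in their own logs, here is a rough Python sketch. It assumes the common/combined log format shown above. The function names, the 10-second window, and the idea of grouping by the leading octets of the address are all illustrative assumptions, not anything taken from the bots or from my server setup.

```python
import re
from datetime import datetime, timedelta

# Loose match for common/combined log lines. The IP field is any
# non-space token, so masked addresses such as 64.62.228.xx still parse.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>.+?) HTTP/[\d.]+" (?P<status>\d{3}) \S+'
)

def parse(line, octets=3):
    """Return (prefix, timestamp, path, status), or None if no match."""
    m = LOG_RE.match(line)
    if not m:
        return None
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    prefix = ".".join(m.group("ip").split(".")[:octets])
    return prefix, ts, m.group("path"), int(m.group("status"))

def robots_then_page(lines, window_seconds=10, octets=3):
    """Flag page requests that follow a robots.txt fetch from the same
    address prefix within window_seconds."""
    window = timedelta(seconds=window_seconds)
    last_robots = {}  # prefix -> time of most recent robots.txt fetch
    hits = []
    for line in lines:
        rec = parse(line, octets)
        if rec is None:
            continue
        prefix, ts, path, status = rec
        if path == "/robots.txt":
            last_robots[prefix] = ts
        elif prefix in last_robots:
            delta = ts - last_robots[prefix]
            if timedelta(0) <= delta <= window:
                hits.append((prefix, path, status, int(delta.total_seconds())))
    return hits

# Sample lines copied from the excerpts above.
SAMPLE = [
    '68.178.242.xxx - - [22/May/2006:03:23:29 -0700] "GET /robots.txt HTTP/1.1" 200 3727 "-" "-"',
    '68.178.242.xxx - - [22/May/2006:03:23:36 -0700] "GET /myfolder/mypage.html HTTP/1.1" 403 - "-" "-"',
    '66.154.103.150 - - [07/May/2006:00:20:10 -0700] "GET /robots.txt HTTP/1.0" 403 - "-" "Gigabot/2.0/gigablast.com/spider.html"',
    '66.154.102.96 - - [07/May/2006:00:20:11 -0700] "GET /Same Folder/Same Page.html HTTP/1.0" 403 - "-" "Gigabot/2.0/gigablast.com/spider.html"',
    '64.62.228.xx - - [18/May/2006:07:41:53 -0700] "GET /robots.txt HTTP/1.1" 200 3727 "-" "-"',
    '64.62.228.xx - - [18/May/2006:07:41:53 -0700] "GET /Same Folder/Same Page.html HTTP/1.1" 403 - "-" "-"',
]

if __name__ == "__main__":
    for hit in robots_then_page(SAMPLE):
        print(hit)
```

Note that the Gigabot pair comes from two addresses that differ in the third octet, so the default three-octet grouping does not pair them; calling `robots_then_page(SAMPLE, octets=2)` does.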

Any thoughts?
Other than it's time for my medication ;)

Don

bull

10:35 am on May 25, 2006 (gmt 0)

10+ Year Member



Sorry Don,
I cannot confirm this from my log archives, but several IPs from 64.71.167.* repeatedly attempted to crawl using
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)
(This is a Hurricane E. range and therefore denied anyway.)

Cheers
Jan

wilderness

4:29 pm on Jun 1, 2006 (gmt 0)




HTTP/1.1" 200 3727 "-" "-"
208.66.195.6 - - [01/Jun/2006:06:31:58 -0700] "GET /Same Folder/Same Page.html HTTP/1.1" 403 - "-" "-"

only log entries.

Interesting in that the backbone of this IP range has the same name as the HE pest.

The actual IP is registered to a Moscow address.

I've added the backbone range to my denies.

Don

incrediBILL

6:27 am on Jun 8, 2006 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



The range 68.178.242. is GoDaddy hosting and I've blocked them due to various creepy crawlers from their servers.

What you may be seeing is what I predicted months ago: scrapers/crawlers are building distributed networks so they can't be caught downloading too many pages from a single source.
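
One rough way to look for that kind of distributed activity is to count how many different networks request the same page. The Python sketch below is only an illustration: the function name, the three-octet grouping, and the threshold of three networks are assumptions, and the input is assumed to be (ip, path) pairs already pulled out of the access log.

```python
from collections import defaultdict

def distributed_targets(records, min_networks=3):
    """Map each path to the sorted list of address prefixes that
    requested it, keeping only paths hit from min_networks or more
    distinct prefixes. robots.txt is skipped because every crawler
    is expected to fetch it."""
    networks = defaultdict(set)
    for ip, path in records:
        if path == "/robots.txt":
            continue
        # Group by the first three octets; masked addresses such as
        # 64.62.228.xx still group correctly.
        networks[path].add(".".join(ip.split(".")[:3]))
    return {
        path: sorted(prefixes)
        for path, prefixes in networks.items()
        if len(prefixes) >= min_networks
    }

# Addresses and path taken from the log excerpts in this thread;
# /other.html is a made-up path for contrast.
RECORDS = [
    ("68.178.242.xxx", "/robots.txt"),
    ("64.62.228.xx", "/Same Folder/Same Page.html"),
    ("216.195.47.xxx", "/Same Folder/Same Page.html"),
    ("64.71.167.xx", "/Same Folder/Same Page.html"),
    ("208.66.195.6", "/Same Folder/Same Page.html"),
    ("64.62.228.xx", "/other.html"),
]

if __name__ == "__main__":
    for path, prefixes in distributed_targets(RECORDS).items():
        print(path, prefixes)
```

A page that keeps turning up here from unrelated ranges, especially with blank user agents, is a candidate for a closer look. A popular page will also be requested from many networks by ordinary visitors, so this is a starting point rather than proof.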

AdSense and other monetization programs are a very good incentive for this type of activity.

 
