Weird Bot behaviour

requesting same pages multiple times

         

Powdork

5:35 am on Jul 16, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



64.68.88.40 - - [15/Jul/2003:03:15:50 -0400] "GET /page1.htm HTTP/1.0" 200 10970
64.68.88.29 - - [15/Jul/2003:03:17:44 -0400] "GET /page2.htm HTTP/1.0" 200 10512
64.68.88.10 - - [15/Jul/2003:03:23:30 -0400] "GET /page3.htm HTTP/1.0" 200 11679
64.68.88.27 - - [15/Jul/2003:06:05:35 -0400] "GET /page4.htm HTTP/1.0" 200 7808
64.68.82.67 - - [15/Jul/2003:07:33:50 -0400] "GET /robots.txt HTTP/1.0" 404 204 "-" "Googlebot/2.1
64.68.82.67 - - [15/Jul/2003:07:33:50 -0400] "GET / HTTP/1.0" 200 8399
64.68.88.32 - - [15/Jul/2003:07:35:12 -0400] "GET /page5.htm HTTP/1.0" 200 37643
64.68.88.30 - - [15/Jul/2003:12:24:06 -0400] "GET /page6.htm HTTP/1.0" 200 8991
64.68.88.158 - - [15/Jul/2003:13:17:21 -0400] "GET /page7.htm HTTP/1.0" 200 10047
64.68.88.28 - - [15/Jul/2003:15:21:58 -0400] "GET /page3.htm HTTP/1.0" 200 11759
64.68.88.22 - - [15/Jul/2003:17:03:49 -0400] "GET /page8.htm HTTP/1.0" 200 12006
64.68.88.163 - - [15/Jul/2003:17:16:38 -0400] "GET /page9.htm HTTP/1.0" 200 11679
64.68.88.29 - - [15/Jul/2003:17:30:51 -0400] "GET /page1.htm HTTP/1.0" 200 10970
64.68.88.21 - - [15/Jul/2003:17:44:45 -0400] "GET /page2.htm HTTP/1.0" 200 10512
64.68.88.165 - - [15/Jul/2003:17:48:16 -0400] "GET /page4.htm HTTP/1.0" 200 7808
64.68.88.10 - - [15/Jul/2003:18:37:43 -0400] "GET /page5.htm HTTP/1.0" 200 37643
64.68.88.41 - - [15/Jul/2003:20:08:48 -0400] "GET /page7.htm HTTP/1.0" 200 10047
64.68.88.138 - - [15/Jul/2003:21:30:16 -0400] "GET /page3.htm HTTP/1.0" 200 11759

As you can see, Googlebot is requesting the same pages multiple times, almost in sequence. Additionally, every request gets a 200 response code even though there have been no changes to those pages and the server does honor If-Modified-Since.
This is the first time I have noticed that the 64.68.88 block was doing most of the real crawling while the 64.68.82 guys were just hitting the index page and robots.txt.
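
For anyone who wants to check their own logs for the same pattern, here is a minimal sketch in Python. It assumes the common log format shown in the excerpt; the names `LOG_LINE`, `repeated_requests`, and `sample` are made up for illustration, and the sample lines are copied from the log above.

```python
import re
from collections import defaultdict

# Matches the leading fields of the log lines in the excerpt:
# ip - - [timestamp] "METHOD path PROTOCOL" status bytes
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def repeated_requests(lines):
    """Return {path: [(ip, time, status, size), ...]} for paths fetched more than once."""
    hits = defaultdict(list)
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that do not look like access-log entries
        hits[m["path"]].append((m["ip"], m["time"], int(m["status"]), m["size"]))
    return {path: seen for path, seen in hits.items() if len(seen) > 1}

# Four lines taken from the excerpt above.
sample = [
    '64.68.88.40 - - [15/Jul/2003:03:15:50 -0400] "GET /page1.htm HTTP/1.0" 200 10970',
    '64.68.88.10 - - [15/Jul/2003:03:23:30 -0400] "GET /page3.htm HTTP/1.0" 200 11679',
    '64.68.88.28 - - [15/Jul/2003:15:21:58 -0400] "GET /page3.htm HTTP/1.0" 200 11759',
    '64.68.88.29 - - [15/Jul/2003:17:30:51 -0400] "GET /page1.htm HTTP/1.0" 200 10970',
]

for path, seen in sorted(repeated_requests(sample).items()):
    print(path, [(ip, size) for ip, _, _, size in seen])
```

Printing the byte counts next to each hit may be worth doing before concluding that nothing changed: in the excerpt, /page3.htm is served as 11679 bytes at 03:23:30 and as 11759 bytes on the later requests.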

AthlonInside

9:10 am on Jul 16, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



They are reaching the non-www version of your site.

So there are two requests for each file.

Powdork

4:31 pm on Jul 16, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is there a way to tell this? For instance, how do I tell if they are reaching a non-www version versus a misspelled version that points at the standard version?

Powdork

7:19 am on Jul 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just to bring the original weird bot behaviour thread back to the forefront.
Who would this guy be? 64.68.68.101
Is someone at the plex gettin' hitched?

AthlonInside

12:52 pm on Jul 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



64.68.68.101 never appears in your log. :)

I have a 301 redirect working properly on my site, so my non-www domain is not indexed (it was indexed once, back when I did not have any 301 redirect).
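
To illustrate the idea of such a redirect, here is a rough sketch in Python. It is not the setup used on the site mentioned above: the function name `canonical_redirect` and the host `www.example.com` are placeholders, and a real site would more likely do this in the web server configuration.

```python
def canonical_redirect(host, path, canonical_host="www.example.com"):
    """Return (status, headers) for a 301 to the canonical host, or None if none is needed."""
    # Compare case-insensitively and ignore any :port suffix on the Host header.
    bare = host.split(":", 1)[0].lower()
    if bare == canonical_host:
        return None
    # Any other host name (non-www, misspelled alias, ...) is sent to the canonical one.
    return ("301 Moved Permanently", [("Location", f"http://{canonical_host}{path}")])

print(canonical_redirect("example.com", "/page1.htm"))
print(canonical_redirect("www.example.com", "/page1.htm"))
```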

From the log alone, you can't tell whether the bot is reaching the www or the non-www host. If you really want to know, you could run a script (maybe on your index page) that detects the user agent. If Googlebot is found, read the $_SERVER["HTTP_HOST"] variable and store it for reference. It could tell the story.
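
Here is a minimal sketch of that idea, written in Python as a WSGI wrapper rather than PHP, so treat it as an assumption about your stack. The WSGI key `HTTP_HOST` plays the same role as PHP's `$_SERVER["HTTP_HOST"]`; the names `host_logging_middleware`, `demo_app`, and `seen` are made up for the example.

```python
def host_logging_middleware(app, seen_hosts):
    """Wrap a WSGI app and record the Host header of every Googlebot request."""
    def wrapped(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "")
        if "googlebot" in agent.lower():
            # HTTP/1.0 clients are not required to send Host, so allow for it being absent.
            seen_hosts.append(environ.get("HTTP_HOST", "(no Host header)"))
        return app(environ, start_response)
    return wrapped

def demo_app(environ, start_response):
    """Stand-in for the real site; always answers with a short plain-text body."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]

seen = []
app = host_logging_middleware(demo_app, seen)

# Simulate one Googlebot request and one browser request.
ignore = lambda status, headers: None
app({"HTTP_USER_AGENT": "Googlebot/2.1", "HTTP_HOST": "example.com"}, ignore)
app({"HTTP_USER_AGENT": "Mozilla/4.0", "HTTP_HOST": "www.example.com"}, ignore)
print(seen)
```

In practice you would write the recorded hosts to a file instead of a list. The user agent string can be spoofed, so the result is only a hint.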

Powdork

6:53 pm on Jul 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



64.68.68.101 never appears in your log

Oh, but it did:
64.68.68.101 - - [16/Jul/2003:19:34:30 -0400] "GET / HTTP/1.1" 200 8439 "http://www.someonearoundthebayarea.com/weddingstuff.htm" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705)"

Funny, I would have expected a different OS and browser, unless they are spoofed. Visited four pages.

AthlonInside

6:29 am on Jul 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Powdork,

It is your competitors spying on you to see if you are 'cloaking'. That is quite normal for me; I get it very often from a guy using AT&T. He seems so interested in my site!

The IP you specified belongs to an ISP in the Asia zone and has nothing to do with the Googleplex.

Powdork

6:59 am on Jul 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



From the ARIN whois:

OrgName: Cable & Wireless
OrgID: EXCW
Address: 3300 Regency Pkwy
City: Cary
StateProv: NC
PostalCode: 27511
Country: US

ReferralServer: rwhois://rwhois.exodus.net:4321/

NetRange: 64.68.64.0 - 64.68.95.255

Where are you coming up with it being an Asian IP?
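
A quick way to check an address against that NetRange is Python's standard `ipaddress` module. This is only a sketch: the helper name `in_net_range` is made up, and the range endpoints are the ones from the whois output above.

```python
import ipaddress

# NetRange endpoints copied from the whois output above.
first = ipaddress.ip_address("64.68.64.0")
last = ipaddress.ip_address("64.68.95.255")

# Collapse the start/end pair into CIDR blocks.
networks = list(ipaddress.summarize_address_range(first, last))

def in_net_range(ip):
    """Return True if ip falls inside the whois NetRange."""
    return any(ipaddress.ip_address(ip) in net for net in networks)

print(networks)
for ip in ("64.68.68.101", "64.68.88.40", "64.68.82.67"):
    print(ip, in_net_range(ip))
```

The crawler addresses from the first post (64.68.88.x and 64.68.82.67) can be run through the same check.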