
Search Engine Spider and User Agent Identification Forum

    
Google Test-Bot: Google-Test2
doc_z




msg:4631785
 11:36 am on Dec 18, 2013 (gmt 0)

I saw a new bot from Google with the name Google-Test2. From the IP 72.14.199.91 one can see that this is indeed from Google.

The bot doesn't respect the rules of the robots.txt file, which allows access for GoogleBot but not for this test bot.

The URLs are really strange:
http://www.example.vom/page.html%3C/web:Url%3E%3Cweb:DisplayUrl%3Ewww. [...]

These are URL-encoded versions of an XML fragment of the form


<web:Url>
<web:DisplayUrl>www.example.vom/page.html</web:DisplayUrl>
<web:DateTime>2012-01-12T01:54:00Z</web:DateTime>
[...]
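To make the encoding visible, here is a minimal sketch that decodes the logged request with Python's standard library. The input string is the truncated URL exactly as quoted above; nothing beyond the truncation point is reconstructed.

```python
from urllib.parse import unquote

# Request URL as it appeared in the log, truncated exactly as in the post.
logged = "http://www.example.vom/page.html%3C/web:Url%3E%3Cweb:DisplayUrl%3Ewww."

# %3C and %3E are the percent-encoded forms of "<" and ">".
decoded = unquote(logged)
print(decoded)
# http://www.example.vom/page.html</web:Url><web:DisplayUrl>www.
```

The decoded string shows the page URL running straight into a closing `</web:Url>` tag and the following `<web:DisplayUrl>` element, so the requested "URL" appears to contain raw XML markup rather than a clean address.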


Is anyone else seeing this bot, or does anyone have an idea what it is for?

 

doc_z




msg:4631830
 3:14 pm on Dec 18, 2013 (gmt 0)

I forgot to mention that the HTTP referer
http://www.google.co.uk/ seems to be a fake.
wilderness




msg:4631846
 3:48 pm on Dec 18, 2013 (gmt 0)

Here's a 2010 thread [webmasterworld.com] with a resolution in the last submission by Jim.

However, considering the recent insight provided by dstiles, you could reduce

RewriteCond %{REMOTE_ADDR} ^66\.249\.(6[4-9]|7[0-9]|8[0-46-9]|9[0-5])\. [OR]

to

RewriteCond %{REMOTE_ADDR} ^66\.249\.(6[4-9]|7[0-9])\. [OR]
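To see what the reduction changes, the following sketch runs both patterns through Python's `re` module and lists the third octets each one accepts. It assumes Python's regex engine treats these simple patterns the same way as the PCRE engine Apache uses; the helper name `matched_third_octets` is made up for illustration.

```python
import re

# The two patterns, copied from the RewriteCond lines above.
ORIGINAL = re.compile(r"^66\.249\.(6[4-9]|7[0-9]|8[0-46-9]|9[0-5])\.")
REDUCED = re.compile(r"^66\.249\.(6[4-9]|7[0-9])\.")

def matched_third_octets(pattern):
    """Return every third octet n (0-255) for which 66.249.n.1 matches."""
    return [n for n in range(256) if pattern.match(f"66.249.{n}.1")]

original = matched_third_octets(ORIGINAL)
reduced = matched_third_octets(REDUCED)
dropped = sorted(set(original) - set(reduced))

print(len(original), original[0], original[-1])  # 31 64 95
print(len(reduced), reduced[0], reduced[-1])     # 16 64 79
print(dropped)  # [80, 81, 82, 83, 84, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95]
```

The original pattern covers 66.249.64 through 66.249.95 except 66.249.85; the reduced one covers 66.249.64 through 66.249.79, which is the same set of addresses as 66.249.64.0/20.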

doc_z




msg:4631864
 4:40 pm on Dec 18, 2013 (gmt 0)

I'm not sure if I got your point...

I don't want to block this bot (which can be easily done by blocking the user-agent 'Google-Test2').

I have several fake GoogleBots which I'm already blocking. However, this is a real Google bot because the IP 72.14.199.91 belongs to Google. The same user-agent can be found in other log files [google.com], and similar problems with this kind of URL [productforums.google.com] have been reported.

I'm just curious about what is causing these URLs and what this bot is good for.

wilderness




msg:4631868
 4:49 pm on Dec 18, 2013 (gmt 0)

72.14.204.136 - - [27/Jul/2008:09:27:15 -0500] "GET /MyFolder/MySub/Sub-Sub/MyPage.html HTTP/1.0" 403 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Google Wireless Transcoder;)"
72.14.204.136 - - [31/Jul/2008:18:32:49 -0500] "GET /MyFolder/MySub/MyPage.html HTTP/1.0" 403 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Google Wireless Transcoder;)"


Web Accelerator 2007 [webmasterworld.com]

72.14.194.27 - - [21/Oct/2006:11:27:09 -0700] "GET / HTTP/1.1" 403 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)"

wilderness




msg:4631870
 4:52 pm on Dec 18, 2013 (gmt 0)

FWIW, there are three currently active Google threads.

Please review "Google is that You?"

dstiles




msg:4631945
 10:40 pm on Dec 18, 2013 (gmt 0)

doc_z - you are missing something there. The IP 72.14.199.91 is NOT a google bot IP. It's in a bannable range (see my comment in another current G thread). Its precise rDNS entry claims it is a rate-limited proxy, which means it may well not be G at all but some criminal using their proxy.

"Real" googlebots ONLY come from IPs labelled in DNS as crawler bots.

Block: 72.14.192.0 - 72.14.255.255
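The DNS labelling mentioned above can be checked with a forward-confirmed reverse lookup: resolve the IP to a hostname, check the hostname's domain, then resolve that hostname back and confirm it returns the same IP. The sketch below is one way to do that in Python. The suffix `.googlebot.com` is an assumption about how crawler hosts are named, and the `reverse`/`forward` parameters are hypothetical hooks added so the logic can be tried without network access.

```python
import socket

# Assumption: crawler hosts resolve to names under this domain. Adjust the
# tuple to whatever the search engine documents for its own crawlers.
CRAWLER_SUFFIXES = (".googlebot.com",)

def is_verified_crawler(ip, reverse=None, forward=None):
    """Forward-confirmed reverse DNS check for an IPv4 address.

    reverse(ip) -> hostname and forward(hostname) -> list of IPs default
    to live DNS lookups but can be replaced for offline testing.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(CRAWLER_SUFFIXES):
            # The owner may be right, but the host is not labelled a crawler.
            return False
        # A PTR record alone can be spoofed; the name must resolve back.
        return ip in forward(host)
    except OSError:
        # No PTR record, or the hostname does not resolve.
        return False
```

Calling `is_verified_crawler("72.14.199.91")` with the defaults performs live lookups, so the result depends on current DNS data. An address whose rDNS name is a proxy label rather than a crawler label fails the suffix test even though the range belongs to the same owner.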

lucy24




msg:4631990
 1:09 am on Dec 19, 2013 (gmt 0)

You may need to poke holes though.

72.14.199 includes Site Verification (for wmt)
72.14.229 includes humans investigating dmca claims
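As a rough sketch of "block the range but poke holes", the snippet below uses Python's `ipaddress` module to turn the first-to-last range into CIDR form and to test addresses against it. Treating the two exceptions as whole /24 blocks is an assumption; the posts only give the first three octets. The function name `is_denied` is hypothetical.

```python
import ipaddress

# Range to block, written as first and last address.
first = ipaddress.ip_address("72.14.192.0")
last = ipaddress.ip_address("72.14.255.255")
DENY = list(ipaddress.summarize_address_range(first, last))
print(DENY)  # [IPv4Network('72.14.192.0/18')]

# Exceptions, assumed here to be complete /24 blocks.
ALLOW = [
    ipaddress.ip_network("72.14.199.0/24"),  # Site Verification
    ipaddress.ip_network("72.14.229.0/24"),  # DMCA investigators
]

def is_denied(ip):
    """True if ip falls in a denied network and not in an allowed hole."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in ALLOW):
        return False
    return any(addr in net for net in DENY)

print(is_denied("72.14.204.136"))  # True
print(is_denied("72.14.199.91"))   # False
```

With a hole as wide as a /24, 72.14.199.91 (the address that started this thread) is let through along with Site Verification, so a narrower exception or an additional user-agent test may be needed.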

doc_z




msg:4632195
 4:03 pm on Dec 19, 2013 (gmt 0)

Okay, I got it. I thought it was Google because the IP is from Google.

Btw, I saw the same bot with the IP 209.85.238.208.

wilderness




msg:4632232
 6:31 pm on Dec 19, 2013 (gmt 0)

You may add 209.85.128.0/17 to your denies as well.

lucy24




msg:4632253
 8:28 pm on Dec 19, 2013 (gmt 0)

doc_z wrote: "because the IP is from Google"

"belongs to {name}" != "{name} indexing robot"

Some search engines are better than others at preserving the distinction.
