Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Google Test-Bot: Google-Test2
doc_z

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4631786 posted 11:36 am on Dec 18, 2013 (gmt 0)

I saw a new bot from Google with the name Google-Test2. From the IP 72.14.199.91 one can see that this is indeed from Google.

The bot doesn't respect the rules of the robots.txt file which allows access for GoogleBot but not for this test bot.

The URLs are really strange:
http://www.example.vom/page.html%3C/web:Url%3E%3Cweb:DisplayUrl%3Ewww. [...]

These are URL-encoded versions of an XML file of the form


<web:Url>
<web:DisplayUrl>www.example.vom/page.html</web:DisplayUrl>
<web:DateTime>2012-01-12T01:54:00Z</web:DateTime>
[...]
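
To make the encoding visible, here is a minimal Python sketch that decodes the truncated URL quoted above. It only uses the standard library; the variable names are mine, and the elided part of the URL is left elided.

```python
from urllib.parse import unquote

# Truncated request URL as quoted above (the "[...]" part is not reproduced).
logged = "http://www.example.vom/page.html%3C/web:Url%3E%3Cweb:DisplayUrl%3Ewww."

# %3C and %3E are the percent-encoded forms of "<" and ">".
decoded = unquote(logged)
print(decoded)  # -> http://www.example.vom/page.html</web:Url><web:DisplayUrl>www.
```

So the requested "URL" appears to be the real URL followed by whatever XML markup came after it in the source file, which suggests the tags were not stripped when the links were extracted.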


Is anyone else seeing this bot, or does anyone have an idea what it is good for?

 

doc_z

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4631786 posted 3:14 pm on Dec 18, 2013 (gmt 0)

I forgot to mention that the HTTP referer http://www.google.co.uk/ seems to be a fake.

wilderness

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4631786 posted 3:48 pm on Dec 18, 2013 (gmt 0)

Here's a 2010 thread [webmasterworld.com] with a resolution in the last submission by Jim.

However, considering the recent insight provided by dstiles, you could reduce

RewriteCond %{REMOTE_ADDR} ^66\.249\.(6[4-9]|7[0-9]|8[0-46-9]|9[0-5])\. [OR]

to

RewriteCond %{REMOTE_ADDR} ^66\.249\.(6[4-9]|7[0-9])\. [OR]
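
For anyone who wants to see exactly what the shorter pattern gives up, here is a rough Python sketch. It assumes Python's `re` treats these two patterns the same way Apache's regex engine does, which should hold for simple character classes like these. The helper name `third_octets` is made up for the example.

```python
import re

# The two patterns from the RewriteCond lines above.
OLD = re.compile(r"^66\.249\.(6[4-9]|7[0-9]|8[0-46-9]|9[0-5])\.")
NEW = re.compile(r"^66\.249\.(6[4-9]|7[0-9])\.")

def third_octets(pattern):
    """Return every third octet n (0-255) for which 66.249.n.1 matches."""
    return [n for n in range(256) if pattern.match(f"66.249.{n}.1")]

old_octets = third_octets(OLD)
new_octets = third_octets(NEW)

# Octets the longer pattern accepted that the shorter one no longer does.
dropped = sorted(set(old_octets) - set(new_octets))

print(new_octets[0], new_octets[-1])  # -> 64 79
print(dropped)
```

The shorter pattern covers 66.249.64.x through 66.249.79.x, which is the same as 66.249.64.0/20. The longer one additionally matched 80-84 and 86-95 (85 was already excluded).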

doc_z

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4631786 posted 4:40 pm on Dec 18, 2013 (gmt 0)

I'm not sure if I got your point...

I don't want to block this bot (which can be easily done by blocking the user-agent 'Google-Test2').

I have several fake GoogleBots which I'm already blocking. However, this is a real Google bot because the IP 72.14.199.91 belongs to Google. The same user-agent can be found in other logfiles [google.com], and similar problems with these kinds of URLs [productforums.google.com] have been reported.

I'm just curious about what is causing these URLs and what this bot is good for.

wilderness

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4631786 posted 4:49 pm on Dec 18, 2013 (gmt 0)

72.14.204.136 - - [27/Jul/2008:09:27:15 -0500] "GET /MyFolder/MySub/Sub-Sub/MyPage.html HTTP/1.0" 403 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Google Wireless Transcoder;)"
72.14.204.136 - - [31/Jul/2008:18:32:49 -0500] "GET /MyFolder/MySub/MyPage.html HTTP/1.0" 403 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Google Wireless Transcoder;)"


Web Accelerator 2007 [webmasterworld.com]

72.14.194.27 - - [21/Oct/2006:11:27:09 -0700] "GET / HTTP/1.1" 403 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)"

wilderness

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4631786 posted 4:52 pm on Dec 18, 2013 (gmt 0)

FWIW, there are three currently active Google threads.

Please review "Google is that You?"

dstiles

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4631786 posted 10:40 pm on Dec 18, 2013 (gmt 0)

doc_z - you are missing something there. The IP 72.14.199.91 is NOT a Googlebot IP. It's in a bannable range (see my comment in another current G thread). Its precise rDNS entry claims it is a rate-limited proxy, which means it may well not be G at all but some criminal using their proxy.

"Real" googlebots ONLY come from IPs labelled in DNS as crawler bots.

Block: 72.14.192.0 - 72.14.255.255
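
If your firewall or deny list wants CIDR notation rather than a start and end address, the range above can be converted with Python's standard `ipaddress` module. This is just a sketch; the variable names are mine.

```python
import ipaddress

# Start and end of the range given above.
first = ipaddress.IPv4Address("72.14.192.0")
last = ipaddress.IPv4Address("72.14.255.255")

# Collapse the range into the smallest list of CIDR blocks that covers it.
blocks = list(ipaddress.summarize_address_range(first, last))
print(blocks)  # -> [IPv4Network('72.14.192.0/18')]

# The address reported at the top of the thread falls inside that block.
seen = ipaddress.IPv4Address("72.14.199.91")
print(any(seen in block for block in blocks))  # -> True
```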

lucy24

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4631786 posted 1:09 am on Dec 19, 2013 (gmt 0)

You may need to poke holes though.

72.14.199 includes Site Verification (for wmt)
72.14.229 includes humans investigating dmca claims
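
A deny-with-exceptions check could look roughly like the sketch below. The deny block is the range dstiles gave, written as a CIDR. The two allow entries are placeholders only: I have used whole /24s because the exact addresses of the verification and DMCA visitors are not given here, and `is_denied` is a name invented for the example.

```python
import ipaddress

# Range from the post above (72.14.192.0 - 72.14.255.255) as one CIDR block.
DENY = [ipaddress.ip_network("72.14.192.0/18")]

# Hypothetical holes: whole /24s used as stand-ins for the narrower
# (unspecified) addresses that really need to get through.
ALLOW = [
    ipaddress.ip_network("72.14.199.0/24"),
    ipaddress.ip_network("72.14.229.0/24"),
]

def is_denied(ip_text):
    """Exceptions are checked first, then the deny list."""
    ip = ipaddress.ip_address(ip_text)
    if any(ip in net for net in ALLOW):
        return False
    return any(ip in net for net in DENY)

print(is_denied("72.14.204.136"))  # -> True
print(is_denied("72.14.199.91"))   # -> False
```

Note the second result: a hole as wide as 72.14.199.0/24 also lets 72.14.199.91 back in, so in practice the exception would need to be narrower, or combined with a user-agent test.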

doc_z

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4631786 posted 4:03 pm on Dec 19, 2013 (gmt 0)

Okay, I got it. I thought it was Google because the IP is from Google.

Btw, I saw the same bot with the IP 209.85.238.208.

wilderness

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4631786 posted 6:31 pm on Dec 19, 2013 (gmt 0)

You may add 209.85.128.0/17 to your denies as well.
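
As a quick sanity check on what a /17 actually covers, here is a small Python sketch using the standard `ipaddress` module (variable names are mine):

```python
import ipaddress

net = ipaddress.ip_network("209.85.128.0/17")

# First address, last address, and size of the block.
print(net[0], net[-1], net.num_addresses)  # -> 209.85.128.0 209.85.255.255 32768

# The second address doc_z reported is inside it.
print(ipaddress.ip_address("209.85.238.208") in net)  # -> True
```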

lucy24

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4631786 posted 8:28 pm on Dec 19, 2013 (gmt 0)

because the IP is from Google

"belongs to {name}" != "{name} indexing robot"

Some search engines are better than others at preserving the distinction.
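
The usual way to test that distinction is a forward-confirmed reverse DNS lookup, along the lines of what dstiles described: look up the hostname for the IP, check that it is labelled as a crawler, then resolve the hostname again and confirm it points back to the same IP. A rough Python sketch follows. The `.googlebot.com` suffix is my assumption of what "labelled in DNS as crawler bots" means in practice, and the function and parameter names are invented for the example.

```python
import socket

# Assumed hostname suffix for genuine crawler IPs.
CRAWLER_SUFFIXES = (".googlebot.com",)

def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check.

    The two lookup parameters exist only so the check can be exercised
    without network access; by default the system resolver is used.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
        # Step 1: the rDNS name must be labelled as a crawler.
        if not host.lower().endswith(CRAWLER_SUFFIXES):
            return False
        # Step 2: the name must resolve back to the original IP.
        return ip in forward_lookup(host)
    except OSError:
        # No rDNS entry or failed lookup: treat as unverified.
        return False
```

An IP that merely belongs to the company, such as a proxy, fails step 1 even though a whois lookup shows the same owner.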


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved