Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Jakarta Commons-HttpClient from Google App Engine
Umbra

10+ Year Member



 
Msg#: 3999875 posted 12:54 pm on Oct 2, 2009 (gmt 0)

74.125.46.* and 216.239.50.*
yw-out-*.google.com and kc-out-*.google.com
Jakarta Commons-HttpClient/3.1
GET /googlehostedservice.html

I found Jakarta under "Included Software and Licenses for the Java Language Version of App Engine" here:
[code.google.com...]

Why are they looking for googlehostedservice.html? And who is "they"? Is this Google using their own App Engine, or a 3rd party hosted on Google's cloud computing?

With all the crap coming from Google now, how can we differentiate and verify Googlebot, Google Adsbot, Google stealth checks, Google manual site reviews, Google employees just browsing, Google Wireless Transcoder, translate.google.com, Google Keyword Tool and Google-Sitemaps -- some of which use the same IP addresses?
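One common way to approach the "verify" half of that question is a forward-confirmed reverse DNS check: look up the hostname for the IP, check that it ends in a Google domain, then resolve that hostname and confirm it maps back to the same IP. The sketch below assumes that approach; the function name, the suffix list and the injectable lookup parameters are illustrative choices, not anything taken from the logs above. Note that it only answers "does this IP belong to Google?". It cannot by itself tell Googlebot apart from other Google services, or from third-party code hosted on Google's infrastructure.

```python
import socket

# Hostname suffixes accepted as "Google". This list is an assumption made for
# illustration; adjust it to whatever you are willing to trust.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def verify_google_ip(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` passes a forward-confirmed reverse DNS check.

    The two lookup arguments are hypothetical hooks so the logic can be
    exercised without touching the network; by default they use the
    standard library resolver.
    """
    if reverse_lookup is None:
        # gethostbyaddr returns (hostname, aliases, addresses)
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliases, addresses)
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip)
    except OSError:
        # No PTR record or resolver failure: treat as unverified.
        return False

    host = host.rstrip(".").lower()
    if not host.endswith(GOOGLE_SUFFIXES):
        return False

    try:
        # The hostname must resolve back to the IP we started with,
        # otherwise the PTR record could simply be spoofed.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

Called as `verify_google_ip("74.125.46.81")`, this would do two live DNS lookups, so results are worth caching per IP rather than repeating on every request.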

 

incrediBILL

WebmasterWorld Administrator, Top Contributor of All Time, 5+ Year Member, Top Contributor of the Month



 
Msg#: 3999875 posted 6:00 am on Oct 26, 2009 (gmt 0)

Looks like it's related to the Google Apps engine, and here's a specific document about the googlehostedservice.html file:
[google.com...]

However, it's really sloppy programming on Google's part not to identify the user agent so we can make some actual sense of what it's supposed to be doing.

That would get blocked on my server, and I would simply stop using Google Apps as opposed to letting all the default Jakarta user agents run amok on my server.

tangor

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member, Top Contributor of the Month



 
Msg#: 3999875 posted 3:43 am on Oct 27, 2009 (gmt 0)

For several years I've had Jakarta* in 403. Will I have to rethink this?

keyplyr

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 3999875 posted 4:44 am on Oct 27, 2009 (gmt 0)

I 403 Jakarta w/ 12 IPs currently whitelisted.
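For anyone wanting to picture that kind of rule, here is a minimal sketch of "403 anything calling itself Jakarta unless the IP is whitelisted". The function name and the whitelist contents are assumptions: the two /24 ranges are just the ones from the first post, used as placeholders, and are not the actual twelve whitelisted IPs.

```python
import ipaddress

# Placeholder whitelist (assumption): the ranges quoted in the first post.
# Replace with the addresses you actually trust.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("74.125.46.0/24"),
    ipaddress.ip_network("216.239.50.0/24"),
]


def status_for(user_agent, remote_ip, allowed=ALLOWED_NETWORKS):
    """Return the HTTP status this rule would answer with.

    Requests whose user agent starts with "Jakarta" get 403 unless the
    remote IP falls inside one of the whitelisted networks. Everything
    else passes through as 200.
    """
    if not user_agent.lower().startswith("jakarta"):
        return 200

    addr = ipaddress.ip_address(remote_ip)
    if any(addr in net for net in allowed):
        return 200
    return 403
```

The same logic is usually written as server config rather than application code; the Python form is only meant to make the decision order explicit (user agent first, then the IP exception).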

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3999875 posted 1:55 am on Nov 17, 2009 (gmt 0)

Speaking of 74.125.46.* a.k.a. Google... Since yesterday -- related?

74.125.46.81
Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3
robots.txt? NO
referer: None

74.125.46.82
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
robots.txt? NO
referer: http://www.google.com/search?hl=en&q=www.mysitename.com+filename

(The ref's filename was incomplete both as to title and suffix.)


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved