Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
gsa-crawler
Whois = Google?
coconutz




msg:397828
 7:43 am on Jun 29, 2006 (gmt 0)

Any ideas?

Saw this in our log files:

8.6.48.249 [28/Jun/2006:22:28:14 -0400] "GET /robots.txt HTTP/1.0" 200 2690 "-" "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"

8.6.48.249 - - [28/Jun/2006:22:28:14 -0400] "GET / HTTP/1.0" 200 5349 "-" "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"

Whois: [ws.arin.net]
Level 3 Communications, Inc. LVLT-ORG-8-8 (NET-8-0-0-0-1)
8.0.0.0 - 8.255.255.255
Google Incorporated LVLT-GOOGL-1-8-6-48 (NET-8-6-48-0-1)
8.6.48.0 - 8.6.55.255

NET-8-6-48-0-1: [ws.arin.net]
OrgName: Google Incorporated
OrgID: GOOGL-1
Address: Google Information Technology
Address: 1600 Amphitheatre Parkway
City: Mountain View
StateProv: CA
PostalCode: 94043
Country: US

NetRange: 8.6.48.0 - 8.6.55.255
CIDR: 8.6.48.0/21
NetName: LVLT-GOOGL-1-8-6-48
NetHandle: NET-8-6-48-0-1
Parent: NET-8-0-0-0-1
NetType: Reassigned
Comment:
RegDate: 2006-05-16
Updated: 2006-05-16
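The whois output above gives both a NetRange and a CIDR for the Google block. As a quick sanity check, here is a minimal sketch using Python's standard `ipaddress` module that confirms the crawler IP falls inside that block and that the block sits inside the Level 3 parent range. The constant and function names are illustrative, and the `/8` for the parent is derived from the 8.0.0.0 - 8.255.255.255 range rather than quoted from whois.

```python
import ipaddress

# Values copied from the whois output above.
GOOGLE_BLOCK = ipaddress.ip_network("8.6.48.0/21")
# Assumption: 8.0.0.0 - 8.255.255.255 written as a CIDR.
LEVEL3_BLOCK = ipaddress.ip_network("8.0.0.0/8")


def in_block(ip: str, block: ipaddress.IPv4Network) -> bool:
    """Return True if the dotted-quad address lies inside the network."""
    return ipaddress.ip_address(ip) in block


if __name__ == "__main__":
    crawler_ip = "8.6.48.249"
    print(in_block(crawler_ip, GOOGLE_BLOCK))        # True
    print(GOOGLE_BLOCK[0], "-", GOOGLE_BLOCK[-1])    # 8.6.48.0 - 8.6.55.255
    print(GOOGLE_BLOCK.subnet_of(LEVEL3_BLOCK))      # True
```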

Tried the IP:
[8.6.48.249...]
Search: •public content •public and secure content

 

volatilegx




msg:397829
 6:17 pm on Jun 29, 2006 (gmt 0)

GSA = Google Search Appliance

incrediBILL




msg:397830
 6:27 pm on Jun 29, 2006 (gmt 0)

I get pinged by those Google appliances every now and then:

212.35.100.194 "gsa-crawler (Enterprise; GIX-02057; dm@enhesa.com)"
216.15.186.50 "gsa-crawler (Enterprise; MID-02848; support@throttlenet.com)"
204.95.150.205 "gsa-crawler (Enterprise; GIX-04642; bsd@checkfree.com)"
8.6.48.249 "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"
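All four strings above share the same three-part shape inside the parentheses. The sketch below splits such a user agent into its parts. The field names (`edition`, `appliance_id`, `contact`) are guesses based on how the strings look, not documented names, and the placeholder check only knows about the one dummy address seen in this thread.

```python
import re
from typing import NamedTuple, Optional


class GsaAgent(NamedTuple):
    edition: str        # e.g. "Enterprise" (field name is an assumption)
    appliance_id: str   # e.g. "GIX-02057" (field name is an assumption)
    contact: str        # e.g. an admin e-mail address


# gsa-crawler (<edition>; <appliance id>; <contact>)
GSA_RE = re.compile(r"^gsa-crawler \(([^;]+);\s*([^;]+);\s*([^)]+)\)$")

# The only placeholder contact seen in this thread.
PLACEHOLDER_CONTACTS = {"me@mycompany.com"}


def parse_gsa_agent(ua: str) -> Optional[GsaAgent]:
    """Split a gsa-crawler user agent into its three parts, or return None."""
    m = GSA_RE.match(ua.strip())
    if m is None:
        return None
    return GsaAgent(*(part.strip() for part in m.groups()))


def has_placeholder_contact(agent: GsaAgent) -> bool:
    """True when the contact field is a known dummy address."""
    return agent.contact.lower() in PLACEHOLDER_CONTACTS


if __name__ == "__main__":
    ua = "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"
    agent = parse_gsa_agent(ua)
    print(agent)
    print(has_placeholder_contact(agent))
```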

Another similar product that touches my site is called "FAST Enterprise Crawler" by some other company.

Pfui




msg:397831
 8:38 pm on Jul 1, 2006 (gmt 0)

Apparently the exact same "S4-E9LJ2B82FJJAA" box has been a busy little bot-runner of late:

8.6.48.249 - - [28/Jun/2006:19:30:11 -0700] "GET /robots.txt HTTP/1.0" 403 803 "-" "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"
8.6.48.249 - - [28/Jun/2006:19:30:11 -0700] "GET / HTTP/1.0" 403 803 "-" "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"
8.6.48.249 - - [28/Jun/2006:20:07:15 -0700] "GET /robots.txt HTTP/1.0" 403 803 "-" "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"
8.6.48.249 - - [28/Jun/2006:20:07:15 -0700] "GET / HTTP/1.0" 403 803 "-" "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"
8.6.48.249 - - [30/Jun/2006:17:16:13 -0700] "GET /robots.txt HTTP/1.0" 403 803 "-" "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"
8.6.48.249 - - [30/Jun/2006:17:16:14 -0700] "GET / HTTP/1.0" 403 803 "-" "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"
8.6.48.249 - - [30/Jun/2006:17:30:47 -0700] "GET /robots.txt HTTP/1.0" 403 803 "-" "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"
8.6.48.249 - - [30/Jun/2006:17:30:47 -0700] "GET / HTTP/1.0" 403 803 "-" "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"
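To see how often a given box comes knocking, lines like the ones above can be tallied with a short script. This is a rough sketch that assumes the log is in the common "combined" layout shown here (IP, two dash fields, timestamp, request, status, size, referer, user agent). The regex and function names are illustrative and will need adjusting for other log formats.

```python
import re
from collections import Counter

# One line of "combined"-style access log, as in the excerpt above.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None


def summarize(lines):
    """Count hits per (ip, path, status); lines that do not parse are skipped."""
    counts = Counter()
    for line in lines:
        hit = parse_line(line)
        if hit:
            counts[(hit["ip"], hit["path"], hit["status"])] += 1
    return counts


if __name__ == "__main__":
    ua = "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"
    sample = [
        '8.6.48.249 - - [28/Jun/2006:19:30:11 -0700] "GET /robots.txt HTTP/1.0" 403 803 "-" "%s"' % ua,
        '8.6.48.249 - - [28/Jun/2006:19:30:11 -0700] "GET / HTTP/1.0" 403 803 "-" "%s"' % ua,
    ]
    for key, n in summarize(sample).items():
        print(key, n)
```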

I show this additional info from DNS stuff:

IP address: 8.6.48.249
Reverse DNS: [No reverse DNS entry per dnsauth1.sys.gtei.net.]
Reverse DNS authenticity: [Unknown]
ASN: 36492
ASN Name: GOOGLEWIFI

(GOOGLEWIFI? Hmm. gtei.net? That's "Verizon Trademark Services LLC." Hmm.)

I realize G sells various "Enterprise Solutions [google.com]" appliances, and/or licenses same with fees per number of searches.

But "S4-E9LJ2B82FJJAA" irks me because someone/something Google-related is basically cloaked, and suddenly and repeatedly making the rounds on my main site, and using a dummy "me@mycompany.com" placeholder (1st cousin to anonymous@).

Hrrmph.

.
FYI:

The Google Search Appliance
[google.com...]

Stanford University > IT Services > Google Search Appliance
[stanford.edu...]

Google Search Appliance Review
[searchtools.com...]

.
P.S./Imho

FAST is awful, awful, awful. We've even had to go in and apply firewall rules against them because its various runners have been relentless.

FAST-WebCrawler/2.2.5 - Lycos/Alltheweb/Fast
[webmasterworld.com...]

.
P.P.S./OT

Interesting how G is seeding/targeting Universities [google.com]. (Found while researching GSA stuff.)

incrediBILL




msg:397832
 10:38 pm on Jul 1, 2006 (gmt 0)

Pfui wrote:
But "S4-E9LJ2B82FJJAA" irks me because someone/something Google-related is basically cloaked, and suddenly and repeatedly making the rounds on my main site, and using a dummy "me@mycompany.com" placeholder (1st cousin to anonymous@).

It's not cloaked, it's actually Google:

OrgName: Google Incorporated
OrgID: GOOGL-1
Address: Google Information Technology
Address: 1600 Amphitheatre Parkway
City: Mountain View
StateProv: CA
PostalCode: 94043
Country: US
NetRange: 8.6.48.0 - 8.6.55.255

It's possible the GSA-crawler from 8.6.48.249 is just Google testing their own dogfood, but it's hard to say as perhaps they're hosting one for someone. Just not sure why it's crawling the web as I thought the GSA was supposed to be for local enterprise web search, not indexing the universe.

Pfui




msg:397833
 11:21 pm on Jul 1, 2006 (gmt 0)

Agreed, it's not cloaked in the hidden-connection manner because we all tracked it down right away. Thing is, we had to track it down. It's cloaked in that there's no "Google" in the string, and neither its UA nor its IP is one of the umpteen Usual, and obvious, Suspects.

After Google's Transcoder thing, and its Web Accelerator thing, and its Desktop and Toolbar things and people using those/G's IPs as hide-behind proxies, and now/again its Search Appliance thing, and these and other "Google"-obvious UAs/IPs --

User-agent: FeedFetcher-Google
User-agent: FeedFetcher-Google;
User-agent: Mediapartners-Google*
User-agent: Mediapartners-Google/2.1
User-agent: Google WAP Proxy
User-agent: Googlebot
User-agent: Googlebot-Image
User-agent: Googlebot-Mobile
User-agent: googlebot-urlconsole

crawl-[G-IP-numbers-here].googlebot.com
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

-- well, if it doesn't say G(oogle-something), and isn't from G(ooglebot.com), it's a G(oner).
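Taken literally, that rule can be sketched as a small predicate: a visitor is a "goner" only when its user agent does not mention Google and its reverse DNS name is not under googlebot.com. The function names are made up for illustration, and the reverse DNS name is passed in rather than looked up, so no network access is needed. Keep in mind that user-agent strings are trivially spoofed, so a substring match on "google" is a convenience filter, not proof of origin.

```python
from typing import Optional


def from_googlebot_host(reverse_dns: Optional[str]) -> bool:
    """True if the reverse DNS name is googlebot.com or a subdomain of it."""
    if not reverse_dns:
        return False
    host = reverse_dns.rstrip(".").lower()
    return host == "googlebot.com" or host.endswith(".googlebot.com")


def is_goner(user_agent: str, reverse_dns: Optional[str]) -> bool:
    """Literal reading of the rule: block only if neither signal says Google."""
    says_google = "google" in user_agent.lower()
    return not says_google and not from_googlebot_host(reverse_dns)


if __name__ == "__main__":
    # The appliance from this thread: no "Google" in the UA, no reverse DNS.
    print(is_goner("gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)", None))
    # Hypothetical host name following the crawl-[IP].googlebot.com pattern.
    print(is_goner(
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "crawl-0-0-0-0.googlebot.com",
    ))
```

A stricter variant would require both signals (swap `and` for `or`), at the cost of blocking Google user agents that arrive from hosts outside googlebot.com.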
