Forum Moderators: Ocean10000 & incrediBILL

gsa-crawler

Whois = Google?

7:43 am on Jun 29, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Any ideas?

Saw this in our log files:

8.6.48.249 [28/Jun/2006:22:28:14 -0400] "GET /robots.txt HTTP/1.0" 200 2690 "-" "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"

8.6.48.249 - - [28/Jun/2006:22:28:14 -0400] "GET / HTTP/1.0" 200 5349 "-" "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"
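In both visits the appliance reads /robots.txt before it fetches /. If the appliance honors a robots.txt record addressed to its `gsa-crawler` token (an assumption; the logs only show that it reads the file), a record like the one below should keep it out while leaving other crawlers alone. This is a minimal sketch that checks such a record locally with Python's standard `urllib.robotparser`. The robots.txt contents and the `example.com` URL are placeholders, and the sketch shows how a compliant parser reads the file, not what any particular appliance actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out the Search Appliance token, allow everyone else.
ROBOTS_TXT = """\
User-agent: gsa-crawler
Disallow: /

User-agent: *
Disallow:
"""

GSA_UA = "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def allowed(user_agent: str, url: str = "http://example.com/") -> bool:
    """Return True if a robots.txt-compliant crawler with this UA may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print("gsa-crawler allowed:", allowed(GSA_UA))      # False
    print("Googlebot allowed:", allowed(GOOGLEBOT_UA))  # True
```

`robotparser` matches a record when its User-agent token appears in the lowercased visitor string, so the `gsa-crawler` record catches every appliance ID without listing each one.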

Whois: [ws.arin.net]
Level 3 Communications, Inc. LVLT-ORG-8-8 (NET-8-0-0-0-1)
8.0.0.0 - 8.255.255.255
Google Incorporated LVLT-GOOGL-1-8-6-48 (NET-8-6-48-0-1)
8.6.48.0 - 8.6.55.255

NET-8-6-48-0-1: [ws.arin.net]
OrgName: Google Incorporated
OrgID: GOOGL-1
Address: Google Information Technology
Address: 1600 Amphitheatre Parkway
City: Mountain View
StateProv: CA
PostalCode: 94043
Country: US

NetRange: 8.6.48.0 - 8.6.55.255
CIDR: 8.6.48.0/21
NetName: LVLT-GOOGL-1-8-6-48
NetHandle: NET-8-6-48-0-1
Parent: NET-8-0-0-0-1
NetType: Reassigned
Comment:
RegDate: 2006-05-16
Updated: 2006-05-16
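The record places 8.6.48.249 inside Google's reassigned block 8.6.48.0/21. A /21 leaves 32 − 21 = 11 host bits, so the block holds 2^11 = 2048 addresses: eight /24s, third octet 48 through 55, which is exactly the 8.6.48.0 to 8.6.55.255 NetRange shown. A minimal sketch with Python's standard `ipaddress` module confirms the range and the membership. The helper name `in_google_block` is only for illustration.

```python
import ipaddress

# The CIDR from the ARIN record quoted above.
GOOGLE_BLOCK = ipaddress.ip_network("8.6.48.0/21")


def in_google_block(ip: str) -> bool:
    """True if ip falls inside the 8.6.48.0/21 reassignment."""
    return ipaddress.ip_address(ip) in GOOGLE_BLOCK


if __name__ == "__main__":
    print(GOOGLE_BLOCK.num_addresses)      # 2048
    print(GOOGLE_BLOCK.network_address)    # 8.6.48.0
    print(GOOGLE_BLOCK.broadcast_address)  # 8.6.55.255
    print(in_google_block("8.6.48.249"))   # True
    print(in_google_block("8.6.56.1"))     # False
```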

Tried the IP:
[8.6.48.249...]
It brings up a search page with these options:
Search: • public content • public and secure content

6:17 pm on Jun 29, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



GSA = Google Search Appliance

6:27 pm on Jun 29, 2006 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I get pinged by those Google appliances every now and then:

212.35.100.194 "gsa-crawler (Enterprise; GIX-02057; dm@enhesa.com)"
216.15.186.50 "gsa-crawler (Enterprise; MID-02848; support@throttlenet.com)"
204.95.150.205 "gsa-crawler (Enterprise; GIX-04642; bsd@checkfree.com)"
8.6.48.249 "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"
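All four appliances announce themselves the same way: `gsa-crawler (Enterprise; <appliance ID>; <contact address>)`. Going only by these four strings (not by any published format spec), a small parser can pull out the ID and contact address so that different boxes can be told apart in a log. The regex and the `GsaAgent` / `parse_gsa_ua` names are assumptions made for this sketch.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the four sample strings above; other GSA builds may differ.
GSA_UA_RE = re.compile(
    r"gsa-crawler \((?P<edition>[^;]+);\s*(?P<appliance_id>[^;]+);\s*(?P<contact>[^)]+)\)"
)


class GsaAgent(NamedTuple):
    edition: str
    appliance_id: str
    contact: str


def parse_gsa_ua(user_agent: str) -> Optional[GsaAgent]:
    """Return the fields of a gsa-crawler UA, or None if the string doesn't match."""
    m = GSA_UA_RE.search(user_agent)
    if m is None:
        return None
    return GsaAgent(m["edition"].strip(), m["appliance_id"].strip(), m["contact"].strip())


if __name__ == "__main__":
    for ua in (
        "gsa-crawler (Enterprise; GIX-02057; dm@enhesa.com)",
        "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)",
    ):
        print(parse_gsa_ua(ua))
```

The appliance ID is the only field that distinguishes one box from another; the contact address is whatever the appliance's owner typed in, which is why a placeholder like me@mycompany.com can show up.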

Another similar product that touches my site is called "FAST Enterprise Crawler" by some other company.

8:38 pm on Jul 1, 2006 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Apparently the exact same "S4-E9LJ2B82FJJAA" box has been a busy little bot-runner of late:

8.6.48.249 - - [28/Jun/2006:19:30:11 -0700] "GET /robots.txt HTTP/1.0" 403 803 "-" "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"
8.6.48.249 - - [28/Jun/2006:19:30:11 -0700] "GET / HTTP/1.0" 403 803 "-" "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"
8.6.48.249 - - [28/Jun/2006:20:07:15 -0700] "GET /robots.txt HTTP/1.0" 403 803 "-" "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"
8.6.48.249 - - [28/Jun/2006:20:07:15 -0700] "GET / HTTP/1.0" 403 803 "-" "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"
8.6.48.249 - - [30/Jun/2006:17:16:13 -0700] "GET /robots.txt HTTP/1.0" 403 803 "-" "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"
8.6.48.249 - - [30/Jun/2006:17:16:14 -0700] "GET / HTTP/1.0" 403 803 "-" "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"
8.6.48.249 - - [30/Jun/2006:17:30:47 -0700] "GET /robots.txt HTTP/1.0" 403 803 "-" "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"
8.6.48.249 - - [30/Jun/2006:17:30:47 -0700] "GET / HTTP/1.0" 403 803 "-" "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"
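Those eight lines are four separate visits, each one a /robots.txt fetch immediately followed by GET /, all answered with 403. The following is a rough sketch for pulling such visits out of a log. It assumes the combined-style format seen in this thread (IP, ident, user, [timestamp], "request", status, bytes, "referer", "user-agent") and treats each robots.txt fetch as the start of a visit. The regex is written for these lines and tolerates the first post's variant that omits the `- -` fields. The function names are illustrative.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Iterable, Optional

# Combined-style log line; the "- -" ident/user fields are optional because
# the first post's sample line omits them.
LOG_RE = re.compile(
    r'(?P<ip>\S+) (?:\S+ \S+ )?\[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def parse_line(line: str) -> Optional[dict]:
    """Return a dict of fields for one log line, or None if it doesn't match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["ts"] = datetime.strptime(rec["ts"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    return rec


def count_visits(lines: Iterable[str]) -> Counter:
    """Count robots.txt fetches per (ip, user-agent); each one starts a new visit."""
    visits: Counter = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec and rec["path"] == "/robots.txt":
            visits[(rec["ip"], rec["ua"])] += 1
    return visits


if __name__ == "__main__":
    ua = "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"
    # Rebuilt from the four visit timestamps in the log above.
    stamps = ["28/Jun/2006:19:30:11", "28/Jun/2006:20:07:15",
              "30/Jun/2006:17:16:13", "30/Jun/2006:17:30:47"]
    sample = [
        f'8.6.48.249 - - [{ts} -0700] "GET {path} HTTP/1.0" 403 803 "-" "{ua}"'
        for ts in stamps
        for path in ("/robots.txt", "/")
    ]
    for (ip, agent), n in count_visits(sample).items():
        print(ip, n, agent)  # 8.6.48.249 4 gsa-crawler (Enterprise; ...)
```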

I show this additional info from DNS stuff:

IP address: 8.6.48.249
Reverse DNS: [No reverse DNS entry per dnsauth1.sys.gtei.net.]
Reverse DNS authenticity: [Unknown]
ASN: 36492
ASN Name: GOOGLEWIFI

(GOOGLEWIFI? Hmm. gtei.net? That's "Verizon Trademark Services LLC." Hmm.)
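The missing reverse DNS entry is itself a useful signal. The usual way to check whether an address really belongs to a crawler operator is forward-confirmed reverse DNS: look up the PTR hostname for the IP, resolve that hostname forward, and accept it only if the original IP comes back. A minimal sketch with Python's `socket` module follows. Passing the resolver functions in as parameters is a design choice of this sketch (not something from the thread) so the logic can be exercised offline. The `203.0.113.5` / `crawler.example.net` pair in the demo is hypothetical.

```python
import socket
from typing import Callable, List, Optional, Tuple

# Same shape as socket.gethostbyaddr / socket.gethostbyname_ex:
# both return (hostname, aliases, ip_addresses).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def fcrdns(ip: str,
           reverse: Resolver = socket.gethostbyaddr,
           forward: Resolver = socket.gethostbyname_ex) -> Optional[str]:
    """Return the PTR hostname if it resolves back to ip, else None."""
    try:
        hostname, _, _ = reverse(ip)
    except OSError:          # socket.herror/gaierror: no PTR record, as with 8.6.48.249
        return None
    try:
        _, _, addresses = forward(hostname)
    except OSError:          # the PTR name doesn't resolve forward
        return None
    return hostname if ip in addresses else None


if __name__ == "__main__":
    def no_ptr(ip):
        raise socket.herror(1, "no reverse DNS entry")

    print(fcrdns("8.6.48.249", reverse=no_ptr))  # None
    print(fcrdns("203.0.113.5",
                 reverse=lambda ip: ("crawler.example.net", [], [ip]),
                 forward=lambda host: (host, [], ["203.0.113.5"])))  # crawler.example.net
```

Calling `fcrdns(ip)` with the defaults does live lookups, so the answer reflects today's DNS rather than what the thread saw in 2006.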

I realize G sells various "Enterprise Solutions [google.com]" appliances, and/or licenses same with fees per number of searches.

But "S4-E9LJ2B82FJJAA" irks me because someone/something Google-related is basically cloaked, and suddenly and repeatedly making the rounds on my main site, and using a dummy "me@mycompany.com" placeholder (1st cousin to anonymous@).

Hrrmph.

.
FYI:

The Google Search Appliance
[google.com...]

Stanford University > IT Services > Google Search Appliance
[stanford.edu...]

Google Search Appliance Review
[searchtools.com...]

.
P.S./Imho

FAST is awful, awful, awful. We've even had to go in and apply firewall rules against it because its various runners have been relentless.

FAST-WebCrawler/2.2.5 - Lycos/Alltheweb/Fast
[webmasterworld.com...]

.
P.P.S./OT

Interesting how G is seeding/targeting Universities [google.com]. (Found while researching GSA stuff.)

10:38 pm on Jul 1, 2006 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Quote:
But "S4-E9LJ2B82FJJAA" irks me because someone/something Google-related is basically cloaked, and suddenly and repeatedly making the rounds on my main site, and using a dummy "me@mycompany.com" placeholder (1st cousin to anonymous@).

It's not cloaked; it's actually Google:

OrgName: Google Incorporated
OrgID: GOOGL-1
Address: Google Information Technology
Address: 1600 Amphitheatre Parkway
City: Mountain View
StateProv: CA
PostalCode: 94043
Country: US
NetRange: 8.6.48.0 - 8.6.55.255

It's possible the gsa-crawler from 8.6.48.249 is just Google eating its own dogfood, but it's hard to say; perhaps they're hosting one for someone. I'm just not sure why it's crawling the web, since I thought the GSA was meant for local enterprise search, not indexing the universe.

11:21 pm on Jul 1, 2006 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Agreed, it's not cloaked in the hidden-connection sense, because we all tracked it down right away. Thing is, we had to track it down. It's cloaked in that there's no "Google" in the string, and neither its UA nor its IP is one of the umpteen Usual, and obvious, Suspects.

After Google's Transcoder thing, and its Web Accelerator thing, and its Desktop and Toolbar things, and people using those/G's IPs as hide-behind proxies, and now/again its Search Appliance thing, and these and other "Google"-obvious UAs/IPs --

User-agent: FeedFetcher-Google
User-agent: FeedFetcher-Google;
User-agent: Mediapartners-Google*
User-agent: Mediapartners-Google/2.1
User-agent: Google WAP Proxy
User-agent: Googlebot
User-agent: Googlebot-Image
User-agent: Googlebot-Mobile
User-agent: googlebot-urlconsole

crawl-[G-IP-numbers-here].googlebot.com
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

-- well, if it doesn't say G(oogle-something), and isn't from G(ooglebot.com), it's a G(oner).
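That rule of thumb can be written down as a small filter. Read literally, a visitor is a goner only when its user-agent doesn't mention Google and its (forward-confirmed) hostname isn't under googlebot.com; a stricter reading requires both, which is what the `require_both` flag switches to. Since every agent in the list above contains "google" somewhere, a case-insensitive substring test stands in for the list. The function names, the flag, and the demo hostname (shaped like the crawl-….googlebot.com pattern above) are assumptions of this sketch, and the hostname is expected to have been verified beforehand, e.g. by forward-confirmed reverse DNS.

```python
from typing import Optional


def says_google(user_agent: str) -> bool:
    """True if the UA carries a Google name (Googlebot, FeedFetcher-Google, ...)."""
    return "google" in user_agent.lower()


def from_googlebot_host(hostname: Optional[str]) -> bool:
    """True if a verified hostname is googlebot.com or one of its subdomains."""
    if not hostname:
        return False
    host = hostname.lower().rstrip(".")
    return host == "googlebot.com" or host.endswith(".googlebot.com")


def is_goner(user_agent: str, verified_hostname: Optional[str],
             require_both: bool = False) -> bool:
    """Apply the thread's rule; require_both=True is the stricter variant."""
    ua_ok = says_google(user_agent)
    host_ok = from_googlebot_host(verified_hostname)
    if require_both:
        return not (ua_ok and host_ok)
    return not (ua_ok or host_ok)


if __name__ == "__main__":
    gsa = "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"
    bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(is_goner(gsa, None))                                # True
    print(is_goner(bot, "crawl-66-249-66-1.googlebot.com"))   # False
    print(is_goner(bot, None, require_both=True))             # True
```

The literal reading lets through anything that merely claims to be Googlebot, so the `require_both=True` variant is the safer choice against spoofed user-agents.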
