Google Web Accelerator Cache Warmer
browser enhancement possibly using proxies, varied IP
privacyman

Msg#: 3206634 posted 9:20 am on Jan 3, 2007 (gmt 0)

Searching for the "Google Web Accelerator Cache Warmer", I discovered a post from June 2006.

I had made enquiries on a Google forum about the loss of my site from G results.

I "did" stipulate that for my domain that I block many countries (nearly all of APNIC, most of Europe, etc via htaccess file). This was done because too much bad stuff originated from those sources and unfortunate to "innocent parties" that they are caught in the block. But to cure site access log file spamming, harvesting, and many other bad practices, I finally chose this "major effect" option. Don't really care if ASIA and most of Europe cannot access site as it is a US site with not much "general interest" to/for Asia and Europe. And for a friends site (small sales site) same htaccess is used.... they don't do sales out of US.

Back to the point. I mentioned ahead of time, in my post to the G groups forum, that APNIC and most of Europe were blocked. It turned out that more than one person who tried to assist was based in the UK. Of course they were blocked. They also seemingly could not understand why "they" or "innocent parties" would be blocked; well, it's the old scenario, one bad apple can spoil the bunch.

Checking my log at about the time one "forum assistant" accessed my site, I could of course see their IP at 81.x.x.x (blocked, 403 code), but I also found the "Google Web Accelerator Cache Warmer" as one access that came through OK. Here is just one log line for it.

65.124.120.nnn - - [02/Jan/2007:10:50:14 -0800] "GET /buttons/select01.gif HTTP/1.1" 200 1096 "http://www.--removedmydomain--.com/" "Mozilla (Google Web Accelerator Cache Warmer; Google-TR-4-GT)"

That G forum person said that he was blocked "for images" but could see my code for a page (I presume with a normal browser, given the IP entries at 81.x.x.x); he was probably seeing "cached page" source code.

But then I saw the "Accelerator" entry with an IP at 65.x.x.x, which belongs to a business in Massachusetts, US.

I strongly suspect that this Google browser enhancement may very well be using proxies around the country or around the world. It seems very logical to me. Rather than a visitor "retrieving" your site pages directly from your own site (in this case a visitor in the UK), this browser enhancement may check proxies that have cached different sites. If that is the case, and it finds that a UK "cache" has a copy of your site, it would serve from there (whether the content is new or old, maybe even very old); if it instead finds that a cache in Massachusetts, US has a copy, it would serve from there.

Now, the June 2006 post showed the user agent as
Mozilla (Google Web Accelerator Cache Warmer; Google-TR-1)
with the added comment that the IP belongs to Road Runner;
see the post at [webmasterworld.com...]

I can only suspect that the ending portion of the UA, such as "Google-TR-1",
could be an "indicator" of the visitor source (or the nearest Google database).
If that is the case, then the ending portion could very well vary from one visitor to another.

The good part about this, if I am correct, is that it would mean this "tool" or "accelerator" does in fact use proxies or "other cached site databases".

It would also mean that it obscures the actual visitor's IP. Thus it is an "anonymous" browser.

Now, for persons finding this user agent in their log files: one can consider that it is truly a human doing the browsing (though you won't know from where) and allow it as legit. Or, as I might decide to do, one could monitor the various cache IPs to determine where a whole bunch of non-SE caches are located and then watch those IPs for their bot activity. If these non-SE caches hold too much old stuff, I may as a result just block them from future access by their IP group. In my opinion, I don't want lots of "copies" of my site sitting around on independent caches when such cached pages might be old or very old. Plus I don't need or want those extra bots.

I would sooner have good site visitors "not hiding" their IP and activity, and I would sooner have up-to-date copies of my site pages on the "normal" SE's of my choice.

Anyone care to give any feedback or confirmation? Is this actually a browser enhancement that works with non-SE private caches, a tool that provides anonymous browsing (hiding the source IP)?

Think I might be correct but not sure.

JDMorgan if you see this and can give one of your most knowledgeable responses then please post for public but also "sticky" me a short "heads up" on it. Thanks.

[edited by: volatilegx at 9:02 pm (utc) on Jan. 3, 2007]
[edit reason] obfuscated ip address [/edit]

 

wilderness

Msg#: 3206634 posted 9:35 pm on Jan 3, 2007 (gmt 0)

I'd suggest that you use a deny based upon a single word in the UA (beginning with the first letter of the alphabet), utilizing the Rewrite "CONTAINS" (a sketch follows below).

This would eliminate the need for tracking and/or adding IP ranges.
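A minimal sketch of what I mean (mod_rewrite; "Accelerator" here is an assumed match word, chosen to fit the hint above):

# Minimal sketch: forbid any request whose User-Agent contains "Accelerator".
# ("Accelerator" is an assumed match word; adjust to taste.)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Accelerator [NC]
RewriteRule .* - [F]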

The 65 range that you provided is a Qwest range.
ARIN does not provide any subnet ranges for Qwest.
How did you determine the visitor was from Mass.? (Perhaps ping or trace?)

The TR-1 in the UA could possibly be some type of handheld device?

This is the first instance I've seen of Google using a definitive UA for its accelerator.
My visits from such have been from the IP range 64.233.172.**
and utilized the UA of the visitor's browser (no mention of Google).
I denied that range as a result.

I've also added another range based on another accelerator; however, this one begins with a term that is common to a host of harvesters, so creativity really wasn't necessary for the UA.

Another suggestion is to make your pages no-cache utilizing meta tags (a sketch follows below). This is not an instant solution; however, over time you will find it an effective tool for your previously mentioned goals.
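A minimal sketch of such no-cache meta tags, placed in each page's head section (the exact set of directives is my assumption; clients treat these as advisory, not binding):

<!-- Minimal sketch of no-cache meta tags; honoring them is up to the client. -->
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate">
<meta http-equiv="Expires" content="0">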

Don

privacyman

Msg#: 3206634 posted 9:04 pm on Jan 14, 2007 (gmt 0)

Thanks Don for your feedback.

I think I had mistakenly described the UA of "Google Web Accelerator Cache Warmer" as utilizing individual and independent server caches.

More correctly, though, I believe that when a person uses that "addon" or "plugin", it probably uses servers that G has found to be faster and possibly more direct for reaching a site (with or without serving a cached page to the site visitor).

I did find that the UA, as quoted in the first line above, was followed by "; Google-TR-n", where n is a digit from 1 to 5 (it could also include higher digits).

So far I do not know what the "TR-n" represents, but I do not think it relates to a handheld device. My reasoning: someone in the UK was trying to check out my site (of course I had their IP blocked), so with their normal IE browser they got a 403. Then I saw their attempted access with the "...Accelerator Cache Warmer", and of course the IP at the start of those lines was "not" their own IP but that of a server elsewhere (possibly the IP of the first server they were routed through). So yes, by using the accelerator they were able to access my site.

> The 65 range that you provided is a Qwest range.
> ARIN does not provide any subnet ranges for Qwest.
> How did you determine the visitor was from Mass.? (Perhaps ping or trace?)

In my original post I hid the actual IP, but I did a whois/reverse on the IP and I could see that it was a business in Mass.

Thus, from my observations: several different persons in the UK (and elsewhere) initially tried to access my site with their normal browser (normal UA) and were blocked (I have many IPs in Europe and elsewhere blocked), getting the normal 403 Forbidden. Then each of them tried using the accelerator and did get through to my site, but their actual IPs were not shown; instead, the IPs shown were what I suspect was the first server they were routed through. (Thus, in my opinion, the G web accelerator actually provides a means of anonymous browsing.)

> My visits from such have been from the IP range 64.233.172.**
> and utilized the UA of the visitor's browser (no mention of Google).
> I denied that range as a result.

Don, you mentioned that visits to your site with the accelerator had the IP range above. For mine, I found one at 64.233.173.**
I just double-checked and found that G "does" have that range, actually the CIDR 64.233.160.0/19 (64.233.160.0 through 64.233.191.255, which covers both the .172 and .173 ranges), so if you denied it, you may be blocking Google humans.

Thanks for the suggestion that I make my pages no-cache.

I think, though, since I have determined that the IPs shown for access via the accelerator may very well be just the IP of the first server a visitor is routed "through" (i.e., in my opinion, not the visitor's actual IP), I may just block the UA of "Google Web Accelerator Cache Warmer" (a sketch follows below). IMHO, if it does not show the visitor's IP but instead the IP of the first server accessed through, it in effect defeats my blocking of certain areas and allows anonymous browsing.
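If I go that route, a minimal sketch of a UA-only block (mod_setenvif; the match string is the UA fragment from my log line above, and the variable name is just an example):

# Minimal sketch: block by User-Agent alone, leaving IP rules untouched.
# "block_gwa" is an arbitrary example variable name.
SetEnvIfNoCase User-Agent "Google Web Accelerator Cache Warmer" block_gwa
Order Allow,Deny
Allow from all
Deny from env=block_gwa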

I'd appreciate other opinions, or more info if anyone has it.

Thanks.

wilderness

Msg#: 3206634 posted 10:05 pm on Jan 14, 2007 (gmt 0)

privacyman,
First, to make things easier (rather than both of us talking each other in mysterious circles), it's permissible in this forum to use full log lines.
We are, however, required to obfuscate the Class D IP range (asterisks or any other character will suffice), and any references to URLs or pages of URLs on our own websites need changing to example names.

> Don, you mentioned that visits to your site with the accelerator had the IP range above. For mine, I found one at 64.233.173.**
> I just double-checked and found that G "does" have that range, actually the CIDR 64.233.160.0/19, so if you denied it, you may be blocking Google humans.

Last I checked,
Google had not become an internet provider, nor was it selling subnet ranges to others to use as their internet provider.
In the event that an employee of Google has an interest in the materials on my web pages, then they should visit from their private IP range as opposed to being on company time and utilizing Google ranges.
In the event an employee at Google has a need to visit my pages and/or my materials without identifying themselves properly, then they are DEMANDING denial.

I had been monitoring regular visits from the following Class C for some time.

64.233.173.AA - - [14/Apr/2006:08:36:29 -0700] "GET /MyFolder/MyPages.html HTTP/1.1" 200 18669 "Mysite/SameFolder/DifferentPage.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; Media Center PC 2.8)"

Mysite/SameFolder/DifferentPage.html!
Please note: this referrer was a different session and likely a different browser from an entirely different IP.

The most recent Google Translator visit that I have (64.233.182.AAA) utilizes an IP from the same range; however, that visit is identified as the translator in the UA.

I've had the Google translator (as well as the majority of non-North-American IP ranges) denied access from my sites for some time.
Now, should I not desire a visitor from France (as an example) to view my pages, why in Sam's hell would I allow the same visitor to simply use the Google translator to circumvent my denial?

> In my original post I hid the actual IP, but I did a whois/reverse on the IP and I could see that it was a business in Mass.

Original post reference:
65.124.120.nnn - - [02/Jan/2007:10:50:14 -0800] "GET /buttons/select01.gif HTTP/1.1" 200 1096 "http://www.--removedmydomain--.com/" "Mozilla (Google Web Accelerator Cache Warmer; Google-TR-4-GT)"

I had the 124-125 Class B of Qwest denied for a very long period.
Your obfuscated Class D of "nnn" still allowed me to poke and prod and narrow the range down. The Mass. reference was a big help.

> Thus, from my observations: several different persons in the UK (and elsewhere) initially tried to access my site with their normal browser (normal UA) and were blocked, getting the normal 403 Forbidden. Then each of them tried using the accelerator and did get through to my site, but their actual IPs were not shown; instead, the IPs shown were what I suspect was the first server they were routed through. (Thus, in my opinion, the G web accelerator actually provides a means of anonymous browsing.)

Should a visitor come to my sites from an IP range that I have denied, receive a denial, and then immediately (or within a short period) return from another IP range seeking the same materials?
It doesn't much matter to me personally whether that second IP range belongs to Google, a colocator, a proxy, or any other source; the end result is that in most instances it is going to be added either to my denials or to my snoop/watch list.

Don

wilderness

Msg#: 3206634 posted 10:20 pm on Jan 14, 2007 (gmt 0)

BTW, and for newcomers to this forum:

Most of the above references (at least in regard to assumptions and actions) were NOT learned in a few seconds by reading published suggestions on other websites or by copying and pasting entire lines that others have submitted.

Most all of my use of htaccess was initiated with a long-range plan.
Learning the habits of my visitors (along with the structure and subjects of my pages) over a very long period (more than seven years) is not something that may be taught or conveyed in seconds.

I would suggest that newcomers to htaccess always deny visitors on the short end, with the thought that your goal is NOT to inconvenience the majority of users and/or potential visitors.
Take great care to determine what is both beneficial and detrimental to your own website(s).

Don
