Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Google IP address simulating browser requests?
Is cloaking sites for Google dead?
bumpski




msg:3070726
 8:06 pm on Sep 3, 2006 (gmt 0)

Since 7/10/06 I'm seeing requests from a Google IP address and one or two other large blocks of IP addresses that I won't identify. These queries in my logs look like "played back" or "repeated" Google search query "clicks" from many of the typically available browsers.

The referrer string looks just like one from a browser that has just clicked on a Google search result, BUT, the request is from a Google IP!

Ah you say someone at Google is manually doing searches and looking at pages, well ..

Even more curious is the identical request, the exact same query, referrer string and all, then comes from another IP address block, not associated with Google, BUT, this is within seconds of the original request. Sometimes multiple copies of the query from other IP address blocks appear within seconds.

The requester then proceeds to simulate a browser fetching all the pertinent page content (images, frames, etc.), making it look just like a Firefox or Internet Explorer browser requesting the full web page.

These "simulated" browser queries coming from a Google IP address and at the same time a non-Google IP address block(s) have continued in my logs almost every day since 7/10 to this day. Of course Google's normal crawling is proceeding on a daily basis.

These queries have keywords in the referrer string that are pertinent to many pages on this particular website. This is why I believe Google is actually repeating (simulating) previously recorded search queries from past Google visitors, who did searches, and then found this site. Many keywords pertinent to this site show up in the referrer strings.

For clarity these (many) queries do not have a typical Googlebot or Mediabot referrer string, the referrer string is typical of an internet surfer clicking on a Google search result. This type of query should not typically come from a Google IP, and magically be followed by a second and even third or fourth identical request from different IPs.

This type of automation, and willingness to cloak referrer strings, also using IP addresses not affiliated with Google, would definitely defeat all typical cloaking schemes.

Frankly I have no problem with these types of queries to this site, there is no cloaking done here, but many of the pages in question do rank well in the SERPS.

Any thoughts? Have I misunderstood my logs?

 

Bewenched




msg:3070967
 4:37 am on Sep 4, 2006 (gmt 0)

Just curious, but do you use google analytics? I have seen pages that were hit through my analytics tool get hit by google bot either the same day or very shortly after. I think they may be using analytics data to fuel some of the bot activities for some sites.

bumpski




msg:3071039
 7:20 am on Sep 4, 2006 (gmt 0)

I started to use analytics when Google first began support, BUT, it was producing a significant delay in web page presentation, which I found unacceptable at the time, so I removed the Javascript from all pages. I still do have the Analytics reports in my Adwords account, but they show zero page views. I wish I could use these!

Please remember the log entries I describe are not from anything that identifies itself as the "Googlebot", nor MediaPartners bot, nor GoogleBot (Adwords bot) etc. These requests are however from a Google IP address, requests that are then duplicated within seconds from one or two other non-Google IP addresses. Then followed by the remaining typical requests for pictures, frames, etc, that a browser would normally make. These requests all look like they were from various web browsers, in one case a very outdated version of Firefox!

DamonHD




msg:3071069
 8:33 am on Sep 4, 2006 (gmt 0)

Hi,

Are you sure that these are not hits coming through Google's "Web Accelerator" (i.e. a proxy)?

Rgds

Damon

bumpski




msg:3071145
 10:06 am on Sep 4, 2006 (gmt 0)

DamonHD

Yes, that is a good question that I thought about a lot, but the evidence makes less sense for an accelerator than for a cloaking check.

Why would the same request come through 2, 3, or even 4 sources? Redundancy? Perhaps a path latency determination, but that's a pretty aggressive use of a website's resources if it were to be used by the many web users out there.

Would Google go to the trouble to use multiple proxies to achieve this goal? Maybe Google does own the IP's indirectly, but they were major Internet players with large address blocks.

Here's sort of a sample:
IP Google "GET /sample-page.htm HTTP/1.1" 200 7945 "http://www.google.com/search?q=term1+term2&hl=en&sourceid=gd" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
IP Player1 "GET /sample-page.htm HTTP/1.1" 200 7945 "http://www.google.com/search?q=term1+term2&hl=en&sourceid=gd" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
IP Player2 "GET /sample-page.htm HTTP/1.1" 200 7945 "http://www.google.com/search?q=term1+term2&hl=en&sourceid=gd" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
IP Google "GET /sample-page1.jpg HTTP/1.1" 200 7945 "http://www.example.com/sample-page.htm" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
IP Google "GET /sample-page2.jpg HTTP/1.1" 200 7945 "http://www.example.com/sample-page.htm" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"

etc, etc. Requests all within seconds of each other. Looks just like a browser query except a GET is done to the original content multiple times through multiple source IP's with an identical referrer string. That's the part that is very strange.

It's possible I'm seeing someone at Google using or supporting Tor, but I'd call this a buggy Tor producing the multiple redundant requests.

But this is exactly what one would do to look for cloaking using a fairly thorough technique.
So if a site is cloaking and it disappears from the Google index, the site owner could look back through the logs for this pattern. One method would be to find a very unusual referrer string reflecting a complex Google query and then look for multiple copies of the same referrer string, and check out the IP source addresses.
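The log search described above can be sketched roughly as follows. This is only an illustration, not anyone's actual tooling: it assumes the server writes the Apache/NCSA "combined" log format, and the function name `find_replayed_requests` and the 5-second window are hypothetical choices. It groups requests by path, referrer, and user agent, then reports any Google-search-referred request that arrived from two or more distinct IPs within the window of its first appearance.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: Apache/NCSA "combined" log format.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" \d{3} \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def find_replayed_requests(lines, window_seconds=5):
    """Map (path, referrer, agent) -> sorted IPs when 2+ IPs sent it within the window."""
    groups = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line)
        # Keep only requests whose referrer looks like a Google search click.
        if not m or "google." not in m["referrer"] or "q=" not in m["referrer"]:
            continue
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        groups[(m["path"], m["referrer"], m["agent"])].append((ts, m["ip"]))

    window = timedelta(seconds=window_seconds)
    suspicious = {}
    for key, hits in groups.items():
        hits.sort()
        first_ts = hits[0][0]
        # Simplification: only the window after the first hit is examined.
        ips = {ip for ts, ip in hits if ts - first_ts <= window}
        if len(ips) >= 2:
            suspicious[key] = sorted(ips)
    return suspicious
```

Each reported IP would then still need a manual whois lookup to see which address block it belongs to.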

I just saw a lot of Google search strings from the same IP address and then was very surprised when the "who is" search indicated it was from Google itself. Then I searched for a complex referrer string associated with one of these queries and found other non-Google IP's making the identical request within seconds!

So yes it could be an accelerator, but if it is, it's a wasteful one! Multiple requests through multiple paths and service providers. I do have to look further, I know there is another post here identifying one Google address range used for acceleration. Even more detailed log information would help, but that's not likely in this case.

Alex70




msg:3071149
 10:17 am on Sep 4, 2006 (gmt 0)

>>But this is exactly what one would do to look for cloaking using a fairly thorough technique. << 100% agree.

b2net




msg:3071633
 8:34 pm on Sep 4, 2006 (gmt 0)

Google accelerator ip is 72.14.192.x

Cloaking has a bad reputation for no real reason and Google shouldn't worry about it too much. Cloaking is mainly used to hide a few footer links that the real user doesn't even want to see because they're off-topic. Some people cloak their sitemap pages just because they don't look nice. This has nothing to do with ranking high.

bumpski




msg:3071758
 11:55 pm on Sep 4, 2006 (gmt 0)

One sample Google IP generating these requests is 64.233.173.85 (not in the documented accelerator block), then a duplicate request comes from IP's associated with other major communications service providers, which I won't identify.
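As a small illustration of the block check mentioned here, the sketch below tests an address against the accelerator range quoted earlier in the thread. It assumes "72.14.192.x" means the /24 network 72.14.192.0/24; the names `ACCELERATOR_BLOCKS` and `in_known_accelerator_block` are hypothetical.

```python
import ipaddress

# Assumption: "72.14.192.x" from the thread is read as 72.14.192.0/24.
ACCELERATOR_BLOCKS = [ipaddress.ip_network("72.14.192.0/24")]


def in_known_accelerator_block(ip):
    """Return True if the address falls inside any listed accelerator block."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ACCELERATOR_BLOCKS)
```

Under that assumption, `in_known_accelerator_block("64.233.173.85")` is false, which matches the observation that this IP is outside the documented block.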

Regarding cloaking, HMMM, I don't know how many times I've looked at the Google cache of a site to find it has no correlation whatsoever with the actual site content. Cloaking is abused far more than it is used for useful purposes.

If I were a webmaster that was cloaking, I'd be reviewing my logs very carefully looking for evidence of these redundant, unidentified, requests from Google, and then investigate another means of content stuffing, etc.

I think the party may be over soon! Maybe Google will let some cloaking slide, who knows?

Or it's a bad bug in Google's accelerator and they are using more IP's that have not yet been identified.

Or Google is using the accelerator for double duty, accelerate and also find cloaking. A simple file diff will find out how severe the cloaking is.
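To make the "simple file diff" idea concrete, here is a minimal sketch using Python's standard `difflib`. It is not a description of what Google does; it only shows how two copies of the same page (one served to a crawler, one served to a browser) could be compared. The function name `cloaking_diff` and its parameter names are hypothetical.

```python
import difflib


def cloaking_diff(crawler_html, browser_html):
    """Return (similarity ratio, unified diff lines) for two copies of a page."""
    a = crawler_html.splitlines()
    b = browser_html.splitlines()
    # ratio() is 1.0 for identical line sequences and drops toward 0.0 as they diverge.
    ratio = difflib.SequenceMatcher(None, a, b).ratio()
    diff = list(difflib.unified_diff(a, b, fromfile="crawler", tofile="browser", lineterm=""))
    return ratio, diff
```

A ratio near 1.0 with a short diff would suggest little or no cloaking, while a low ratio would suggest the two audiences see substantially different content. Pages with rotating ads or timestamps would need some tolerance.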

goubarev




msg:3071784
 12:39 am on Sep 5, 2006 (gmt 0)

Interesting... Google might well be testing some kind of cloaking-checking mechanism... I wouldn't be surprised...

I have not noticed anything like that on any of my sites - it would be hard to extract that kind of info out of my logs... :c(

BTW, not sure why analytics didn't work for you. It's very fast for me, never had any "slowdowns"... and running really big sites with it (100k uniques a day)...

Jordo needs a drink




msg:3074219
 12:11 am on Sep 7, 2006 (gmt 0)

It's probably Google prefetching. They prefetch from the 64.233 range as well, and they also prefetch on search results. [google.com ]

Last night, 64.233.172.4 was prefetching from my site via search results. I know it was prefetching because of 2 things...

1. It also started grabbing the links on the target page, including my bot trap link.

2. Because it hit my bot trap, I could see it had an "x-moz: prefetch" in the header. [webaccelerator.google.com ]
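The header check in point 2 can be sketched as below. This is a minimal example that assumes the request headers are available as a plain dictionary of name/value strings; the function name `is_prefetch` is hypothetical, and how the headers are obtained depends on the server or framework in use.

```python
def is_prefetch(headers):
    """Return True if the request headers carry 'X-moz: prefetch' (case-insensitive)."""
    for name, value in headers.items():
        if name.lower() == "x-moz" and value.strip().lower() == "prefetch":
            return True
    return False
```

A bot trap could call this before banning a visitor, so that prefetch requests are logged or refused rather than treated as a misbehaving crawler.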

r3nz0




msg:3074268
 1:36 am on Sep 7, 2006 (gmt 0)

What if you (or I) have pages with hidden divs and some JavaScript that saves the div settings in cookies?

I let users set up their own page. To Google it could look like cloaking, but it is really just a rich-type menu.

Hoping this gives us no problems in the future.

koan




msg:3074286
 2:13 am on Sep 7, 2006 (gmt 0)

I've seen too many spam sites using cloaking rank well not to be overjoyed with the possibility Google might do something serious against it (yes, it's a serious matter). I just hope they navigate like a regular user because I have defenses against people trying to download the whole site, where I exclude search crawlers using user-agents. If they navigate at the same speed they index, that could block them unintentionally.

bumpski




msg:3076710
 8:15 pm on Sep 8, 2006 (gmt 0)

Yes Jordo

I agree 64.233.172.4 may well be an accelerator IP, but then other IPs that are not Google IPs make the same request, sometimes in the same second. This pattern is repeated over and over again: a Google IP makes a request, then a non-Google IP makes the same request!

I've noted a bug in Google's accelerator. It misparses IFrame tags and tries to "GET /iframe..." (the entire IFrame tag content is in the request string). This seems to be a random failure on random IFrame tags across several of my sites. Sometimes it parses the same IFrame tag on the same page just fine, and sometimes it does "GET /iframe ...", so it gets a 404 while the actual browser goes ahead, parses the IFrame tag correctly, and fetches the page at the "src=" field.

I've written the "Accelerator team" about this several days ago but haven't received a reply.

I installed the Google accelerator to debug this, but two problems occurred, so I'm going to uninstall it soon!

1. It's mostly slower! I think the overhead of communicating through a proxy defeats the acceleration. (on a busy PC). They do proxy GET's before they do content GETs etc. I've noted many other bugs mentioned in Webmaster World posts.
2. I CAN'T ACCESS WEBMASTER WORLD! Ohh Nooo, Mr. Billlll!


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved