Home / Forums Index / Google / Google SEO News and Discussion
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

This 61 message thread spans 3 pages: < < 61 ( 1 [2] 3 > >     
Google Announces Page PreFetching (again) - beta
johnmoose

5+ Year Member



 
Msg#: 4347171 posted 3:23 pm on Aug 2, 2011 (gmt 0)


System: The following 4 messages were cut out of thread at: http://www.webmasterworld.com/google/4326046.htm [webmasterworld.com] by engine - 3:17 pm on Aug 3, 2011 (utc +1)


< moved from another location >

Now this is going to create fake hits on web servers without any real visitors..

[chrome.blogspot.com ]

Me not so happy...

 

Angonasec

10+ Year Member



 
Msg#: 4347171 posted 11:49 pm on Aug 3, 2011 (gmt 0)

We have used this to block pre-fetching for years, will it also stop Google's latest Chrome intrusion?

RewriteCond %{HTTP:X-moz} ^prefetch
RewriteRule .* - [F,L]

frontpage

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4347171 posted 12:25 am on Aug 4, 2011 (gmt 0)

Google ~ "We Scrape Your Content and Strip your Ads"

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4347171 posted 2:39 am on Aug 4, 2011 (gmt 0)

will [mod_rewrite's prefetch block] also stop Google's latest Chrome intrusion?

Doubt it. G's docs make precise distinctions between prefetching, prerendering, etc., and, of course, say zip about how sites can resist assault/assimilation... (mua-ha-ha)

Sgt_Kickaxe

WebmasterWorld Senior Member sgt_kickaxe us a WebmasterWorld Top Contributor of All Time



 
Msg#: 4347171 posted 6:10 am on Aug 4, 2011 (gmt 0)

via pete freitag
Mozilla browsers support a feature called link prefetching, which allows a web page to tell the browser to prefetch a url if it is idle. Google has been using this technique in their search results, telling Mozilla to start loading the first result. I also noticed that MXNA 2.0 is including 3 prefetch tags.

How do you tell the browser to prefetch a URL?

By using the following code:

<link rel="prefetch" href="http://url.to.prefetch/" />

How can I detect prefetching on my web site?

When Mozilla does a prefetch, it sends an X-moz: prefetch header; you can then block based on that header. Google recommends sending a 404 back to block the prefetch.

How can I block prefetching?

Using mod_rewrite to send a 404:

RewriteEngine On
RewriteCond %{HTTP:X-moz} prefetch
RewriteRule .* /prefetch-attempt [L]

This will rewrite all prefetch attempts to /prefetch-attempt; as long as that file doesn't exist, the client will get a 404.


Question - would this 404 be returned to Google, or would the visitor have problems if he/she indeed did click your link?

You could also block with a 403 Forbidden response:

RewriteEngine On
RewriteCond %{HTTP:X-moz} prefetch
RewriteRule .* - [F,L]


Is that a better way?
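The decision those rules make (refuse a request carrying X-moz: prefetch, serve everything else) can be sketched outside Apache for clarity. A minimal, purely illustrative Python sketch; the function name and the dict-of-headers shape are assumptions, not anything from the posts above:

```python
# Illustrative sketch of the mod_rewrite logic quoted above:
# refuse a request that carries "X-moz: prefetch", serve all others.
# The function name and headers-as-dict shape are hypothetical.

def status_for_request(headers):
    """Return the HTTP status code to send for a request with these headers."""
    # Compare the header value case-insensitively, since clients may vary.
    if headers.get("X-moz", "").strip().lower() == "prefetch":
        return 404  # refuse the prefetch, as the quoted rules do
    return 200      # serve the page normally

print(status_for_request({"X-moz": "prefetch"}))  # 404
print(status_for_request({}))                     # 200
```

Note that a plain dict lookup is case-sensitive on the header *name*, whereas real servers treat header names case-insensitively; a production check would normalize names first.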

rlange



 
Msg#: 4347171 posted 1:18 pm on Aug 4, 2011 (gmt 0)

17. How can I opt my website out of Chrome Instant URL loading?

If a Google Chrome user has enabled the "Chrome Instant" feature, most webpages will load as soon as the URL has been typed into the address bar, before the user hits Enter.

If you are a website administrator, you can prevent Google Chrome from exhibiting this behavior on your website:
  • When Google Chrome makes the request to your website server, it will send the following header:
    X-Purpose: instant

  • Detect this, and return an HTTP 403 ("Forbidden") status code.
  • When Google Chrome receives this status code, it will add your website to a blacklist maintained on the client. This blacklist will last the duration of that user’s browsing session.

Source: [google.com...]

--
Ryan

ChanandlerBong

5+ Year Member



 
Msg#: 4347171 posted 2:19 pm on Aug 4, 2011 (gmt 0)

that's all great but the real pertinent question they don't answer:

will blocking pre-fetching have any mid- to long-term effects on ranking?

Because that's the 64 million dollar question. If Google, in their infinite wisdom, decide they only want to play ball with the "fast web", there's nothing stopping them throwing "Site X blocks pre-fetch" into the cauldron with the rest of their algo.

Another small step towards a two-tier web. The big boys and the rest of us.

rlange



 
Msg#: 4347171 posted 3:33 pm on Aug 4, 2011 (gmt 0)

Wow. I'm constantly amazed at how much paranoia injects itself into discussions around here.

--
Ryan

Simsi

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4347171 posted 4:06 pm on Aug 4, 2011 (gmt 0)

Not so much paranoia IMO. I think webmasters are merely more worried about Google's motives since the recession bit.

m0thman

5+ Year Member



 
Msg#: 4347171 posted 6:14 pm on Aug 4, 2011 (gmt 0)

Wow. I'm constantly amazed at how much paranoia injects itself into discussions around here.

--
Ryan


+1

Robert Charlton

WebmasterWorld Administrator robert_charlton us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4347171 posted 6:40 pm on Aug 4, 2011 (gmt 0)

Think of it as one more choice:

  • Google Search
  • I'm Feeling Lucky
  • Google Search Is Feeling Lucky

Angonasec

10+ Year Member



 
Msg#: 4347171 posted 6:44 pm on Aug 4, 2011 (gmt 0)

When Google Chrome makes the request to your website server, it will send the following header:

X-Purpose: instant

# Detect this, and return an HTTP 403 ("Forbidden") status code.


Could the Apache Bods kindly confirm that:

RewriteCond %{HTTP:X-moz} ^prefetch
RewriteRule .* - [F,L]

Will therefore block this behaviour in Chrome... if so we will continue to use it without qualms.

rlange



 
Msg#: 4347171 posted 7:14 pm on Aug 4, 2011 (gmt 0)

Angonasec wrote:
Could the Apache Bods kindly confirm that:

RewriteCond %{HTTP:X-moz} ^prefetch
RewriteRule .* - [F,L]

Will therefore block this behaviour in Chrome... if so we will continue to use it without qualms.

As written, it won't. This will:

RewriteCond %{HTTP:X-Purpose} ^instant$
RewriteRule .* - [F]


To block both Firefox and Chrome prefetching:

RewriteCond %{HTTP:X-Moz} ^prefetch$ [OR]
RewriteCond %{HTTP:X-Purpose} ^instant$
RewriteRule .* - [F]


--
Ryan

Angonasec

10+ Year Member



 
Msg#: 4347171 posted 7:38 pm on Aug 4, 2011 (gmt 0)

Blocks installed sitewide. Ta Ryan!

dstiles

WebmasterWorld Senior Member dstiles us a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4347171 posted 9:41 pm on Aug 4, 2011 (gmt 0)

X-Purpose: instant - so, another non-standard header to deal with.

And what about...

"When Google Chrome receives this status code, it will add your website to a blacklist maintained on the client. This blacklist will last the duration of that user’s browsing session."

Looks as if it may be better to detect X-Purpose: (anything) and return a 200 page with text something like, "You are using a stupid service. Drop google and Chrome and use Bing and Firefox instead!"

More damn work just to deal with $(*^ google!

dstiles

WebmasterWorld Senior Member dstiles us a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4347171 posted 10:24 pm on Aug 4, 2011 (gmt 0)

And...

I have now found references to PURPOSE but...

Which does google actually use?

The correct form is HTTP_X_PURPOSE but webkit seems to advocate HTTP_X-PURPOSE (which I've never seen before). Since chrome is webkit I assume they are using that version. Certainly that is the case in the google reference document itself.

I assume the prefix HTTP is correct, or is google also breaking that?
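For reference, the CGI convention (RFC 3875) makes this mapping mechanical: upper-case the header name, replace dashes with underscores, and prefix HTTP_. A dash is not legal in most environments' variable names, so X-Purpose should surface server-side as HTTP_X_PURPOSE. A one-liner showing the convention:

```python
def cgi_env_name(header_name):
    """Map an HTTP request header name to its CGI environment variable
    per the usual convention: upper-case, dashes become underscores."""
    return "HTTP_" + header_name.upper().replace("-", "_")

print(cgi_env_name("X-Purpose"))  # HTTP_X_PURPOSE
print(cgi_env_name("X-moz"))      # HTTP_X_MOZ
```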

I still can't believe the arrogance of blacklisting a web site based on the results of a possibly illegal prefetch!

And the idea that "most web pages will load as soon as the URL has been typed into the address bar, before the user hits Enter" implies that a) google prefetches a LOT of pages whilst the punter is still typing and b) that all of those web sites are on very fast servers, in which case, why bother prefetching?

Chrispcritters

5+ Year Member



 
Msg#: 4347171 posted 11:03 pm on Aug 4, 2011 (gmt 0)

I believe the context of "blacklist" is that Chrome will not attempt any additional pre-fetch (of that page/domain) during that user's browser session.

Staffa

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4347171 posted 11:20 pm on Aug 4, 2011 (gmt 0)

Looks as if it may be better to detect X-Purpose: (anything) and return a 200 page with text something like, "You are using a stupid service. Drop google and Chrome and use Bing and Firefox instead!"

Already in place :o)

Staffa

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4347171 posted 1:05 am on Aug 5, 2011 (gmt 0)

Does any one know if the G Toolbar, in this case GTB6.6 on MSIE 7.0, pre-fetches also ?

I am trying to figure out a strange occurrence on one of my sites tonight where a visitor comes from a Google search, looks at three pages in the normal way and out of the blue the home page is called without a click from the visitor. Actually the page he was on at the moment has no link to the home page nor was the page called with a referrer URL.
This happened 8 times during a visit lasting 14 minutes.

rlange



 
Msg#: 4347171 posted 5:21 am on Aug 5, 2011 (gmt 0)

Chrispcritters wrote:
I believe the context of "blacklist" is that Chrome will not attempt any additional pre-fetch (of that page/domain) during that user's browser session.

That's exactly what it means. Some people just seem to be suffering from Google Derangement Syndrome...

--
Ryan

AlexK

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4347171 posted 4:41 pm on Aug 6, 2011 (gmt 0)

@Staffa:
Does any one know if the G Toolbar, in this case GTB6.6 on MSIE 7.0, pre-fetches also ? ... out of the blue the home page is called without a click from the visitor

Variations of this have been happening for years on my site. I cannot give detailed forensics (got better things to do, such as adding more content), but this is my best intelligence:

Mine is a php site with mod_rewrite. So, urls become:

www.mydomain.tld/mfc/chipset.families/chipset-name.html

www.mydomain.tld/ : 200
www.mydomain.tld/mfc/ : 200
www.mydomain.tld/mfc/chipset.families/ : 404

I get constant 404s for the last variant, preceded by a 200 for the original. My best intelligence is that some browsers (toolbars?) are post-requesting all variants down the line.

Staffa

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4347171 posted 7:22 pm on Aug 6, 2011 (gmt 0)

Thank you AlexK, I guess you are right that it is a browser/toolbar-related event, for all the while the visitor was browsing and executing scripts as expected and was, seemingly, unaware of these calls to the home page.
BTW, he never visited the home page which is not unusual.

dstiles

WebmasterWorld Senior Member dstiles us a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4347171 posted 10:35 pm on Aug 6, 2011 (gmt 0)

> When Google Chrome receives this status code, it will add your website to a blacklist maintained on the client. This blacklist will last the duration of that user’s browsing session.

Where does it say in there that it's the prefetch only that is blacklisted? Does anyone have proof that I'm reading this wrongly?

Paranoid I may be, but I'm sure I'm not deranged. :)

Another question: who actually prefetches and caches the page - the searcher or google?

If google then they are probably offending against many sites' T&C re: accelerators.

If the searcher, then google are in for some heavy criticism, especially if they prefetch for on-the-fly typing. I can see how very easy it would be to "accidentally" cache some undesirable - even trojan - pages.

On the subject of toolbars, there was a discussion a while back (can't recall if here or in the "Search Engine Spider and User Agent Identification" forum) which suggested, with a degree of evidence, that GTB and GoogleToolbar in the UA meant different things. In my own experience GoogleToolbar in a UA means it's automated whereas GTB seems to be a human.

lucy24

WebmasterWorld Senior Member lucy24 us a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4347171 posted 2:10 am on Aug 7, 2011 (gmt 0)

OK, here's a concrete head-scratcher. Is this pre-fetching, caching or something else entirely? Post-fetching maybe? From yesterday's logs, deliberately including a few before-and-after lines to establish context. All IPs are the same, and likewise the UA.

nnn.nn.nn.nnn - - [05/Aug/2011:21:04:26 -0700] "GET /paintings/myrats/dreams.html HTTP/1.1" 200 1071 "http://www.example.com/paintings/myrats.html" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.107 Safari/535.1"

nnn.nn.nn.nnn - - [05/Aug/2011:21:04:27 -0700] "GET /paintings/myrats/blowups/largedreams.jpg HTTP/1.1" 200 51821 "http://www.example.com/paintings/myrats/dreams.html" "{Chrome}"

nnn.nn.nn.nnn - - [05/Aug/2011:21:05:41 -0700] "GET /paintings/myrats/blowups/largecatfight.jpg HTTP/1.1" 200 2770 "http://webcache.googleusercontent.com/search?q=cache:http://www.example.com/paintings/myrats/catfight.html" "{Chrome}"

nnn.nn.nn.nnn - - [05/Aug/2011:21:05:57 -0700] "GET /paintings/myrats/catfight.html HTTP/1.1" 200 1098 "http://www.example.com/paintings/myrats.html" "{Chrome}"

nnn.nn.nn.nnn - - [05/Aug/2011:21:06:14 -0700] "GET /paintings/myrats/mychip.html HTTP/1.1" 200 957 "http://www.example.com/paintings/myrats.html" "{Chrome}"

nnn.nn.nn.nnn - - [05/Aug/2011:21:06:15 -0700] "GET /paintings/myrats/blowups/largemychip.jpg HTTP/1.1" 200 53467 "http://www.example.com/paintings/myrats/mychip.html" "{Chrome}"


Where did item #3 come from? My logs occasionally get a little wonky about timestamps, but sixteen seconds in the wrong direction? Nuh-uh. Did g### take the opportunity to say "Oh, whoops, I don't have that one cached so let me grab it while I'm here"? Why didn't they do the same with the immediately preceding image, which was last visited in the same Imagebot session?

Thanks to outside circumstances (the IP plus the original referrer) I may actually be able to find out who the visitor was. Wonder if I'd learn anything?

dstiles

WebmasterWorld Senior Member dstiles us a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4347171 posted 4:03 pm on Aug 7, 2011 (gmt 0)

Thanks for the extra, Lucy.

I've recently seen googleusercontent blocked for infringement in my logs but didn't follow up on it.

Another question I have for anyone who logs such things: what are the other request headers like for the "pre-fetch"? Are they as one would expect from a browser or more like a bot? The difference is important: if it's not browser-like then the "accelerated" page as delivered could be faulty (eg what language should we deliver?).

rlange



 
Msg#: 4347171 posted 1:33 pm on Aug 8, 2011 (gmt 0)

dstiles wrote:
Where does it say in there that it's the prefetch only that is blacklisted? Does anyone have proof that I'm reading this wrongly?

Proof? No. Evidence? Yes. It's called context [dictionary.reference.com].

Edit: It would also be easy enough for you to test your claim in an attempt to prove yourself correct, which is how it's supposed to work. It's not up to anyone to prove you wrong; it's up to you to prove yourself correct.

Paranoid I may be, but I'm sure I'm not deranged. :)

You're making up additional reasons to hate this feature. That's on the wrong side of the deranged/not deranged scale.

Another question: who actually prefetches and caches the page - the searcher or google?

The browser, therefore the searcher.

If the searcher then google are in for some heavy criticism, especially of they prefetch for on-the-fly typing. I can see how very easy it would be to "accidentally" cache some undesirable - even trojan - pages.

It's reasonable to expect that malicious websites are subject to the same blocking/warning during prefetch as when viewed normally, so known malicious websites may not even be prefetched. As for unknown malicious websites, I suspect that various features of the browser—like file downloads—are not available to prefetched pages. That would be worth testing, though, just to be sure.

--
Ryan

dstiles

WebmasterWorld Senior Member dstiles us a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4347171 posted 7:35 pm on Aug 8, 2011 (gmt 0)

rlange - I maintain that the "blacklist" text is at best ambiguous but your interpretation is as good as anyone's.

I will not run Chrome because a) I do not have a use for it and it would be onerous to install and maintain; and b) it has too many potentially exploitable features. I know several people here ignore the latter (presumably taking appropriate care): hence my question.

There are a lot of malicious sites still on google's SERPS. If google can't block them from SERPS why should we trust that their prefetch is safe?

However, my actual point was "undesirable", trojan being an extreme and dangerous example. For "undesirable" read anything you do not want your children to see, anything you do not want an investigating police officer to find on your computer, etc.

Zk178

5+ Year Member



 
Msg#: 4347171 posted 5:27 am on Aug 9, 2011 (gmt 0)

It seems that the X-Purpose header is sent for "Google Instant" requests, not for "Google Instant Pages" requests. There is even an unresolved bug on this submitted in the Chromium project - [code.google.com...]

So it seems that the only way to distinguish prefetch requests is to use Page Visibility API in client-side scripts.

rlange



 
Msg#: 4347171 posted 1:43 pm on Aug 9, 2011 (gmt 0)

Zk178 wrote:
It seems that the X-Purpose header is sent for "Google Instant" requests, not for "Google Instant Pages" requests. There is even an unresolved bug on this submitted in the Chromium project - [webmasterworld.com...]

OK, now I'm confused. What is it about Google Instant that needs to make a request to a website? I was under the impression that it was just an "instant search results" feature.

But I think you're right; that information I linked to above is for Google Instant, not Google Instant Pages (curse Google and their naming nonsense).

From the issue you linked to...

At this point we're still gathering developer feedback about prerendering (including this need) and haven't made a decision.

That was nearly 2 months ago now. They must have decided on something if they've made the feature available and on-by-default in a stable release.

--
Ryan

rlange



 
Msg#: 4347171 posted 2:08 pm on Aug 9, 2011 (gmt 0)

Some more information on prefetching/prerendering: [code.google.com...]

In the "Detecting when your site is being prerendered" section their only suggestion is the Page Visibility API. There is this interesting bit, though:

If your site includes a third-party script for analytics or advertising, in many cases you shouldn't have to make any modifications to your site—the third party will simply modify the script they provide slightly to make use of the Page Visibility API. You should contact the third party directly to see if their scripts are prerender-aware.

The Google Analytics tracking code was updated [code.google.com] back in July to postpone tracking for prerendered pages until the page is actually viewed by a user. Who knows about other JavaScript-based analytics, though? This also, obviously, does nothing for bandwidth and log-based analytics.

Something else...

Situations in which prerendering is aborted

In some cases while prerendering a site Chrome may run into a situation that could potentially lead to user-visible behavior that is incorrect. In those cases, the prerender will be silently aborted. Some of these cases include:

Note: This is not an exhaustive list. Last updated 6/13/11.
  • The URL initiates a download
  • HTMLAudio or Video in the page
  • POST, PUT, and DELETE XMLHTTPRequests
  • HTTP Authentication
  • HTTPS pages
  • Pages that trigger the malware warning
  • Popup/window creation
  • Detection of high resource utilization

Plugins such as Flash will have their initialization deferred until the user actually visits the prerendered page.

Now, the prefetching is supposed to only trigger from the SERPs in those magical "high-confidence" situations, so the impact on both bandwidth and statistics may not be significant, but still...

--
Ryan

lucy24

WebmasterWorld Senior Member lucy24 us a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4347171 posted 1:09 am on Nov 2, 2011 (gmt 0)

I'd forgotten all about this prefetch thingie until it showed up in my logs again. When you are small, you may remember things that happened longer ago-- but Weird Actions On The Part Of Google don't generally make it into stick-in-the-longterm-memory territory ;)

This time I spent a long time poring over the entries-- not because of the prefetch but because I couldn't for the life of me figure out why they'd got a 403. (It finally turned out to be an OS so ancient, I'd assumed it was only used by robots forging UAs. Incidentally, I recently found a real human using MSIE 5 on a PPC Mac. Go figure.)

Thanks to the close poring, I noticed another detail. Much like google translate, the requests come from two places. The page itself goes to a g### IP; the associated files go to the real human's IP, with the whole prefetch line as referer. So does the favicon, for the unrelated reason that g### refuses to buy clothes for its faviconbot, so it gets separately locked out.* All use the real human's UA.

If they had not been 403'd at the gate, they would have been hit with a dense mass of "NO HOTLINKS" graphics because "webcache.googleusercontent.com" et cetera is not on the Approved List.

I only recently added translate.googleusercontent.com and www.google.com/imgres to the list. (And I still feel bad about all those perfectly innocent Spaniards...)

Is this where I throw in the towel and say

RewriteCond %{HTTP_REFERER} !google [NC]

?
That is: Oh, go ahead and hotlink. I have no idea who you are or what you're up to, but it's not the human user's fault.

Aside: I did manage to identify the person I mentioned earlier in this thread. (I do not know a lot of people in Uruguay using Dutch as their system language.) But he couldn't remember any odd behavior, like a picture showing up faster or slower than expected.


* Yes, I could move it from mod_setenvif to mod_rewrite and put in a Condition saying "unless it's google asking for the favicon", but why can't they just give the poor thing a UA?

httpsocial



 
Msg#: 4347171 posted 6:51 am on Nov 2, 2011 (gmt 0)

this isn't good at all
