Forum Moderators: DixonJones
-gps
Okay, I think these will work - I haven't tested them, but I did look things up in the Apache specs before writing them, so they should be all right. Sort of. I hope. (It depends on your server settings of course, e.g. not everyone has "SetEnvIf" enabled.) ;)
This one should ban them:
---------------------------------
SetEnvIf X-moz prefetch HAVE_X-moz
Deny from env=HAVE_X-moz
---------------------------------
...and this one should 404 them:
---------------------------------
SetEnvIf X-moz prefetch HAVE_X-moz
RewriteCond %{ENV:HAVE_X-moz} !^$
RewriteRule .* some-filename-that-does-not-exist.htm [L]
---------------------------------
..or, with another syntax:
---------------------------------
SetEnvIf X-moz prefetch HAVE_X-moz="prefetch"
RewriteCond %{ENV:HAVE_X-moz} prefetch
RewriteRule .* some-filename-that-does-not-exist.htm [L]
---------------------------------
In some server configs you don't need the "ENV:" part, but the Apache docs say you can't do without it... it's a strange world.
...add the [OR] flag as needed.
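For what it's worth, on reasonably recent Apache 2.x mod_rewrite you can also test the request header directly with %{HTTP:...} and skip the SetEnvIf step entirely - again untested, so treat it as a sketch:
---------------------------------
# match the X-moz request header directly (no SetEnvIf needed)
RewriteCond %{HTTP:X-moz} =prefetch
# "-" means no substitution; R=404 returns a real 404 response
RewriteRule .* - [R=404,L]
---------------------------------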
References:
1) [httpd.apache.org...]
2) [httpd.apache.org...]
3) [httpd.apache.org...]
...and this one should append "_PREFETCH" to the user-agent (via an environment variable, so it can show up in your logs):
---------------------------------
SetEnvIf X-moz prefetch HAVE_X-moz="prefetch"
RewriteCond %{ENV:HAVE_X-moz} prefetch
RewriteCond %{HTTP_USER_AGENT} ^(.*)$
RewriteRule (.*) [example.com...] [E=HTTP_USER_AGENT:%1_PREFETCH,L]
---------------------------------
...and this one should set a cookie with name "visit" and value "prefetch", valid for three minutes:
---------------------------------
SetEnvIf X-moz prefetch HAVE_X-moz="prefetch"
RewriteCond %{ENV:HAVE_X-moz} prefetch
RewriteRule (.*) [example.com...] [CO=visit:prefetch:.example.com:3,L]
---------------------------------
- feel free to combine all the above ones.
---
Ref: [httpd.apache.org...]
If a user clicks a link to a prefetched document while the prefetch is still in progress, the document will be requested again. That screws your logs even more.
I keep checking to make sure I'm in the "tracking and logging" forum, as I can't believe some of these posts.
If your current log analysis cannot interpret double-fetches reasonably, then you were screwed a long time ago. What do you think happens when someone gets tired of waiting for your 125KB home page to download, hits the stop button, and then refreshes?
If you RTFA, you will see that the additional double-fetches induced by this Google tweak are unlikely to match the number of normal double-fetches you get (especially since they would be caused by exactly the same phenomenon -- your page doesn't load fast enough).
I'll wager that 95% of all WebmasterWorld websites will not experience this phenomenon a single time in the next year. I'll further wager that the Internet traffic spent on discussing the issue will exceed that actually caused by the issue itself.
Of course, in the Mozilla Link Prefetching FAQ [mozilla.org] that was mentioned in an earlier post they state that they won't prefetch links that have a query string, which will cut down on some of the pages prefetched. Also, I don't think many prefetch links are being put in by Google at this time.
Personally, I don't see the advantage in adding the prefetch links - it will either be on such a small percentage that it won't make any difference, or if it becomes more prevalent then more websites will start blocking it.
As it's a PHP page, to see the prefetch in the logs I just added:
<?php
// set additional logging info: record the X-moz request header if present
if (isset($_SERVER['HTTP_X_MOZ'])) {
    $additionalloginfo = 'X-moz:' . $_SERVER['HTTP_X_MOZ'];
    apache_note('p1', $additionalloginfo);
}
?>
LogFormat "%.......... \"%{p1}n\"" combined
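If you'd rather not involve PHP at all, Apache's mod_log_config can log a request header directly with the %{...}i directive - a minimal sketch, with the format name being my own choice:
---------------------------------
# log the raw X-moz request header; "-" appears when it's absent
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{X-moz}i\"" combined_xmoz
CustomLog logs/access_log combined_xmoz
---------------------------------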
Regards,
R.
Looking at the HTTP headers my server sends, there is no cache-relevant info inside that could force the user's browser to fetch a fresh copy of the page. So I am baffled by this behavior, which leads to the conclusion:
my server gets more traffic, but the visitor gets no benefit, because he apparently reloads a fresh page without using the prefetch cache.
Please correct me if you see other things in your logs.
Regards,
R.
The traffic distortion is bad enough, but if the page at the end of an Adword link is prefetched, does that result in Google revenue inflation at the advertisers' expense? Google could I guess argue that their involvement is delivering the search results, but I'd like to know that our Adword click throughs are only from user choice.
From a webmaster viewpoint I use a good log analyzer so I don't care what's in my stats. Not only that but I will probably create a custom filter to detect firefox prefetch requests because it will tell me which pages I have on Google that are not only the first listing, but a page and a search listing so good that Google's algo believes the Get request is practically automatic. Do you think I am going to mess with that page on my site?
Server loads? Maybe that's something to think about on a huge, popular site that's always on top of the search listings - but if that is the case, you probably have a server capable of handling the load with ease.
Download speed on cable modems? I don't care if they are using a light pipe I want the page in their face as fast as possible before they get up and go to the bathroom.
Cookies? Personally, I have a bunch of tools swatting cookies all the time. They are like bugs in the house - no matter how many times you have the Orkin man come over you still find bugs in the corner of the bathroom.
Privacy on the web is almost a joke anyway. Smart users who don't want the pre-fetch feature will turn it off. The people who don't know how to do anything more than browse the web will be no worse off for the wear and tear.
If I could hit a button and nuke something, it would be the Firefox favicon.ico downloads. Favicons used to be useful to me for grading PPC campaigns or page popularity. Now (unless you want to spend the time needed to create the filters to learn which favicon requests were probably "bookmarks" versus worthless Firefox favicons) they are useless to me.
Lastly, all of the paranoid schizophrenic posts about Google drive me nuts. As the mob guys say "nobody's gonna tell me how to runna my business!". Google is a business and does whatever they think is in their best interest - just like me. Thank goodness their business interests and mine are the same - making money on the web.
And inflate the Firefox stats
In some lightly travelled domains I had already seen this before prefetch reared its ugly head. If you look at the logs from a lightly used site, you can often see Mozilla derivatives stumbling around like a drunken sailor, with multiple requests for each single file across a number of files. It looks like the browser is almost stuttering. I have no way of knowing why it does this, but it does. Browser spam?
These are domains that sometimes only get a single visit a day so it is very easy to observe in isolation. A 3k logfile is pretty easy to read carefully :)
++
[edited by: plumsauce at 4:57 am (utc) on April 4, 2005]
Cookies? Personally, I have a bunch of tools swatting cookies all the time. They are like bugs in the house - no matter how many times you have the Orkin man come over you still find bugs in the corner of the bathroom.
Not knowing who the Orkin man is, I fatally misread that sentence and was wondering if there was some aspect to this cookie-squashing business which had previously eluded me ;-).
Research on a wide variety of hypertext systems has shown that users need response times of less than one second ... studies done at IBM in the 1970s and 1980s found that mainframe users were more productive when the time between hitting a function key and getting the requested screen was less than a second.
If I manually look thru my access_log files, can I see that a prefetch has happened? - Larry
If you want to know for sure: no, unless you implement a mechanism similar to those described in msg #62 or #69 to grab that information from the browser's request header and write it to the log.
Regards,
R.
If I manually look thru my access_log files, can I see that a prefetch has happened?
You can check whether a Mozilla/Firefox user coming from Google also requested images or other non-page files (e.g. CSS). Real users' browsers usually request images in order to render pages, so if only page files are requested, it's probably a spider or a prefetch request.
This method won't work if your pages contain no images or external CSS, JS, etc., but such pages are quite unusual now. There may also be problems with users who disable images, but the number of such users should also be quite low now.
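The check above can be roughly automated. Here's a shell sketch assuming the standard combined log format (client IP in field 1, request path in field 7); the function name and the extension list are my own, so adjust them to taste:

```shell
# sketch: list client IPs that requested page files but never any
# images/css/js in a combined-format access log - likely spiders or prefetchers
prefetch_suspects() {
  awk '{
    split($7, parts, "?"); url = parts[1]               # strip query string
    if (url ~ /\.(gif|jpe?g|png|ico|css|js)$/) asset[$1] = 1
    else page[$1] = 1
  }
  END { for (ip in page) if (!(ip in asset)) print ip }' "$@"
}

# usage: prefetch_suspects access.log
```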
Actually no, it is a Google issue as well as a Mozilla issue. That statement is akin to saying that, because companies manufacture weapons, the results of the use of those weapons are the fault of the manufacturer, not the wielder.
One could argue the merits of the feature in Mozilla, but it is the use of the feature that causes the problem. Google should let the user decide which site to access, and not pre-fetch sites just in case the user wants to go there regardless of the probability.
Craig_F is correct. What's good for the Goose is good for the Google.
Onya
Woz
All I'm asking is for people to give it a try for a few days before they decide whether to block it or not.
Sorry GG, but the "value" of prefetch does not outweigh my privacy. I resent the fact that Google just goes ahead and does this sort of thing without allowing the user to opt out before it is implemented. In fact, it should be an opt in feature rather than the other way around.