Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

This 66 message thread spans 3 pages: 1 [2] 3
Google Web Preview
Mokita




msg:4223020
 10:02 pm on Oct 27, 2010 (gmt 0)

Has this been mentioned here previously? I couldn't find anything in a search.

Found it crawling one of our sites last night - thought it odd, as it was coming from the 66.249.64.0/19 range normally used by googlebot.

The full UA is:

Mozilla/5.0 (en-us) AppleWebKit/525.13 (KHTML, like Gecko; Google Web Preview) Version/3.1 Safari/525.13
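
For anyone wanting to flag these hits in their own logs, here is a minimal Python sketch. It assumes the "Google Web Preview" token from the UA above and the 66.249.64.0/19 range mentioned in this post; the function names are made up for illustration. A UA string is trivially spoofed, so a match is a hint, not proof.

```python
import ipaddress

# Range quoted in the post above; other ranges are reported later in the thread.
GOOGLEBOT_RANGE = ipaddress.ip_network("66.249.64.0/19")
PREVIEW_TOKEN = "Google Web Preview"


def is_preview_ua(user_agent):
    """Return True if the UA string carries the Google Web Preview token."""
    return PREVIEW_TOKEN in user_agent


def in_googlebot_range(ip):
    """Return True if the address falls inside 66.249.64.0/19."""
    return ipaddress.ip_address(ip) in GOOGLEBOT_RANGE
```

Checking both the token and the range narrows things down, but only to the one range named here.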

Looking for information, I found this:

Google has been caught testing a major new layout to their search results: full page previews of the target site, and pale blue backgrounds behind the search results when you hover over them.
...
One of the fascinating things about this is that they are highlighting certain sections of the page in orange and expanding the text to provide a snippet of information. This shows that they have the technology to know exactly where a piece of text is on every single web page. The snippets highlighted are not always the same as the snippet in the search results.


The obvious question raised by this, is the effect it will have on click-through rates.

 

Pfui




msg:4232216
 2:40 am on Nov 19, 2010 (gmt 0)

I just checked another of my sites that denies G IPs access to image dirs, etc., and enforces same via .htaccess. The preview shows up in search results, without the images.

But not for lack of trying --

74.125.46.85
Mozilla/5.0 (en-us) AppleWebKit/525.13 (KHTML, like Gecko; Google Web Preview) Version/3.1 Safari/525.13
11/18 18:32:38 /dir/filename.gif

74.125.46.83
Mozilla/5.0 (en-us) AppleWebKit/525.13 (KHTML, like Gecko; Google Web Preview) Version/3.1 Safari/525.13
11/18 18:32:39 /dir/filename.gif

SEOMike




msg:4232409
 4:33 pm on Nov 19, 2010 (gmt 0)

I don't mind the bot that much. It makes a nice preview of my sites. However, the one thing that I don't like is that it does not display NOFLASH content. The previews in Google just display big empty spots where I specifically created nice Flash alternative placeholders. The placeholders communicate basically the same thing as the Flash widget but are designed for users who cannot view Flash. I guess I'll have to use more jQuery because Google's preview displays that just fine.

Samizdata




msg:4232538
 9:34 pm on Nov 19, 2010 (gmt 0)

Another thread [webmasterworld.com...] addressed how webmasters might benefit from the new Instant Preview. In my view this missed the point - I see no potential benefits, only potential pitfalls.

There is a lot of confusion as to how previews work, so here is an attempt to dispel it:

..

Google says the vast majority of "previews" are actually made by Googlebot, and that the Google Web Preview bot is only used "occasionally".

Google Web Preview is a prefetcher bot that is apparently only used when some resources are denied to Googlebot by robots.txt instructions - it only exists to get around those restrictions.

Google Web Preview appearing in your logs does not necessarily mean that somebody actually previewed your site - if someone invokes a preview on any given SERPs page, previews for all the other pages listed are fetched, either from Google's cache or by sending the prefetcher bot.

Google Web Preview bot uses a variety of IP ranges (addressed earlier in this thread). There is apparently no way to verify that it is the genuine article.

Google insists that you must not serve Google Web Preview different content to Googlebot (on pain of severe penalty). Webmasters have no control over their previews.

Google states that "In order for images to be embedded in previews, it is important that they are not disallowed by your robots.txt file." Presumably the same applies to external stylesheets and javascript, the absence of which may dramatically affect how your previews display.

Google advises that "In order to block crawlable images from being indexed, you can use the "noindex" x-robots-tag HTTP header element." In doing so you will be agreeing to Google downloading and caching your images (bandwidth costs to be borne by you).

Google implemented this new feature without Flash support, and reportedly does not display any alternate content you may provide. Any Flash content on your pages will show an uninviting blank area. Silverlight and Java content is treated similarly.

Google's prescribed method of opting out of previews if you do not want them (the nosnippet tag) penalises you by having your text snippet removed.

Google's previews are very large - far too large to be described as "thumbnails". Google apparently believes it is "fair use" to screenshot every page in the SERPs - and also to modify those screenshots as they see fit.

Google appears not to have listed which browsers support previews, though it seems apparent that not all do. Google suggests testing in Safari, though I have yet to see the feature in my version.

Google launched Instant Previews on 9 November and did not warn webmasters in advance. There are many reports of previews with missing images and misrepresented layouts, as well as skewed analytics caused by the prefetcher bot.

Google's Instant Preview FAQ [sites.google.com...] appeared some time after launch.

..
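
Several of the points above hinge on what your robots.txt disallows for Googlebot. A quick way to see which page resources fall under a Disallow rule is Python's standard urllib.robotparser. This is only a sketch: the robots.txt content, the example.com URLs and the blocked_resources helper are hypothetical, and the parser follows the basic robots.txt rules, which may not match Google's own handling in every detail.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps crawlers out of an image directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
"""


def blocked_resources(robots_txt, urls, agent="Googlebot"):
    """Return the URLs that the given agent may not fetch under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(agent, u)]
```

Feeding it the image, stylesheet and script URLs of a page shows which ones a robots.txt-obeying crawler would have to skip.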

The opening post in this thread perceptively stated:

The obvious question raised by this, is the effect it will have on click-through rates.

The obvious answer is that any webmaster whose previews currently look bad - through no fault of their own - is likely to be being adversely affected right now.

Fair to say that Bing treats webmasters (aka content owners) no better.

...

Sgt_Kickaxe




msg:4232563
 10:26 pm on Nov 19, 2010 (gmt 0)

Google's system is using markup as an indicator, one of many no doubt. You'll notice that on wordpress blogs with comments the preview always ends immediately before the comments begin.

frontpage




msg:4232571
 11:26 pm on Nov 19, 2010 (gmt 0)

I have always had a NOARCHIVE header and now Google has all of my pages in preview.

So, despite me requesting that Google not ARCHIVE my website, it has done that with every page in my domain.

So, Google Preview is essentially violating webmaster trust by doing something against a long standing set of spidering rules.

Further, now that Google Preview does not respect robots.txt, what option is left?

incrediBILL




msg:4232582
 12:15 am on Nov 20, 2010 (gmt 0)

Good point, I have NOARCHIVE set as well.

This starts the writing campaign...

Samizdata




msg:4232586
 12:51 am on Nov 20, 2010 (gmt 0)

I have always had a NOARCHIVE header and now Google has all of my pages in preview

The NOARCHIVE header should still keep your content out of the publicly viewable cache.

The point about the new Google Web Preview bot - which is a prefetcher, and not a spider - is that it only exists in order to circumvent your robots.txt restrictions.

Webmasters should have been given the means to control and provide their own preview image, but while this is very easy to do (by cloaking to the Google Web Preview bot) the technique has been specifically outlawed by Mountain View in the recent FAQ:

You must show Googlebot and the Google Web Preview the same content that users from that region would see (see our Help Center article on cloaking).

So you must treat the Google Web Preview bot the same as Googlebot, even though it is specifically designed to bypass the access restrictions you place on Googlebot.

Google wants to use your images, and you are expected to give them up.

As someone mentioned earlier in the thread "This looks like over-a-barrel time!".

...

incrediBILL




msg:4232611
 2:22 am on Nov 20, 2010 (gmt 0)

This is a prime example of Google not honoring all of their meta directives.

Not only that, but the Google Web Preview spider doesn't support the same full round-trip reverse DNS check used to identify Googlebot, leaving webmasters that try to stop site abuse twisting in the wind.
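
For reference, the round-trip check for Googlebot goes: reverse-resolve the IP, check the hostname's domain, then resolve that hostname forward and confirm it maps back to the same IP. A rough Python sketch follows. The accepted suffixes are an assumption on my part, the function name is made up, and the resolvers are passed in as parameters only so the logic can be exercised without touching the network. Per the posts here, the Preview bot's addresses are not expected to pass this.

```python
import socket


def forward_confirmed_rdns(ip, suffixes=(".googlebot.com", ".google.com"),
                           reverse=socket.gethostbyaddr,
                           forward=socket.gethostbyname_ex):
    """Reverse-resolve ip, check the host suffix, then confirm the host maps back to ip."""
    try:
        host = reverse(ip)[0]
    except OSError:
        return False  # no PTR record or lookup failure
    if not host.lower().endswith(suffixes):
        return False  # hostname is outside the accepted domains
    try:
        addresses = forward(host)[2]
    except OSError:
        return False  # hostname does not resolve forward
    return ip in addresses
```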

Google needs to step up their game so webmasters can either opt-in or opt-out, but do it with specific tools and not by binding multiple crawlers to the same robots.txt entries. It's all getting nutty.

frontpage




msg:4232702
 1:23 pm on Nov 20, 2010 (gmt 0)

The NOARCHIVE header should still keep your content out of the publicly viewable cache.


It does not. The public can view a cached snap shot of all our web pages on an archived server owned by Google.

Google's official definition of NOARCHIVE.

Add the NOARCHIVE tag to a web page and Google won't cache copy of a web page in search results:


Source: [googleblog.blogspot.com...]

Via Google Web Preview, Google is keeping a cached visual copy of your content to show anyone.

astupidname




msg:4232708
 1:50 pm on Nov 20, 2010 (gmt 0)

Google implemented this new feature without Flash support, and reportedly does not display any alternate content you may provide. Any Flash content on your pages will show an uninviting blank area.

If you use methods such as swfobject's "dynamic" method for embedding flash via javascript, your alternative content will show up in the preview just fine.
So far I don't like the "snippet" preview, seems like a breach of intellectual property rights to make a page of mine look different than it actually appears.

astupidname




msg:4232709
 2:04 pm on Nov 20, 2010 (gmt 0)

Just realized also, the web preview does not even get the css right. My pages work in chrome, firefox, IE, Safari (windows) etcetera just fine, but however they are parsing my pages it is with a POS browser that gets the header position out of whack big time.

tantalus




msg:4232725
 4:13 pm on Nov 20, 2010 (gmt 0)

"Google states that "In order for images to be embedded in previews, it is important that they are not disallowed by your robots.txt file." Presumably the same applies to external stylesheets and javascript, the absence of which may dramatically affect how your previews display."

I can confirm that this doesn't apply to blocked css (haven't checked javascript). I had assumed that Google was simply prefetching on behalf of the user, who would have full access to those files anyway, and that it is the user that makes the request, with Google acting as a simple and I assume blind intermediary. Though if that's the case I don't know why images would be treated differently, unless there has been some copyright precedent set?

Maybe the clue is in the word "embedded"

Samizdata




msg:4232730
 4:21 pm on Nov 20, 2010 (gmt 0)

If you use methods such as swfobject's "dynamic" method for embedding flash via javascript, your alternative content will show up in the preview just fine.

I hadn't tested this myself (hence "reportedly") but you appear to be correct - though presumably the alternate content should not contain images blocked by the robots.txt file.

I have seen disgruntled reports of alternate content not showing, and this may explain it.

seems like a breach of intellectual property rights to make a page of mine look different than it actually appears

Have to agree.

Via Google Web Preview, Google is keeping a cached visual copy of your content to show anyone

Fair point. The "Cached" link in Google's SERPs is now only half the story.

--

The WebmasterWorld home page link to this thread mentions Google Web Preview "and how to block it". So for those who - after considering all the implications - really want to force "No Preview Available", the answer seems to be to restrict access to images by robots.txt and 403 the Google Web Preview bot.

That way you get to keep your text snippet (for now at least).
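
The 403 half of that recipe could be sketched as below, in Python, for a site served through a WSGI application (an assumption; most people here would do the same thing in .htaccess). The deny_preview_bot name is made up, and matching on the UA token alone will also refuse anyone spoofing that string.

```python
PREVIEW_TOKEN = "Google Web Preview"


def deny_preview_bot(app):
    """Wrap a WSGI app so requests carrying the preview UA token get a 403."""
    def middleware(environ, start_response):
        if PREVIEW_TOKEN in environ.get("HTTP_USER_AGENT", ""):
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else reaches the wrapped application unchanged.
        return app(environ, start_response)
    return middleware
```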

...

Samizdata




msg:4232736
 4:57 pm on Nov 20, 2010 (gmt 0)

blind intermediary

A blind intermediary would not modify others' content or display it on their own website.

The Google Web Preview bot only exists to circumvent owners' robots.txt instructions.

...

phranque




msg:4232932
 5:29 am on Nov 21, 2010 (gmt 0)

http://sites.google.com/site/webmasterhelpforum/en/faq-instant-previews#11 [sites.google.com]:
Q: How can I block previews from being shown?
A: You can block previews using the "nosnippet" robots meta tag or x-robots-tag HTTP header. Keep in mind that blocking previews also blocks normal snippets. There is currently no way to block preview images while allowing normal snippets.
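
For the HTTP header route in that FAQ answer, one way to attach the header site-wide is a small WSGI wrapper. This is a sketch assuming a Python WSGI application; add_nosnippet_header is a hypothetical name. The meta tag equivalent is shown in the comment. As the FAQ says, this removes the normal text snippet too.

```python
# Meta tag equivalent: <meta name="robots" content="nosnippet">


def add_nosnippet_header(app):
    """Wrap a WSGI app so every response carries X-Robots-Tag: nosnippet."""
    def middleware(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            # Append the header without mutating the caller's list.
            headers = list(headers) + [("X-Robots-Tag", "nosnippet")]
            return start_response(status, headers, exc_info)
        return app(environ, patched_start)
    return middleware
```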

Pfui




msg:4232950
 6:42 am on Nov 21, 2010 (gmt 0)

I feel like a David against G(oliath). I gotta get me a slingshot.

Hmm. Maybe it's the phrase: Google Web Preview (with and without quotes). A quick search shows zero Preview icons for those SERPs...

tangor




msg:4232976
 7:53 am on Nov 21, 2010 (gmt 0)

This might be just me, then again...

GWP has been here... so far I have not stopped it. But, and this is why I'm asking, I've noticed that once a page has been "previewed" I don't see it previewed again. Even on popular pages (highly, as in 100-200 pageviews a day) the GWP does not return. Is Google caching these "previews" and thus defeating all our attempts to contain the voracious beast?

At least 4 of my "previewed" pages have exhibited this behavior. I have hits before the preview and hits after the preview, but no previews after the first preview... Makes you wonder a bit.

Samizdata




msg:4233032
 2:59 pm on Nov 21, 2010 (gmt 0)

Is Google caching these "previews"

Yes they are.

However, various different IP ranges are being used by the Google Web Preview bot and it is possible (though untested) that more than one cached version is made.

There is currently no way to block preview images while allowing normal snippets

Phranque is quoting Google there, but the site I am testing on consistently shows "Preview Not Available" while retaining the text snippet (method described above).

--

From Google's John Mueller, 16 November:

As we use normal crawling to create these previews (on-the-fly accessing is only used for cases where we don't have recent, complete data from crawling), over time the accesses will be mostly limited to normal crawling activity.

My emphasis.

This seems to suggest that Google expects almost all webmasters to remove their robots.txt restrictions on image crawling. I suspect they will probably succeed in this, and if anyone chooses to call it "coercion" I will not argue.

The Google Web Preview bot only exists to circumvent webmasters' robots.txt instructions.

...

Samizdata




msg:4233037
 3:26 pm on Nov 21, 2010 (gmt 0)

Another thought:

If you have not been blocking the (disguised as a Safari browser) Google Web Preview bot from the start then you may now be too late - the bot will already have cached a screenshot and if it cannot be updated Google will likely use that version in perpetuity.

There is probably no way to remove it.

...

HenryUK




msg:4233582
 2:15 pm on Nov 22, 2010 (gmt 0)

As I mentioned earlier on another thread (http://www.webmasterworld.com/google/4228491-8-10.htm) and now reported today at SearchEngineLand [searchengineland.com...] this issue may also be playing havoc with your visit/visitor/page view metrics.

On one site that I work for, we are seeing visitors inflated by 25% by Google Instant Preview (about 250k visitors in one week - it's a very large site with millions of URLs in the index).

Check your browser stats: if that version of Safari has shown a big increase you might be suffering the same thing. Clearly this will play havoc with any dependent conversion metrics.
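
A rough way to size the effect from raw server logs, sketched in Python. It assumes you have already pulled the user-agent strings out of your log lines; preview_share is a made-up helper. It counts server-side hits, which is not the same thing as what a JavaScript-based analytics package records.

```python
def preview_share(user_agents):
    """Return the fraction of hits whose UA carries the Google Web Preview token."""
    agents = list(user_agents)
    if not agents:
        return 0.0
    hits = sum(1 for ua in agents if "Google Web Preview" in ua)
    return hits / len(agents)
```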

It's not just GA that is affected by this, of course; we are seeing the same issues with Omniture SiteCatalyst. Adobe promised to look into it when informed of the situation by a member of my analytics team.

fabulousyarn




msg:4233625
 4:12 pm on Nov 22, 2010 (gmt 0)

I wonder if this is why my sales have recently exploded - I focus very intensely on pictures - and big ones - and they look fabulous in preview as opposed to my competitors leeeetle teeeeny thumbnails. Hmm.

Pfui




msg:4233742
 7:13 pm on Nov 22, 2010 (gmt 0)

Congrats on your boom! Regardless of causation, that's nice news!

Thoughts...

Previews just premiered this month so if your increases are in the last two to three weeks, I'd be more likely to attribute same to factors beyond G's control -- thus far. Factors like the mid-term elections didn't frighten and/thus the stock market's holding its own.

Another factor could be major retailers using TV-radio spots are heavily trumpeting online holiday buying -- and tie-ins via their standalone sites and their Facebook pages -- and have since well before Halloween. They've seemingly extended this week's (counterintuitively-named) Black Friday into a months-long retail campaign.

For example, eBay started a Christmas countdown atop its pages eight weeks out. Fifty-something days till Christmas anyone? Oy.

Last but not least...

A marketing site's prelim tests (the URL of which I neglected to save & can't re-find now, sorry) showed most people either didn't notice the magnifying glass icon, had no clue what it did, or simply didn't use it.

For me, the 'Preview Effective?' jury's still out. If only referers indicated they came that way.

fabulousyarn




msg:4233749
 7:32 pm on Nov 22, 2010 (gmt 0)

I think most people find it the way I did, by accident. I know my sales have spiked due to SEO primarily, and additionally, some very intense adwords tweaking PLUS.....very targeted advertising and promo - but I wonder how many of the wigglers, I call them, hesitating over which link to click, found my site accidentally - once you discover the preview, it's very fun to jump from preview to preview, and in that, my site rocks.

ken_b




msg:4233750
 7:39 pm on Nov 22, 2010 (gmt 0)

I wish there was some way to include an alt message saying "visit the site" or "click here to see the images" that would get the visitor to the actual site.

I don't expect that function to be available anytime soon.
..

fabulousyarn




msg:4233751
 7:50 pm on Nov 22, 2010 (gmt 0)

if you click on the preview, you go straight to the site...or are you meaning something else?

Pfui




msg:4233760
 8:02 pm on Nov 22, 2010 (gmt 0)

I'd like to see referers from people actually clicking on/through the Preview. For example, oh:

http://www.google.ca/search?q=[keyword(s)-here]&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a&click=preview

That's an actual G referer [keywords-obfuscated], plus my simplistic addition (&click=preview, at the end), formatted according to the other parameters.
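
If such a parameter ever existed, picking it out of the referer would be simple. Here is a Python sketch of the idea. To be clear, click=preview is the wished-for parameter from this post, not something Google sends, and clicked_preview is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlsplit


def clicked_preview(referer):
    """Return True if the referer carries the hypothetical click=preview parameter."""
    query = parse_qs(urlsplit(referer).query)
    return query.get("click") == ["preview"]
```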

Samizdata




msg:4233779
 8:32 pm on Nov 22, 2010 (gmt 0)

I'd like to see referers from people actually clicking on/through the Preview

In my test just now the referrer was included, same as usual, when clicking through.

Note, however, that a "Preview Not Available" image is not clickable.

If you are talking about logged hits from the Google Web Preview bot that is different - seeing the bot in your logs does not necessarily mean that your site was previewed, just that it appeared on the same SERPs page as one that was (and that you haven't removed your robots.txt restrictions).

I wish there was some way to include an alt message

It is technically very easy for webmasters to control what is displayed in the preview.

Unfortunately, doing so is likely to get your site banned.

...

Samizdata




msg:4233781
 8:40 pm on Nov 22, 2010 (gmt 0)

@ Pfui

Apologies, I misunderstood - you want to know how many people actually use the feature.

One problem with that is that people may look at the preview then click on the text SERP.

...

Pfui




msg:4233799
 9:04 pm on Nov 22, 2010 (gmt 0)

@Samizdata : If imminent visitors click on the text, no prob. That's the 'usual' method and shows in referers (browser-willing). If G added a referer param -- for example, &click=preview -- that would indicate a click on the preview. Then again, if wishes were horses... :)

tangor




msg:4233807
 9:10 pm on Nov 22, 2010 (gmt 0)

Another look at this issue:

[theregister.co.uk...]

Samizdata




msg:4233836
 10:04 pm on Nov 22, 2010 (gmt 0)

A Google employee quoted in The Register:

We're working on a solution for this, to prevent Google Instant Preview on-demand fetches from executing Analytics JavaScript

So much for testing the beast before unleashing it on the world.

It would also have been sensible - not to mention polite - to warn webmasters in advance that they are expected to remove their robots.txt restrictions so that their images can be used.

As for offering a "nopreview" tag, they wouldn't want to copy Bing, would they?

...

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved