
Search Engine Spider and User Agent Identification Forum

    
Google Web Preview hits show stock referrer
http://www.google.com/search
Pfui




msg:4404663
 11:58 pm on Jan 6, 2012 (gmt 0)

This just in: Google Web Preview (GWP) hits include a referrer -- sort of:

http://www.google.com/search
(no trailing slash)

On my largest site:

74.125.78.89
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.51 (KHTML, like Gecko; Google Web Preview) Chrome/12.0.742 Safari/534.51

17:54:03 /dir/filename.html

robots.txt? NO (never)
REF? http://www.google.com/search (never before)

Yesterday, I saw it. Today, I tested it and every time I requested a SERP preview, the hit to the page always showed it. Image hits do not; they show site pages as referrers.

(Aside: Another GWP change occurred along the way; I forget when. In SERPs, G changed the little magnifying glass icon to a big rectangular hoverbutton showing only >>. As with the old icon, click on the >> and the Previews appear, if available. Click again and they're gone.)

Okay. Back to the new /search referrer:

- It doesn't include ANY parameters, but clicks-thru from Previews do.
- ALL GWP hits to .html files show it -- so are they really in real time?

Anyone else seeing "http://www.google.com/search" in their logs? If not, SERP your domain, >> a few Previews, and see if GWP shows up and shows it.
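For anyone who would rather scan raw logs than eyeball them, here is a minimal sketch of the check described above. It assumes the Apache/Nginx "combined" log format. The status code, byte count, and second log line in the sample are invented for illustration; only the IP, UA, path, and referrer come from the hit quoted in this post.

```python
import re

# Apache/Nginx "combined" log format (an assumption; adjust to your LogFormat).
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<ua>[^"]*)"'
)

GWP_TOKEN = "Google Web Preview"
STOCK_REFERRER = "http://www.google.com/search"  # no trailing slash, no parameters


def find_gwp_hits(lines):
    """Yield (ip, path, referrer, is_stock) for each hit whose UA names GWP."""
    for line in lines:
        m = COMBINED.match(line)
        if m is None or GWP_TOKEN not in m["ua"]:
            continue
        yield m["ip"], m["path"], m["referrer"], m["referrer"] == STOCK_REFERRER


# Illustrative sample: status, size, and the second line are made up.
sample = [
    '74.125.78.89 - - [06/Jan/2012:17:54:03 +0000] "GET /dir/filename.html HTTP/1.1" '
    '200 5120 "http://www.google.com/search" "Mozilla/5.0 (X11; Linux x86_64) '
    'AppleWebKit/534.51 (KHTML, like Gecko; Google Web Preview) Chrome/12.0.742 '
    'Safari/534.51"',
    '203.0.113.7 - - [06/Jan/2012:17:55:10 +0000] "GET /dir/filename.html HTTP/1.1" '
    '200 5120 "http://www.google.com/search?q=example" "Mozilla/5.0 (Windows NT 6.1)"',
]

hits = list(find_gwp_hits(sample))
for hit in hits:
    print(hit)
```

The second sample line is an ordinary click-through with parameters in the referrer, so it is skipped; only hits carrying the Preview token in the UA are reported.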

 

keyplyr




msg:4404672
 12:54 am on Jan 7, 2012 (gmt 0)

GWP did an actual (small) crawl last week. Took 50+ HTML pages, no images/scripts/css/etc.

Sorry, don't remember if it used that referrer and I stopped saving old logs so I can't look.

Also, I *think* G stopped using the magnifying icon for the previews when they moved that icon up to the search box. Seeing that, I did it on my site too :)

Pfui




msg:4412935
 2:54 am on Feb 1, 2012 (gmt 0)

Another tweak in the format of GWP's now-standard fake referrer. Note the /m/ directory:

http://www.google.com/m/search

Note, too, the Android Mobile UA:

74.125.64.95 [projecthoneypot.org...]
Mozilla/5.0 (Linux; U; Android 2.3.4; generic) AppleWebKit/534.51 (KHTML, like Gecko; Google Web Preview) Version/4.0 Mobile Safari/534.51

Too bad we can't get paid for being Google's guinea pigs.
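A rough way to tell the two stock referrers apart when sorting log entries is sketched below. The function name and the category labels are hypothetical; the two referrer paths and both UA strings are the ones reported in this thread.

```python
from urllib.parse import urlsplit

GWP_TOKEN = "Google Web Preview"


def classify_gwp(referrer, user_agent):
    """Label a hit by which parameterless Google referrer it carries, if any."""
    if GWP_TOKEN not in user_agent:
        return "not-gwp"
    parts = urlsplit(referrer)
    # "Bare" means the Google host with no query string and no fragment.
    bare = parts.netloc == "www.google.com" and not parts.query and not parts.fragment
    if bare and parts.path == "/search":
        return "gwp-desktop-stock"
    if bare and parts.path == "/m/search":
        return "gwp-mobile-stock"
    return "gwp-other"


DESKTOP_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.51 "
              "(KHTML, like Gecko; Google Web Preview) Chrome/12.0.742 Safari/534.51")
MOBILE_UA = ("Mozilla/5.0 (Linux; U; Android 2.3.4; generic) AppleWebKit/534.51 "
             "(KHTML, like Gecko; Google Web Preview) Version/4.0 Mobile Safari/534.51")

desktop = classify_gwp("http://www.google.com/search", DESKTOP_UA)
mobile = classify_gwp("http://www.google.com/m/search", MOBILE_UA)
blank = classify_gwp("", DESKTOP_UA)
print(desktop, mobile, blank)
```

A blank referrer (as seen on image hits earlier) falls into the catch-all label rather than either stock category.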

keyplyr




msg:4412950
 3:16 am on Feb 1, 2012 (gmt 0)

"Too bad we can't get paid for being Google's guinea pigs."

You're not getting paid?

Pfui




msg:4413015
 10:06 am on Feb 1, 2012 (gmt 0)

Pfffffft :)

dstiles




msg:4413248
 9:31 pm on Feb 1, 2012 (gmt 0)

I've been seeing the "referer" for a few days now. I wonder if they are trying to lull us into accepting it if it's got a referer. :(

Staffa




msg:4413303
 11:58 pm on Feb 1, 2012 (gmt 0)

I saw three today, each with http://www.google.com/search as the referrer and the full UA, but identifying themselves as:

ee-in-f155.1e100.net
ee-in-f159.1e100.net
ee-in-f144.1e100.net

At first I thought it was someone faking, but the IP numbers are genuine: 74.125.16.155 / 159 / 144

Another way of "make believe"?
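One common way to check whether such a host is genuine rather than faked is a forward-confirmed reverse DNS lookup: resolve the IP to a hostname, check the domain, then resolve that hostname back and see whether the original IP is among the results. The sketch below is only that, a sketch. The function name is hypothetical, the default suffix list contains only the 1e100.net domain mentioned here, and the demonstration uses stand-in lookup tables (pairing the first host with the .155 address is an assumption) so that it runs without touching the network.

```python
import socket


def _reverse_dns(ip):
    """PTR lookup: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward_dns(host):
    """A lookup: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_confirmed_host(ip, suffixes=(".1e100.net",),
                      reverse=_reverse_dns, forward=_forward_dns):
    """Return True only if the PTR name has an allowed suffix AND maps back to ip."""
    try:
        host = reverse(ip).lower().rstrip(".")
        if not host.endswith(tuple(suffixes)):
            return False
        return ip in forward(host)
    except OSError:  # socket.herror and socket.gaierror are both OSError subclasses
        return False


# Stand-in DNS data for an offline demonstration (not real lookups).
PTR = {
    "74.125.16.155": "ee-in-f155.1e100.net",
    "203.0.113.9": "ee-in-f155.1e100.net",  # a faker claiming a Google PTR name
}
A = {"ee-in-f155.1e100.net": ["74.125.16.155"]}


def fake_reverse(ip):
    if ip not in PTR:
        raise socket.herror("no PTR record")
    return PTR[ip]


def fake_forward(host):
    if host not in A:
        raise socket.gaierror("no A record")
    return A[host]


genuine = is_confirmed_host("74.125.16.155", reverse=fake_reverse, forward=fake_forward)
spoofed = is_confirmed_host("203.0.113.9", reverse=fake_reverse, forward=fake_forward)
unknown = is_confirmed_host("198.51.100.1", reverse=fake_reverse, forward=fake_forward)
print(genuine, spoofed, unknown)
```

Calling is_confirmed_host(ip) without the reverse and forward arguments uses the real resolver. Note that this only confirms the host belongs to the named domain; it says nothing about which Google service made the request.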

Pfui




msg:4413325
 12:41 am on Feb 2, 2012 (gmt 0)

Google's been GWP'ing from bare IPs and lots of its (known) Hosts for some time now. [webmasterworld.com...] (And using 1e100.net for even longer.) [webmasterworld.com...]

Unfortunately, G runs things we might want through the same numbers it also runs junk through. [webmasterworld.com...]

Samizdata




msg:4413362
 2:47 am on Feb 2, 2012 (gmt 0)

My logs suggest that the number of people actually using Google Web Preview is tiny.

And the number who care that it gets a 403 on my sites appears to be even tinier.

I don't have a PhD though.

...

keyplyr




msg:4413364
 3:06 am on Feb 2, 2012 (gmt 0)

"My logs suggest that the number of people actually using Google Web Preview is tiny."

I have the opposite opinion.

Pfui




msg:4413420
 9:23 am on Feb 2, 2012 (gmt 0)

My logs give me no proof either way that anyone's using it: not back when the referrer was blank for all hits, and not now that it's blank for graphics and a faked /search for .html files.

lucy24




msg:4413444
 11:25 am on Feb 2, 2012 (gmt 0)

I needed to test something about a particular Preview of one of my own pages. The Preview came with a blurb saying something about showing a cached version. This kinda implies that there also exist non-cached Previews.

Possibly the Previewbot figured out that I was rewriting it from a 700K page to a 35K mockup which looks exactly the same. I deliberately searched for a unique phrase that occurs late in the (complete) file. Odd thing is, the logs show a filesize that's even smaller-- though still much bigger than any detour or error document. But it comes through as 200 so they must have got something.

And yes, the referer line is the "search" version. I tend to take that at face value. The stylesheet had a normal referer, same as when a human views the page.

dstiles




msg:4413667
 9:28 pm on Feb 2, 2012 (gmt 0)

From what I recall, web preview only goes to a site if it does not have something it needs through a normal bot scan. If you block images, css, whatever in robots.txt then preview comes looking for the missing bits (and in my case gets a 403 or 405 - can't recall which off-hand).

Pfui




msg:4413719
 12:05 am on Feb 3, 2012 (gmt 0)

Erm, missing piece-gathering is not my experience, and I'm not sure GWP's actions are that clear-cut. (Would be nice tho'.)

lucy24




msg:4413771
 3:21 am on Feb 3, 2012 (gmt 0)

Hm. I thought right away of a page where I could test that, because half of its images are in a roboted-out directory. (People were using it mainly as hotlink fodder.) Got that same "cached" blurb-- but I can't imagine what was in the cache, since my logs, downloaded immediately afterward, show that the html, css and every single image on the page were downloaded fresh.

Further investigation tells me the regular googlebot last saw the page 5 days ago, recording a 304. It picked up the stylesheet a day earlier in with-referer mode (different page in same directory). Dates on the images range from 2004 to 2007-- this is not an actively maintained page-- but it picked up all of them for good measure, including the ones it has seen within the past month.

No piwik, so it didn't get a chance to have the door slammed in its face there.

Wonder what "cached" means?

Edit:
Apparently it means "If it wasn't cached before, it is now." I tried half an hour later with a different browser and different search phrase-- mildly gratifying to find myself at #2 for this one-- and the logs were silent. If I remember, I will try it again after 6 or 12 hours. If Preview works like Translate, they keep the cache sitting around for a few hours before tossing it.

Pfui




msg:4413941
 4:38 pm on Feb 3, 2012 (gmt 0)

"[Googlebot] picked up the stylesheet a day earlier in with-referer mode"

Was that a legit Googlebot? Because with almost all, if not all, of Google's bots, showing referrers is rare as hen's teeth (...or was, until GWP started up its fake REF repertoire).

lucy24




msg:4414039
 8:34 pm on Feb 3, 2012 (gmt 0)

There's a thread about googlebot and referers. You can get a series of googlebot hits from the identical IP, identical UA, and in the middle there's one or more giving a referer.

It's a fairly recent development, calculated to make people suspect they're up to no good ;) Maybe it's another way to test if humans and robots see the same page: normally a robot would arrive without referer for images and stylesheets, so it would be the easiest thing in the world to rewrite them to a different form.

Pfui




msg:4414094
 11:23 pm on Feb 3, 2012 (gmt 0)

I located your report of one session in Google SEO News and Discussion here: [webmasterworld.com]. It gives me something to watch for, thanks, because Googlebot et al sending referrers is still rare as hen's teeth.

g1smd




msg:4414107
 12:29 am on Feb 4, 2012 (gmt 0)

"The Preview came with a blurb saying something about showing a cached version. This kinda implies that there also exist non-cached Previews."

What I don't get is that sometimes the Preview is quite stale, perhaps days or more old, with a much newer version of the same page viewable in the normal Google cache, while other times the Preview is very new but the cache copy is days or more old.

keyplyr




msg:4414111
 12:55 am on Feb 4, 2012 (gmt 0)

"What I don't get is that sometimes the Preview is quite stale, perhaps days or more old, with a much newer version of the same page viewable in the normal Google cache, while other times the Preview is very new but the cache copy is days or more old."

A definite enigma.

lucy24




msg:4414128
 2:57 am on Feb 4, 2012 (gmt 0)

Oh, that reminded me. Preview request #3, using a third browser and different search string-- ###! If I'd thought of it I would have tried from the library which has a different IP and, of course, entirely different computers. Must still be cached, because there's nothing new in logs.

I did find an unrelated adding-insult-to-injury entry though. In fact, two of them back to back. Two different people inexplicably asked for Previews of a page that included hotlinks. Grr. Nothing to do about it except what I'm already doing, which is to show the garish No Hotlinks image. The offending site comes through as referer, with Google's IP, and Preview in the UA.

"What I don't get is that sometimes the Preview is quite stale, perhaps days or more old, with a much newer version of the same page viewable in the normal Google cache, while other times the Preview is very new but the cache copy is days or more old."

I think that reinforces the idea that the Googlebot and the I Am Not A Robot don't talk to each other.

g1smd




msg:4418601
 1:56 am on Feb 17, 2012 (gmt 0)

Web Preview is a bit flaky to say the least and the tools in WMT are quite unreliable.

In WMT the "Pre-render Desktop Search Instant Preview" image is completely broken for every page on a site I am currently looking at. There's a message below the images to say that there were many errors fetching resources from the site. However, looking in the server logs for those resources shows that Google requested all of them and all were served in full and with "200 OK" status. Requests came from multiple IPs to grab the page, images, scripts, etc. in a very short time.

For those files tagged as "Fetch failure" all I can assume is that Google's rendering script didn't wait long enough for the fetch part of the process to finish fetching them all.
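The cross-check described above (take the resources WMT flags as "Fetch failure" and look up what the server actually returned for them) could be sketched roughly as follows. The function name, the resource paths, and the sample log lines are all hypothetical, and the sketch does not filter by requester, so in practice you would first narrow the log to Google's hits.

```python
import re
from collections import defaultdict

# Matches the request and status fields of a common/combined-format log line.
REQUEST = re.compile(r'"(?:GET|HEAD) (?P<path>\S+)[^"]*" (?P<status>\d{3}) ')


def statuses_for(paths, log_lines):
    """Map each flagged path to the sorted HTTP statuses the server logged for it."""
    seen = defaultdict(set)
    for line in log_lines:
        m = REQUEST.search(line)
        if m and m["path"] in paths:
            seen[m["path"]].add(int(m["status"]))
    # Paths never requested come back with an empty list.
    return {p: sorted(seen[p]) for p in paths}


# Invented sample data, for illustration only.
flagged = ["/css/site.css", "/img/logo.png", "/js/app.js"]
log = [
    '192.0.2.10 - - [17/Feb/2012:01:40:01 +0000] "GET /css/site.css HTTP/1.1" 200 3100 "-" "-"',
    '192.0.2.11 - - [17/Feb/2012:01:40:01 +0000] "GET /img/logo.png HTTP/1.1" 200 9400 "-" "-"',
    '192.0.2.12 - - [17/Feb/2012:01:40:02 +0000] "GET /index.html HTTP/1.1" 200 7000 "-" "-"',
]

report = statuses_for(flagged, log)
for path, codes in report.items():
    print(path, codes or "never requested")
```

A flagged resource that shows only 200s in the log points to a problem on the rendering side, while one that was never requested at all suggests the fetch was not even attempted.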

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved