Google Web Preview

Not just from bare IPs anymore...

         

Pfui

12:24 am on Aug 20, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



1.) This is the first time I've seen (or noticed) GWP running from other than bare G IPs:

postnews2.google.com
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.24 (KHTML, like Gecko; Google Web Preview) Chrome/11.0.696 Safari/534.24

08/19 1n:00:24 /dir/filename.html

robots.txt? NO
(I know, I know; GWP's a bot we're oh-so-strongly urged not to treat like a bot)

2.) Then an hour-plus later, along comes one of the 'usual' GWPs --

209.85.224.95
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.24 (KHTML, like Gecko; Google Web Preview) Chrome/11.0.696 Safari/534.24

08/19 1n:03:40 /dir/filename2.html

robots.txt? NO

3.) Btw, G's postnews Hosts do not have worry-free reps... [webmasterworld.com...]

My notes even show 'it' using a blank UA and referring from:

http://www.google.com/search?hl=en&q=spotonkeywordhere&btnG=Google+Search&aq=f&oq=
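A minimal Python sketch for pulling the search phrase out of a Google search referer like the one above during log review. The function name is hypothetical. It only reads the `q` parameter and relies on the standard library's `urllib.parse`.

```python
from urllib.parse import urlsplit, parse_qs

def search_terms_from_referer(referer):
    """Return the q= values from a search-results referer, or [] if there are none."""
    query = urlsplit(referer).query
    # parse_qs drops empty parameters such as "oq=" unless keep_blank_values=True
    return parse_qs(query).get("q", [])

ref = "http://www.google.com/search?hl=en&q=spotonkeywordhere&btnG=Google+Search&aq=f&oq="
print(search_terms_from_referer(ref))  # ['spotonkeywordhere']
```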

FWIW

dstiles

7:37 pm on Aug 20, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I block google web preview. Nasty, insidious thing. (That comment actually refers to both the google and the web preview part.)

Pfui

9:54 pm on Sep 3, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



GWT: Not just from bare IPs anymore.

And not just via these UAs anymore:

Mozilla/5.0 (en-us) AppleWebKit/525.13 (KHTML, like Gecko; Google Web Preview) Version/3.1 Safari/525.13

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.24 (KHTML, like Gecko; Google Web Preview) Chrome/11.0.696 Safari/534.24

Today:

Mozilla/5.0 (Linux; U; Android 2.3.4; generic) AppleWebKit/534.24 (KHTML, like Gecko; Google Web Preview) Version/4.0 Mobile Safari/534.24

I've seen GWT w/ Chrome and Safari before, but never w/ Android and/or Mobile. I'm not surprised, exactly, just curious why G can't use a generic GWT UA like, oh --

Mozilla/5.0 (compatible; Google Web Preview/2.1; http://www.google.com/insertaboutpagehere.html)

-- unless they're checking for mobile-only something or other?

IPs were the usual: 74.125.76., 74.125.78., 209.85.224., etc.

lucy24

10:39 pm on Sep 3, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I block google web preview. Nasty, insidious thing.

Ay-yup.

BrowserMatch "Google Web Preview" keep_out
...
Deny from env=keep_out
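To test which user agents that rule would catch, here is a rough Python equivalent of the `BrowserMatch` line. It assumes the same case-sensitive regex match against the User-Agent header (`BrowserMatchNoCase` would ignore case). The function name and sample labels are made up. The UAs are ones quoted in this thread, including two later ones that carry no "Google Web Preview" token and so would not be caught by this pattern.

```python
import re

# Case-sensitive, like BrowserMatch; BrowserMatchNoCase would need re.IGNORECASE.
GWP_PATTERN = re.compile(r"Google Web Preview")

def keep_out(user_agent):
    """Return True when the UA would get the keep_out variable set."""
    return GWP_PATTERN.search(user_agent) is not None

SAMPLES = {
    "linux_gwp": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.24 (KHTML, like Gecko; Google Web Preview) Chrome/11.0.696 Safari/534.24",
    "android_gwp": "Mozilla/5.0 (Linux; U; Android 2.3.4; generic) AppleWebKit/534.24 (KHTML, like Gecko; Google Web Preview) Version/4.0 Mobile Safari/534.24",
    "gwt_fetch": "Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696 Safari/534.24",
    "chrome14": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.159 Safari/535.1",
}

for name, ua in SAMPLES.items():
    print(name, keep_out(ua))
```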


Good for what ails you. In my case, it was because I cannot begin to imagine how the sight of a page thumbnail could possibly help someone decide whether the page will contain useful information about, I dunno, the SuperGreek Unicode keyboard layout or whatever it is you're looking for. Nobody ever previewed Grandmother Puss, which might have been useful.

Follow-up after experimentation:
What the ###, ### and also ###? I'm low-traffic. Very. That means gwp normally fetches pages on demand. So where did they get that whole string of pages from the /hovercraft/ directory without anything showing up in the logs except one lone 403? The only thing they wouldn't preview is a page from an area whose entire /images/ directory is roboted-out. And then, inexplicably, the very next page, which takes me...

Wildly OT:
Is there a thread somewhere that explains what a leading "10+ items" means? It's a new one on me. Here it seems to refer to thumbnail links leading to new pages-- and they list some of the links. But only for this one page, though several others in the search results fit the identical pattern.

To make up for the absent "10+ items", they will happily show all the Previews you like. But maybe they won't show them next time around, because now there's a nice string of 403s in the logs, extending a couple pages beyond what I actually looked at. (I think there's a thread explaining this aspect.)

dstiles

7:44 pm on Sep 4, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



As I understand it, from reading around this and the google forums...

Googlebot crawls pages and caches all information (if allowed). If images etc are blocked it obviously cannot cache them.

If ANY part of a page (html, images, css etc) is not allowed to googlebot then GWP comes calling IF the page is listed in SERPS (not sure if this is always or only if preview view is triggered by the SERPS viewer).

Since I almost always block googlebot from images folders (via robots.txt) GWP always comes looking for missing images. And always gets told to, er, "go away".
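A sketch of checking what a robots.txt like that allows Googlebot to fetch, using Python's `urllib.robotparser`. The robots.txt content, host and paths are hypothetical stand-ins for "block googlebot from images folders". This only covers the robots.txt side; it says nothing about how GWP decides when to visit.

```python
from urllib import robotparser

# Hypothetical robots.txt that keeps Googlebot out of an images folder.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /images/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

def googlebot_may_fetch(path):
    """Ask the parsed rules whether Googlebot may fetch a path on the (made-up) host."""
    return rp.can_fetch("Googlebot", "http://www.example.com" + path)

print(googlebot_may_fetch("/dir/filename.html"))  # True
print(googlebot_may_fetch("/images/photo.jpg"))   # False
```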

From the UAs noted above I would guess that google is trying to show previews according to SERPS-visiting platform. Makes partial sense but not complete sense.

Pfui

11:37 pm on Sep 4, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Quick Recap re GWT:

1.) Newly atypical Host: postnews2.google.com

2.) Newly atypical UAs: Blackberry and Mobile

@dstiles:

As to GWT's UA being in some way specific to the clicking-visitor's UA -- a la Transcoder -- interesting theory but not happening at this end. I just tested a bunch and regardless of UA, when I clicked or hovered to show a preview, the immediate G hits were all GWT's 'original' UA:

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.24 (KHTML, like Gecko; Google Web Preview) Chrome/11.0.696 Safari/534.24

keyplyr

1:11 am on Sep 5, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I block transcoder yet display previews in Google Mobile Search.

lucy24

1:13 am on Sep 5, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I just tested a bunch

Which ones did you use? Not browsers of course-- which physical devices? I wanted to check with the iPad but got an even more mystifying result: the Preview option wasn't there! Had to go back to the regular computer to confirm that it isn't a weird Safari Thing.

Pfui

5:35 am on Sep 5, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



- Initial test, my:

MacBook Pro; Safari, Firefox, semi-ancient Camino, super-ancient AOL. The first three showed the GWT icons, the last did not.

- A few minutes ago, husband's:

Windows 7 machine; Firefox
Windows XP Pro machine; Firefox

- Across all three machines and browsers, numerous Preview icon clicks instantly generated GWT-specific access_log hits, 100% of which were from the 'usual' bare G IPs using the 'usual' Linux-based UA.

dstiles

9:45 pm on Sep 5, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm getting puzzled by the reference to GWT - surely that's Google Webmaster Tools? I thought Web Preview was independent of GWT?

Pfui

10:35 pm on Sep 5, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oy vey.

(slaps head)

REWRITE!

"GWT" (a.k.a. Google Webmaster Tools) in my preceding posts should be "GWP" (a.k.a. Google Web Preview).

Whatta goof.

lucy24

10:56 pm on Sep 5, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Overlapping, but I'll leave it.

Typo for GWP?

:: idly wondering if Pfui is the right age to remember the "But can she type?" posters ::

Pfui

12:44 am on Sep 6, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Pardon?

lucy24

1:31 am on Sep 6, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I typed my post before seeing yours. Comes of opening mountains of unread posts in separate tabs and working backward.

There was a certain period in human history when women applying for white-collar jobs were expected to be able to type. (Yes, boys, this is really true.) Followed by a period when some women chose to deny knowing how to type, lest they be stuck in That Kind Of Job all their lives. That's when you got the posters depicting female leaders (Golda Meir is the one I remember) with the caption "But can she type?"

:: We now return to our regularly scheduled forum. ::

Has anyone tried deliberately triggering GWP from a mobile device? Phone, tablet, that kind of thing. That's what I was trying to do with the iPad-- but failed because there simply was no Preview option. (Prefs settings on the iPad are so arcane, I can't figure out how or even whether this can be changed. Even tried the Google Mobile app.) If anyone else tries it, think of search terms that would bring up a page nobody ever visits, so it's unlikely to be archived.

dstiles

7:49 pm on Sep 6, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I wonder if G returns preview for mobiles? No idea as I don't have one.

GWP also requires javascript, which I also do not permit. Possibly cookies, but I don't allow those either from G.

Basically, I don't have a clue what to expect. :)

PS: Untrained and unorthodox, but I can type pretty well and quite fast when I have to. But then, I'm not a woman. :)

keyplyr

8:06 pm on Sep 6, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I wonder if G returns preview for mobiles?

@ dstiles Yes, see above post#: 4358645

In mobile.google.com previews are displayed in a carousel along with others in the top 5 results for the specific search term.

When accessing the regular google search utility from a mobile device previews are the same as for non-mobile browsers.

g1smd

8:23 pm on Sep 6, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



explains what a leading "10+ items" means
It's the number of bullet point list items on the page.

New feature that appeared in Google SERPs almost two weeks ago.
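If the label is indeed driven by list items, counting `<li>` elements in your own markup gives a rough idea of what Google might report. That link between the label and `<li>` elements is an assumption based on the description above, not a documented rule. The sketch uses Python's `html.parser`; the class and function names are hypothetical.

```python
from html.parser import HTMLParser

class ListItemCounter(HTMLParser):
    """Counts <li> start tags; HTMLParser lowercases tag names, so <LI> counts too."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.count += 1

def count_list_items(html):
    parser = ListItemCounter()
    parser.feed(html)
    parser.close()
    return parser.count

print(count_list_items("<ul><li>one</li><LI>two</li></ul><ol><li>three</ol>"))  # 3
```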

Pfui

3:01 pm on Sep 7, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



FWIW: Here's an array of GWP hits from postnews2 plus IPs again, all using the Linux UA. The rate is too rapid and the directories and files too tangential to be a real person hovering over SERP magnifying glass icons:

postnews2.google.com
09/07 0n:25:44 /dir1/dir2/

64.233.172.33
09/07 0n:25:45 /dir3/filename50.html

74.125.66.80
09/07 0n:25:58 /dir1/dir2/filename22.html

64.233.172.33
09/07 0n:26:00 /dir3/filename20.html

74.125.44.89
09/07 0n:26:10 /dir3/filename46.html

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.24 (KHTML, like Gecko; Google Web Preview) Chrome/11.0.696 Safari/534.24
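To put numbers on "too rapid", here is a sketch that computes the gaps between those five hits. Only minutes and seconds are used because the hour is obscured in the excerpt, so it assumes all hits fall within the same hour.

```python
from datetime import datetime

# Minutes:seconds of the five GWP hits listed above.
HITS = ["25:44", "25:45", "25:58", "26:00", "26:10"]

def gaps_in_seconds(stamps):
    """Seconds between consecutive hits; only valid if no hour boundary is crossed."""
    times = [datetime.strptime(s, "%M:%S") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

gaps = gaps_in_seconds(HITS)
print(gaps)       # [1, 13, 2, 10]
print(sum(gaps))  # 26
```

Five pages across three directories in 26 seconds is hard to square with one person hovering over magnifying-glass icons.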

g1smd

9:02 pm on Sep 7, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I had a string of similar Linux hits spread over a few hours just a few weeks back, just hours after making a change that temporarily screwed up the .htaccess for a site in a way that would probably have signalled "something dodgy going on here".

I assumed at that time that those visits were some sort of manual review.

dstiles

10:10 pm on Sep 7, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've occasionally seen a legit-ish "anti-virus" type of bot reading one or two pages in reasonable fashion for a few months and then getting stroppy and taking loads of pages at high speed, unconnected with any page that the "user" may have asked to see. I wonder if GWP has begun doing that.

keyplyr - thanks for the mobile info! :)

mikeavery11

5:57 pm on Sep 10, 2011 (gmt 0)



I know that Googlebot crawls pages and caches all information (if allowed). If images etc are blocked it obviously cannot cache them.

dstiles

9:45 pm on Sep 10, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



And that's when web preview comes around and steals them anyway!

Pfui

8:12 pm on Sep 13, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



1.) First postnews2 appeared running Google Web Preview (GWP). Now this:

postnews1.google.com
[74.125.46.80]
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.24 (KHTML, like Gecko; Google Web Preview) Chrome/11.0.696 Safari/534.24

2.) And just to keep us guessing... I was testing some 'crawler-access' pages in Google Webmaster Tools (GWT) and for things to 'work,' I had to allow an entirely new UA --

Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696 Safari/534.24

-- coming from these IPs:

209.85.224.91
209.85.224.93

That's in addition to the now-routine, no-UA Google Webmaster Tools (GWT) dashboard favicon hits from the likes of:

209.85.224.89
209.85.224.92

3.) Oh, and this showed up for root and two favicons:

216-239-45-4.google.com
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.159 Safari/535.1

Huh-wha?

4.) FWIW... Hardly a week goes by that I don't have to tweak .htaccess on my own time/dime to accommodate some new/different/cloaked G thing/IP/UA. I confess that sometimes, the thought of capitulating buzzes my brain.

Then I get ticked off all over again:)

dstiles

9:00 pm on Sep 13, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have those IPs listed as "google utils" - things like feedfetcher, translate, preview etc. All told to Kill!

If it were not that my clients demand to be seen in google SERPS I would kill all google access. Sadly, in UK, bing is way down the referring chain and no others in any real position at all.

Only one of my sites could probably survive killing google, and that gets about 50% of its traffic from "direct" sources. I'm trying to diversify but we webmasters did too good a job a few years ago telling everyone how good damn google was, and it's an uphill struggle to re-educate them.

g1smd

10:27 am on Sep 30, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You can't see a preview for a URL with an unescaped ' in it. URLs with the ' escaped as %27 will preview, but URLs with a literal ' in them will not.

You can't use the Instant Preview or the Fetch as Googlebot features in Google Webmaster Tools for the URL with the unescaped ' in it.

The single quote truncates the URL that Google thinks that it is handling, and the processes fail (they report "success", but the results can't be seen).
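A sketch of producing the escaped form with Python's `urllib.parse.quote`, which percent-encodes a single quote as %27 while leaving `/` alone by default. The path is hypothetical.

```python
from urllib.parse import quote, unquote

path = "/dir/o'brien.html"   # hypothetical page name containing a single quote
escaped = quote(path)         # default safe="/" keeps the slashes; "'" becomes %27
print(escaped)                # /dir/o%27brien.html

# Decoding gives back the original path, so nothing is lost by escaping.
assert unquote(escaped) == path
```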

Pfui

12:02 pm on Oct 28, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This just in: Google Web Preview (GWP) running from (a very non-obvious) Google domain:

ez-in-f84.1e100.net
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.51 (KHTML, like Gecko; Google Web Preview) Chrome/12.0.742 Safari/534.51

robtex: "ez-in-f84.1e100.net has two IP numbers (66.102.13.84, 66.102.12.84)."
"What is 1e100.net?" [google.com...]
"Why is Google using 1e100.net?" [webmasterworld.com...]

Makes it difficult to limit fakes -- and/or limit G -- when they keep 'adding' GWP-running domains/IPs.

dstiles

9:22 pm on Oct 28, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have a lot of IPs blocked in the 66.102.12.0/24 range but none in the 13/24 range.

> What is 1e100.net
Pretentious prats.

> difficult to limit fakes
Not really - I block all web preview UAs. :)
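A sketch of testing the two addresses robtex lists for ez-in-f84.1e100.net against that range with Python's `ipaddress` module. Treating the whole 66.102.12.0/24 as blocked is a simplification of "a lot of IPs blocked in" it.

```python
import ipaddress

# Simplification: treat all of 66.102.12.0/24 as blocked, and nothing in 66.102.13.0/24.
BLOCKED = [ipaddress.ip_network("66.102.12.0/24")]

def is_blocked(ip):
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED)

# The two addresses robtex gives for ez-in-f84.1e100.net
print(is_blocked("66.102.12.84"))  # True
print(is_blocked("66.102.13.84"))  # False
```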

g1smd

9:54 pm on Oct 28, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



ez-in-f84.1e100.net -- That, to me, is an obvious Google naming system, similar to the one in use when I produced a jumbo list of Google datacentres a few years ago.

Here, "ez" is their internal identifier for a server cluster, standing for the first three octets of the IP address, and "84" is the final octet of the IP address. They use the same system for their main server systems.
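A sketch of reading such a name under g1smd's interpretation. The regex assumes the exact `<cluster>-in-f<octet>.1e100.net` shape seen here; it is hypothetical, not a documented Google format. It only cross-checks the final octet, since per the robtex result the same name resolves to addresses in two different /24s.

```python
import re

# Hypothetical pattern for names shaped like "ez-in-f84.1e100.net".
NAME_RE = re.compile(r"^([a-z0-9]+)-in-f(\d{1,3})\.1e100\.net$")

def parse_1e100(hostname):
    """Return (cluster_code, final_octet), or None if the name doesn't fit the pattern."""
    m = NAME_RE.match(hostname.lower().rstrip("."))
    if m is None:
        return None
    return m.group(1), int(m.group(2))

def last_octet_matches(hostname, ip):
    """True if the octet encoded in the name equals the IP's last octet."""
    parsed = parse_1e100(hostname)
    return parsed is not None and parsed[1] == int(ip.rsplit(".", 1)[1])

print(parse_1e100("ez-in-f84.1e100.net"))                         # ('ez', 84)
print(last_octet_matches("ez-in-f84.1e100.net", "66.102.13.84"))  # True
```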

Ha. 1e100 is a googol.

dstiles

8:28 pm on Oct 29, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



And how many people know what a googol is? As I said, pretentious. At the current rate of disaffection they are never going to achieve their goal.

If their DNS entry says googlebot (and if I'm up to date on their IP ranges) their googlebot will get through. GoogleAnythingElse will not. Nor will anything that blocked itself on a googlebot IP and it's likely that the IP will be blocked for a while for googlebot as well. Organisation? I doubt they know the meaning of the word. :(
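A sketch of that kind of "DNS says googlebot" check as forward-confirmed reverse DNS. It reverse-resolves the IP, requires a googlebot.com hostname, then resolves that name forward and requires the original IP among the results. The forward step matters because whoever controls an IP's reverse zone can make its PTR record say anything. Only googlebot.com names pass, matching the policy described above, so names like postnews2.google.com or ez-in-f84.1e100.net are refused. The resolver parameters exist so the demo can run on made-up lookup tables instead of live DNS. Every hostname and IP in those tables other than the thread's ez-in-f84.1e100.net entry is hypothetical.

```python
import socket

def dns_says_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check that only accepts googlebot.com hosts."""
    try:
        hostname = reverse(ip)[0].lower().rstrip(".")
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    # A PTR record alone can claim anything, so check the domain first...
    if not hostname.endswith(".googlebot.com"):
        return False
    # ...then confirm the name resolves back to the same IP.
    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False
    return ip in addresses

# Demo with hypothetical lookup tables instead of live DNS.
PTR = {"66.249.66.1": "crawl-66-249-66-1.googlebot.com",
       "66.102.13.84": "ez-in-f84.1e100.net",
       "203.0.113.5": "spoofed.googlebot.com"}   # PTR claims googlebot, no A record
A = {"crawl-66-249-66-1.googlebot.com": ["66.249.66.1"],
     "ez-in-f84.1e100.net": ["66.102.13.84", "66.102.12.84"]}

def fake_reverse(ip):
    if ip not in PTR:
        raise socket.herror(1, "Unknown host")
    return PTR[ip], [], [ip]

def fake_forward(name):
    if name not in A:
        raise socket.gaierror(-2, "Name or service not known")
    return name, [], A[name]

print(dns_says_googlebot("66.249.66.1", fake_reverse, fake_forward))   # True
print(dns_says_googlebot("66.102.13.84", fake_reverse, fake_forward))  # False
print(dns_says_googlebot("203.0.113.5", fake_reverse, fake_forward))   # False
```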

I recently added a disallow for google into one of my sites. Mind you, it's only a 2-pager that I put up for general spam/hack testing. Blocked it on 23rd and haven't had an access since so it must work. :)