
Forum Moderators: Ocean10000


googleweblight for slow connections

Spawn of google_transcoder

     
2:13 pm on Jul 22, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Oct 13, 2003
posts:705
votes: 0


WebmasterWorld search is moribund, so forgive me if you've thrashed this newish Google service.

We've long blocked the G Wireless Transcoder plague, but note this mobile UA...

66.249.89.76 - - [20/Jul/2015] "GET /google_transcoder.txt HTTP/1.1" 404 501 "http://www.example.com/example.htm" "Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 5 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko; googleweblight) Chrome/38.0.1025.166 Mobile Safari/535.19"

Then a frustrated Jose tried to sneak in using the googleweblight.com site:

199.190.45.67 - - [20/Jul/2015] "GET /example.htm HTTP/1.1" 403 243 "http://googleweblight.com/?lite_url=http://www.example.com/example.htm&ei=tlghlkd&hc=en-IN&s=1&m=35&ts=1437948707&sig=jasdnUxu6mmSrj29-7nR1KaVgX5hj2qV_u" "Mozilla/5.0 (Linux; U; Android 4.3; en-US; SM-G7102 Build/JLS36C) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.0.1.512 U3/0.8.0 Mobile Safari/534.30"

Everything Google is suspicious, and googleweblight looks very slippery, ergo blocked on referrer and UA until proven innocent. (Hah!)
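For anyone who wants to spot these hits in their own logs before deciding what to do, here is a minimal Python sketch. It assumes log lines in the combined format shown in the two samples above; the names `LOG_RE` and `weblight_signals` are made up for illustration.

```python
import re

# Combined Log Format as in the samples above:
# ip identity user [time] "request" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]*)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"$'
)

def weblight_signals(line):
    """Return the fields ('ua', 'referer') that mention googleweblight."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return set()  # not a combined-format line
    return {
        field
        for field in ("ua", "referer")
        if "googleweblight" in match.group(field).lower()
    }
```

The first sample line would be flagged on the UA only and the second on the referrer only, which is why blocking on a single field would miss one of them.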

G claims it is to "help visitors on slow connections", so they strip down your site and feed it to them ready-chewed.

How thoughtful of them.
11:45 pm on July 22, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


This appears to be mobile only. Anyone seen their site rendered with either the iPhone or Android googleweblight?

So the assumption is, if you use Google Mobile Search and your connection is detected as slow, Google will feed you their version of the target site similar to the old transcoder?
3:46 am on July 23, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2067
votes: 2


In a word: Yep. [androidpolice.com...]

GoogleWeBlight seems more apropos. Thanks for the heads-up.

P.S. More details; plus if you run ads, scroll down to "Ads and Revenue": [support.google.com...]
4:05 am on July 23, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Oct 13, 2003
posts:705
votes: 0


Imagine what it must be like to be employed by such a dissembling organisation.
10:08 am on July 23, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


Had a friend in Java (where 2G is pretty much the norm) send me screenshots of several of my pages accessed via googleweblight. The pages look just fine... Adsense displays normally.
Google's estimates have you loading the page 4x as fast compared to the unaltered version when on a slow connection, with 80% less data. They claim, appealing to webmasters, that this results in 50% more pageviews due to the better experience and lower wait.
Places like Indonesia, India and Brazil (Android's largest emerging markets) will benefit greatly, which presumably translates to more page loads for webmasters.
Users can choose to load the full page if they want while webmasters can opt out of the change entirely.
(source: [androidpolice.com...] )

If you choose not to participate, it appears the best way would be to opt-out via response header:
Header set Cache-Control "no-transform"
Blocking by referrer/UA will just give the user a 403 w/ no way to choose to load the full (non-transcoded) page, so they'll likely just go away IMO.
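To confirm the opt-out header is actually going out, something like the following Python sketch could be used. The function names are hypothetical, and the comma split is deliberately naive (it ignores quoted directive values), so treat it as a quick check rather than a full Cache-Control parser.

```python
import urllib.request

def has_no_transform(cache_control):
    """True if a Cache-Control header value carries the no-transform directive."""
    if not cache_control:
        return False
    directives = (part.strip().lower() for part in cache_control.split(","))
    return "no-transform" in directives

def fetch_cache_control(url):
    """Issue a HEAD request and return the Cache-Control header value, or None."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get("Cache-Control")
```

Calling `has_no_transform(fetch_cache_control("http://www.example.com/example.htm"))` against your own page should return True once the header is in place.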

Bad news seems to be, whether you opt-out or block, if Google can't transcode your page for slow mobile connections, it will add a notice in the Mobile SERP saying as much. Haven't seen the exact wording yet, but I hope it's not something like "This big, fat web page refuses to load quickly!"
7:57 pm on July 23, 2015 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3287
votes: 19


Thanks for the info and for the header. I've set the header globally in IIS manager.
1:03 pm on July 24, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Oct 13, 2003
posts:705
votes: 0


Wiser heads choose to block this monster :)

Blocked on UA, referrer, and IP/CIDR until further notice.
7:03 pm on July 24, 2015 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3287
votes: 19


Who said I was wise? :)
8:12 pm on July 24, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


As explained earlier, bad idea to block; you'll be blocking traffic. Just opt out. Very simple to do, and less work.
1:31 am on July 25, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Oct 13, 2003
posts:705
votes: 0


Blocking traffic holds no fear for the wise, especially blocking undesirable traffic.

Go ahead KP, and cede authority of your site to Google by trusting them to honour their "opt-out" option.

But know that you will regret it... eventually. :)
1:52 am on July 25, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


The wise...? OK, your site, your choice :)

But blocking visitors? How is that getting back at Google?
(assuming Google needs getting back at)
4:29 pm on July 26, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 4, 2008
posts:3672
votes: 374


Hmm ... I'm seeing a fair number of visitors apparently coming directly from the website at [googleweblight.com,...] but when I try to go there myself, I get a 404 error. Maybe it's because I'm using a desktop instead of a mobile device -- but I don't have a mobile device available at the moment to check. Seems like an odd situation.
4:49 pm on July 26, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


Firefox has a built-in mobile screen emulator. Or install that plug-in for Chrome described on the Google weblight info page that's posted above.
5:35 pm on July 26, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 4, 2008
posts:3672
votes: 374


Well I just tried it on an android tablet and still get a 404. I don't use Chrome, and didn't see the option on Firefox.

But if google doesn't want me to see one of their websites, that's their privilege. I'm not going to spend any more time on it.
8:47 pm on July 26, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2067
votes: 2


I'm not going to spend any more time on it.

Same here. Including no more time spent tweaking yet more of my code to conform to what Google wants, or wants to do with my code. Besides, the vaaast majority of hits to all of my sites from Indonesia, India and Brazil are trouble and have been for years.

"Google web light"... "Google web"... Either way I slice it, I don't like it.
5:14 am on Aug 2, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


I did a few tests on several sites over the last week. Regarding not wanting to have your site displayed using Google weblight...

If the cache-control header tag (mentioned above) is installed on your server (via htaccess if in shared hosting) you will see "This website has opted out of transcoding" similar to the "mobile friendly" style text in SERP. The user may alternatively choose to load the normal mobile page.

I would rather users see that than "403 FORBIDDEN"

YMMV
10:23 pm on Aug 13, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


Seems other ISPs are doing it. This is a similar Chinese transcoder: toutiao.com/media_cooperation/

They say they also support the "no-transform" header field (mentioned in the post above).
2:06 am on Aug 14, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15936
votes: 889


I've got a further headscratcher after running the most recent logs.

For reasons I've yet to figure out, page requests from googleweblight are getting blocked. (It is unnerving when you can't figure out why something on your own site is blocked. I thought it might be that they're not sending an X-Forwarded-For header-- which I require if the UA doesn't contain Google, case sensitive-- but the header does seem to be present.) But the real puzzler is ...

In spite of the page request itself being blocked, all supporting files for the page will also get requested. Images and stylesheets only, no scripts, which in turn means no logging. This only happens about 1/3 of the time, and these requests tend to come before the page request. (That is, within the same second, but earlier in logs. My logs can admittedly be a bit hiccupy-- I've seen glitches of up to three seconds-- but every single time? Nuh-uh.)

There's no immediately adjacent googlebot request for the same page, so how do they know what the supporting files are? Is there a cache somewhere in the background-- possibly even the same cache that's offered in searches (different recent thread)-- so what they're really doing is #1, request supporting files based on information in cache, and #2, request the page itself?

I may need to pore over all those requests in detail. Currently I'm a little mystified.
4:25 am on Aug 14, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


Frustrating to discover unintentional blocking. I often use a GET tool and load the requesting UA, then start removing stuff until a 200 is returned. Then I know what caused the block and I can search through htaccess for it... unless you have the IP blocked or another rule, like, as you said, header fields.
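The remove-stuff-until-200 routine can be roughly automated. The Python sketch below is one variant, not the exact procedure described: it drops one whitespace-separated UA token at a time and reports which removals turn the response into a 200. `http_status` and `find_blocking_tokens` are hypothetical names, and the approach only helps when the block is keyed on the UA, not on IP or other headers.

```python
import urllib.error
import urllib.request

def http_status(url, user_agent):
    """Fetch url with the given User-Agent and return the HTTP status code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 403, 404, etc. arrive as exceptions

def find_blocking_tokens(user_agent, get_status):
    """Return the UA tokens whose removal makes get_status(ua) return 200."""
    tokens = user_agent.split()
    culprits = []
    for index, token in enumerate(tokens):
        candidate = " ".join(tokens[:index] + tokens[index + 1:])
        if get_status(candidate) == 200:
            culprits.append(token)
    return culprits
```

Against a live site the second argument would be something like `lambda ua: http_status("http://www.example.com/example.htm", ua)`.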


Even though I use the header tag, all the files on the page get requested, but I see your point. I did block the UA at first (until I discovered the header tag) and remember the same thing, that despite serving Google a custom 403 page, all the files from the originally requested page were still requested. So you may be correct that a parallel cache processing thread is being used; seems resource intensive, but then again this is Google.

I also use the X-Robots-Tag noarchive, but of course that doesn't really stop Google or anyone else from caching, just not publishing that cache in SERP (if they support it.)
6:21 pm on Aug 14, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15936
votes: 889


Some stuff I found after taking a closer look:

--googleweblight is older than I thought. The earliest one I found was from May, though they've definitely become more common in recent weeks.

--The full UA string is always
Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 5 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko; googleweblight) Chrome/38.0.1025.166 Mobile Safari/535.19

--Is it possible they've allocated a different IP to each hostname? On one site, visits always came from 64.233.172.less-than-full-24-range; on the other it's always 66.102.6.ditto. Total numbers were not vast, of course, but too many for coincidence. (Aside: I knew that each of these ranges was google, but I re-checked on specifics. 64.233.etc claims to be based in Lacombe, AB, "Alberta's newest city" though I'll need a Canadian to explain what, technically, "city" means when you've got a population of 12,000. 66.102.etc claims to be Wamego, KS. Are these the physical locations of Google data centers?)

--My initial estimate was very close: about 1/3 of googleweblight requests ask only for the page, no supporting files. I have a strong hunch that these are robots-- two were from Brazil-- but they're going through all the proper channels, not just faking a UA, so their visits are otherwise indistinguishable from the with-supporting-files ones.

--A lot of unusual headers.

These are always present:
X-Forwarded-For
(Google is consistently good about sending this header, not just here but in other similar functions like Preview)
X-Gfe-Ssl: yes
(does anyone know what this is? Google was unhelpful. I mean, ahem, ordinary google search)
Referer
(the vast majority are google.co.in. The IP given in X-Forwarded-For always agrees with whatever national google is named in the referer. Sometimes it's just google.xtn; other times there's a long query string that contains the element "android" at least once along with the search terms)

About half of requests had
X-Requested-With: com.google.android.googlequicksearchbox
(always this value if the header was present at all)

About half again-- no relationship, so neither the same nor not-same half-- sent one or the other of
X-Wap-Profile:
(two possible .xml values, several times each, both apparently Chinese, both point to Android browser)
OR
X-Gfe-Tls-Channelid:
(each one different, long string of what looks like Base 64 with + plus and / slash)

One request also sent an
X-Geo:
This one was definitely Base 64, because online decoder came through with
role:1 producer:12 timestamp:1439145809013000 latlng{latitude_e7:-71809303 longitude_e7:-348437901} radius:2150000
I don't know how to translate the latitude and longitude, or for that matter the timestamp and radius. I do notice that both lat/long values are negative, and this was a Brazilian request.
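On translating those values, here is a hedged Python sketch. It assumes the `_e7` suffix means degrees multiplied by 10^7 (a common convention for integer coordinates) and that the timestamp is microseconds since the Unix epoch; neither is confirmed by the header itself, and the radius unit is not evident, so it is left untouched. All function names are made up for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# The decoded text exactly as quoted above.
DECODED = (
    "role:1 producer:12 timestamp:1439145809013000 "
    "latlng{latitude_e7:-71809303 longitude_e7:-348437901} radius:2150000"
)

def parse_fields(text):
    """Collect every name:integer pair from the decoded text."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(-?\d+)", text)}

def e7_to_degrees(value_e7):
    """ASSUMPTION: an _e7 value is decimal degrees scaled by 10**7."""
    return value_e7 / 10_000_000

def micros_to_utc(micros):
    """ASSUMPTION: the timestamp counts microseconds since the Unix epoch."""
    seconds, remainder = divmod(micros, 1_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(microseconds=remainder)

fields = parse_fields(DECODED)
latitude = e7_to_degrees(fields["latitude_e7"])
longitude = e7_to_degrees(fields["longitude_e7"])
when = micros_to_utc(fields["timestamp"])
```

Under those assumptions the values come out at roughly 7.18 degrees south, 34.84 degrees west, which is on Brazil's northeastern coast, and 2015-08-09 18:43:29 UTC, both of which fit a Brazilian request in recent logs.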
12:22 am on Aug 15, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


X-Gfe-Ssl: yes
(does anyone know what this is?

I believe it checks for HTTPS, but is included for all protocols.
9:30 pm on Oct 24, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15936
votes: 889


:: bump ::

Now that this thing has been running for a while, has anyone got any further insights? I've been marking googleweblight in processed logs, and I'm ### if I can find any significant usage other than people in India looking for {content I most certainly don't have, and that would not come up in any law-abiding search engine*}. Honestly I wonder if I'd be doing both them and myself a favor by routing them all to some not-quite-403 page where they'd get less content and fewer supporting files.


* To the point where I've noindexed one directory because, as far as I can tell, it never shows up in searches except in response to highly suspect search strings. People looking for the correct content have alternative means of entry.
2:59 pm on Nov 2, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


On my site, I rarely see it any more because I use the header tag to opt-out.
Header set Cache-Control "no-transform"

It can even be combined with other attributes:
Header set Cache-Control "max-age=2592000, no-transform"
If you block it, it can't get the header tag so it just keeps coming back. One of my clients wanted me to block it for his site. I couldn't talk him out of it so now I see it in his logs 50-100x a day... but then he gets a half-million page loads a day.
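If the Cache-Control value is generated by an application instead of htaccess, the combined form shown above can be built with a small helper. This Python sketch is only an illustration: `with_no_transform` is a hypothetical name, and the comma split does not handle quoted directive values.

```python
def with_no_transform(existing):
    """Return a Cache-Control value that includes no-transform exactly once."""
    directives = [part.strip() for part in (existing or "").split(",") if part.strip()]
    if not any(part.lower() == "no-transform" for part in directives):
        directives.append("no-transform")
    return ", ".join(directives)
```

Existing directives such as `max-age` are kept in their original order, and a value that already opts out is returned unchanged.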