Forum Library, Charter, Moderators: incrediBILL & jatar k & martinibuster

Google AdSense Forum

This 31 message thread spans 2 pages.
Google Wireless Transcoder Strips AdSense
simplicity




msg:1457020
 8:26 am on May 22, 2006 (gmt 0)

Google Wireless Transcoder (GWT) serves fully cached pages (including images)
while stripping Adsense ads from your pages. Is that acceptable?

Just give it a try and replace the template URL

[google.com...]

by your own.

Clearly those of us who use Adsense ads on our pages won't be too happy with
a Google "proxy" that strips out all ads while still serving most of our original
content (text and images). Would NOARCHIVE work to stop GWT? Any other ideas
to prevent abuse of this new entry point into your site content?

 

dhiggerdhigger




msg:1457021
 8:34 am on May 22, 2006 (gmt 0)

Do most wireless devices support javascript, or have it turned on? This would affect my opinion.

Another issue is that if users on wireless devices did click on an ad on your site, following through to whatever action the advertiser wanted (buying a widget, whatever) may be quite unlikely, in which case advertisers wouldn't want to pay for such useless clicks. But then maybe ads could be smartpriced for mobile devices differently from desktop browsing.

The GWT strips javascript and CSS...and I worked so hard on my handheld css, too.

simplicity




msg:1457022
 8:58 am on May 22, 2006 (gmt 0)

It gets even worse! I checked [google.com...]
for one of my pages that has for ages contained

<META NAME="ROBOTS" CONTENT="NOARCHIVE">

and still GWT serves up a cached version!

(In fact I can read that cached page from behind a proxy
that blocks all access to my original page.)

The cached page is not in the normal Google search engine
cache, as one would expect from the use of NOARCHIVE, but
it *is* in the GWT cache.

dhiggerdhigger: the problem is that it is not just mobile
phone users that can (ab)use this. One can use a regular
browser (I use Firefox) on the desktop PC to read these
cached pages that have now been stripped of Adsense ads
(and Javascript in general), but that still have all text
and images. Moreover, it now looks like the NOARCHIVE meta
tag does not help here either.

great_9




msg:1457023
 9:01 am on May 22, 2006 (gmt 0)

error 502 when I put my site in the url.
?!?!?!?

20 clicks later... it works.

overloaded?

simplicity




msg:1457024
 9:06 am on May 22, 2006 (gmt 0)

great_9: try via

[google.com...]

while I found that one sometimes needs several
attempts (reloads) because this "service" does
not seem entirely stable yet.

joaquin112




msg:1457025
 9:13 am on May 22, 2006 (gmt 0)

> dhiggerdhigger: the problem is that it is not just mobile
> phone users that can (ab)use this. One can use a regular
> browser (I use Firefox) on the desktop PC to read these
> cached pages that have now been stripped of Adsense ads
> (and Javascript in general), but that still have all text
> and images. Moreover, it now looks like the NOARCHIVE meta
> tag does not help here either.

I am sorry to disagree with you, but I think there are few people with so much time to spare that they would actually do what you suggest. I agree with the second poster who said that these would be worthless clicks. Besides, we are not really losing anything. Do people even browse the internet with their wireless device? I would become eye-fatigued within a minute of doing that. Maybe for checking movies/e-mail it would be okay, but out of all the pageviews only like 0.001 would come from mobile phones. Are you really losing that much? Now, get to making more content :)

simplicity




msg:1457026
 9:21 am on May 22, 2006 (gmt 0)

joaquin112:

> Are you really losing that much?

Not yet, but spiders may love it for easy scraper harvesting.
They do not get eye-fatigued. Could start a trend.

BTW, maybe the GWT thing is not caching but "only" transforming
(transcoding) my website content on the fly, acting as a proxy
such that I can look beyond my own proxy block. Still, I don't
like this unauthorized meddling with my original content that
effectively amounts to redistributing modified content.

Right now, my impression is that GWT *is* caching, because the
served page lacks some hard links that I added recently.

simplicity




msg:1457027
 11:12 am on May 22, 2006 (gmt 0)

OK, the following appears to block GWT with a 403:

# Answer 403 Forbidden when the User-Agent contains "Google Wireless Transcoder"
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} Google\ Wireless\ Transcoder
RewriteRule .* - [F]

so I can now block this unauthorized and undesirable
public page filtering service. This seems to work fine
with newly entered URLs at [google.com...]
but not with existing cached ones.
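
For anyone who cannot edit the Apache config, the same check can be sketched in application code. This is only a rough illustration of the substring match that the RewriteCond above performs; the function name and the sample User-Agent strings are made up for the example, and only the "Google Wireless Transcoder" substring comes from the rule itself.

```python
# Hypothetical helper mirroring the RewriteCond above: a case-sensitive
# substring match on the User-Agent header.
BLOCKED_AGENT_SUBSTRINGS = ("Google Wireless Transcoder",)


def status_for_user_agent(user_agent):
    """Return 403 if the User-Agent contains a blocked substring, else 200."""
    agent = user_agent or ""  # a missing header is treated as an empty string
    for needle in BLOCKED_AGENT_SUBSTRINGS:
        if needle in agent:
            return 403
    return 200


if __name__ == "__main__":
    # Both User-Agent strings below are invented for illustration.
    print(status_for_user_agent("Mozilla/5.0 (compatible; Google Wireless Transcoder;)"))
    print(status_for_user_agent("Mozilla/5.0 (Windows; U) Firefox/1.5"))
```

Like the rewrite rule, this only affects requests that actually reach the server, so it would not remove anything a third party has already stored.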

encyclo




msg:1457028
 1:55 pm on May 24, 2006 (gmt 0)

> one of my pages that for ages contains
> <META NAME="ROBOTS" CONTENT="NOARCHIVE">
>
> and still GWT serves up a cached version!

The noarchive is not an instruction never to cache the page (caching is determined by the server headers), but only to stop the "cached page" link from showing in the search engine results. There are two kinds of "cache", the real ones where a temporary copy of a document is held, and the search-engine kind which is a copy of your page displayed on their site.
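
As a rough illustration of the "server headers" side of this, the sketch below builds a Cache-Control header from two standard HTTP directives: no-store (do not keep a copy) and no-transform (do not modify the body). The helper name and its parameters are invented for the example, and nothing in this thread establishes whether GWT honours either directive.

```python
def cache_headers(allow_store=False, allow_transform=False):
    """Build a Cache-Control header dict from standard HTTP directives.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    directives = []
    if not allow_store:
        directives.append("no-store")      # ask caches not to keep a copy
    if not allow_transform:
        directives.append("no-transform")  # ask proxies not to alter the body
    if not directives:
        return {}
    return {"Cache-Control": ", ".join(directives)}


if __name__ == "__main__":
    print(cache_headers())
```

The returned dict would be merged into whatever response headers the server already sends.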

Erku




msg:1457029
 2:33 pm on May 24, 2006 (gmt 0)

Why would Google do such a thing? Is it not in Google's interest to support Adsense?

I don't understand the logic.

simplicity




msg:1457030
 2:33 pm on May 24, 2006 (gmt 0)

encyclo:

Well, I cannot test that now because my recently added block
now generates 403's for GWT, but others could try one of their
own pages (one that has long contained NOARCHIVE), make some
dummy change to that page, and next run a search with a unique
combination of identifying keywords for that page through a URL
like (keywords changed here to "spider" just as an example
that does not identify my own pages)

[google.com...]

and determine if either their modified actual page is served
or some cached older copy. If all is well, the modified page
should turn up, and if not, then GWT is not minding NOARCHIVE
(as the cached page is then turning up in a search result).

BTW, the served page that had lacked my updated spider trap links
was itself a search result of the above URL format, so I expect
NOARCHIVE violations if nothing else has changed at GWT.

[edited by: simplicity at 2:52 pm (utc) on May 24, 2006]
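
The test described above can be sketched roughly as follows: put a unique marker string into the page as the dummy change, fetch the page through the proxy URL, and see whether the marker comes back. The function names are made up, and the proxy URL is left as a parameter because the one in this thread is truncated. A plain-text marker is probably safer than a link, since a transcoder may drop some markup.

```python
import urllib.request


def fetch(url):
    """Fetch a URL and return the body as text (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def looks_stale(served_html, marker):
    """True if the served copy lacks the marker added by the dummy change."""
    return marker not in served_html


# Usage, assuming proxied_url is the proxy URL for your own page and
# "marker-20060524" is text you just added to that page:
#   if looks_stale(fetch(proxied_url), "marker-20060524"):
#       print("served copy predates the change")
```

A missing marker only shows that the served copy differs from the live page; it does not by itself prove that an older copy was stored.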

simplicity




msg:1457031
 2:42 pm on May 24, 2006 (gmt 0)

> Why would Google do such a thing? Is it not in Google's interest to support Adsense?

GWT appears to strip all Javascript, which just happens to
include Adsense code just as well. Sites like skweezer.net
appear to not only strip the Adsense (Javascript) but next
add ads of their own to support their business via serving
my content, so I blocked them too now.
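
To see why ads delivered by Javascript would vanish under this assumption, here is a deliberately crude sketch of a filter that drops every script element from a page. It is not how GWT or skweezer.net work internally (the thread does not establish that); the regular expression is a simplification that will miss unusual markup.

```python
import re

# Crude pattern: an opening <script ...> tag, anything up to the first
# closing </script>, case-insensitive and spanning newlines.
SCRIPT_ELEMENT = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def strip_scripts(html):
    """Return the HTML with all script elements removed (simplified)."""
    return SCRIPT_ELEMENT.sub("", html)


if __name__ == "__main__":
    page = '<p>Text</p><script type="text/javascript">show_ad();</script><p>More</p>'
    print(strip_scripts(page))
```

Text and image tags pass through untouched, which matches the observation that the content survives while the ads do not.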

joaquin112




msg:1457032
 2:59 pm on May 24, 2006 (gmt 0)

It's called "Branding" for Google. Right now what they want is to literally destroy Yahoo. Anyway, I still argue that wireless devices are not hurting Adsense in any way. Maybe a couple of dollars to the high-earning publishers out there...

simplicity




msg:1457033
 3:12 pm on May 24, 2006 (gmt 0)

joaquin112:

> I still argue that wireless devices are not hurting Adsense in any way

I agree, for now the traffic volume is likely extremely low,
but I'm concerned how far this content transcoding trend will
go in the future, both for the desktop PC and mobile devices,
and how one could contain it if the need arises. What sites
like skweezer.net are doing now, effectively replacing Adsense
with their own ads, can also be done for the desktop PC, and
right now one can only "opt-out" by explicitly blocking them.
That can get laborious when too many websites start to play
this game. skweezer.net could be viewed as an example of a
new category of dynamic scraper sites?

[edited by: simplicity at 3:40 pm (utc) on May 24, 2006]

Edge




msg:1457034
 3:37 pm on May 24, 2006 (gmt 0)

I think there is more going on here than meets the eye. I have had repeated problems with the pre-cache issues going on with the mobile viewer. It seems that to speed things up, they are caching all links contained within a webpage. In my case, they are tripping my robot traps and banning themselves. The bottom line is that they are not following the robots.txt instructions.

Furthermore, in my case, they are excluding all of my advertising mediums, banners, adsense, store specials...

I'm advertiser supported for the most part, so this is a huge issue. Note, I am seeing an increase in mobile phone visitors.

Wemic




msg:1457035
 3:53 pm on May 24, 2006 (gmt 0)

Hello WebmasterWorld!

I tried out the link and was pleased to find out that it does not work on my sites that have a SSL cert installed.

milanmk




msg:1457036
 8:07 pm on May 24, 2006 (gmt 0)

> GWT appears to strip all Javascript

Not really. I just checked my website with [google.com...] and it gracefully displayed JavaScript text and links that use the following methods:

1). <script type="text/javascript" language="javascript" src="example.js"></script>

2). <script type="text/javascript" language="javascript">test();</script>

3). <script type="text/javascript" language="javascript">document.write("text");</script>

This shows that Google is tactfully blocking only AdSense from GWT.

Milan

Key_Master




msg:1457037
 8:32 pm on May 24, 2006 (gmt 0)

GWT is not pulling archived copies of pages. This is old news really. Google has had a wap proxy service for some years now.

simplicity




msg:1457038
 8:46 pm on May 24, 2006 (gmt 0)

> it gracefully displayed JavaScript text and links that uses following methods

OK, yes, it passes on the processed Javascript results, not the Javascript
source code that I was looking for in a quick HTML source check. Thanks
for correcting me.

simplicity




msg:1457039
 9:07 pm on May 24, 2006 (gmt 0)

> GWT is not pulling archived copies of pages.

Hmm, I did further tests and it looks like you are right,
but I found that it somehow strips my spider trap
links - which by coincidence was the very change that
I was looking for. Maybe GWT strips the links that
do not obey robots.txt while transcoding, but I did
not further check that out. So I was misled by my
spider trap link not showing up in the GWT-rendered
page source. Boy, this is reverse engineering what the
transcoder may be doing.

[edited by: simplicity at 9:13 pm (utc) on May 24, 2006]

keyplyr




msg:1457040
 9:09 pm on May 24, 2006 (gmt 0)

All SSL links that are displayed using external CSS do not work. These are my product buy pages so I am not a happy camper.

Key_Master




msg:1457041
 9:29 pm on May 24, 2006 (gmt 0)

GWT isn't filtering robots.txt prohibited links. It's not a spider- it's a WAP interface.

WAP doesn't support all of the features and code that modern desktop browsers support. The only way to fit pages designed for the desktop to display properly on a small cell phone screen is to convert the HTML code to WAP code. Also, AdSense ad blocks aren't designed for WAP devices. Even the smallest ad would more than fill a standard cell phone screen. Overall, I think Google is doing a fantastic job of making web content available to WAP devices.

I'm sure Google is working on WAP advertising options for both publishers and advertisers.

simplicity




msg:1457042
 9:57 pm on May 24, 2006 (gmt 0)

> GWT isn't filtering robots.txt prohibited links. It's not a spider- it's a WAP interface.

It definitely filtered out my robots.txt prohibited image link. Whether that is due to my robots.txt, the tiny size of the image, its transparency, its ugliness, whatever, I cannot tell, but the link is not in the GWT output. Perhaps my spider trap link is not WAP compatible then.

Key_Master




msg:1457043
 10:07 pm on May 24, 2006 (gmt 0)

WAP images use a different format- Wireless Bitmap Images. GWT is probably stripping non-WAP image code from the page source.

trinorthlighting




msg:1457044
 10:57 pm on May 24, 2006 (gmt 0)

I smell another class action lawsuit on the horizon..

Key_Master




msg:1457045
 11:26 pm on May 24, 2006 (gmt 0)

This thread shouldn't be a Featured Home Page discussion. At least not with the description given.

simplicity




msg:1457046
 7:10 am on May 25, 2006 (gmt 0)

Key_Master:

> This thread shouldn't be a Featured Home Page discussion. At least not with the description given.

I agree that our insights keep changing, but whatever the exact mechanism,
it was someone or something that fell into a robots.txt link trap in relation
to GWT that first drew my attention. Before that I had not even heard of
GWT. Specifically, my trap had reported user agent "Opera/8.01 (J2ME/MIDP;
Opera Mini/2.0.3920; en; U; ssr)" from opera-mini.net as violating my
robots.txt, and the referer URL pointed to a search results page with a
GWT link to my web page with the trap. All the rest is a matter of trying
to understand to what extent it was a user, a spider, GWT or a combination
that caused the robots.txt violation, and what the potential risks are.
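
For readers unfamiliar with robots.txt traps, the sketch below shows the check a well-behaved client is expected to make before following a link, using Python's standard urllib.robotparser. The "/trap/" path and the sample robots.txt are invented for the example; only the user agent string is the one reported above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the trap directory is off limits to everyone.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""

# User agent string as reported by the trap in the post above.
REPORTED_AGENT = "Opera/8.01 (J2ME/MIDP; Opera Mini/2.0.3920; en; U; ssr)"


def is_allowed(robots_txt, user_agent, path):
    """Return True if robots.txt permits this user agent to fetch the path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, REPORTED_AGENT, "/trap/link.html"))
    print(is_allowed(ROBOTS_TXT, REPORTED_AGENT, "/index.html"))
```

A request for a disallowed path therefore means the client either never read robots.txt or ignored it, which is what the trap is designed to detect. A human following a link through a proxy would trip it in the same way.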

Edge




msg:1457047
 12:00 pm on May 25, 2006 (gmt 0)

Key_Master, I am reasonably sure that mobile device internet surfing could catch on in a serious way. For those of us who are advertiser supported, this is a featured thread!

Key_Master




msg:1457048
 12:44 pm on May 25, 2006 (gmt 0)

Here's an old related thread:
[webmasterworld.com...]

Google Wireless Translator Bug Posted in Google AdSense by simplicity
Site Rippers have actively been exploiting a Google bug found about a month ago: "Google Wireless Transcoder (GWT) serves fully cached pages (including images) while stripping Adsense ads from your pages..."

It's not a bug, GWT is not serving fully cached pages or serving non-WAP format images, and GWT strips out all JavaScript (AdSense code or otherwise) as WAP doesn't support it. If you want JavaScript in your WAP pages, look into WMLScript.

Adam_Lasnik




msg:1457049
 7:34 pm on May 25, 2006 (gmt 0)

We do our best to render Web sites' original content in such a tiny medium. Webmasters are welcome to get more information about this issue and optionally request to have their sites excluded from Google's transcoding here:
[google.com ]

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved