Forum Moderators: martinibuster
Just give it a try and replace the template URL
[google.com...]
by your own.
Clearly those of us who use Adsense ads on our pages won't be too happy with
a Google "proxy" that strips out all ads while still serving most of our original
content (text and images). Would NOARCHIVE work to stop GWT? Any other ideas
to prevent abuse of this new entry point into your site content?
Another issue is that if users on wireless devices did click on an ad on your site, following through to whatever action the advertiser wanted (buying a widget, whatever) may be quite unlikely, in which case advertisers wouldn't want to pay for such useless clicks. But then maybe ads could be smartpriced for mobile devices differently from desktop browsing.
The GWT strips javascript and CSS...and I worked so hard on my handheld css, too.
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
and still GWT serves up a cached version!
(In fact I can read that cached page from behind a proxy
that blocks all access to my original page.)
The cached page is not in the normal Google search engine
cache, as one would expect from the use of NOARCHIVE, but
it *is* in the GWT cache.
dhiggerdhigger: the problem is that it is not just mobile
phone users that can (ab)use this. One can use a regular
browser (I use Firefox) on the desktop PC to read these
cached pages that have now been stripped from Adsense ads
(and Javascript in general), but that still have all text
and images. Moreover, it now looks like the NOARCHIVE meta
tag does not help here either.
[google.com...]
while I found that one sometimes needs several
attempts (reloads) because this "service" does
not seem entirely stable yet.
> dhiggerdhigger: the problem is that it is not just mobile
> phone users that can (ab)use this. One can use a regular
> browser (I use Firefox) on the desktop PC to read these
> cached pages that have now been stripped from Adsense ads
> (and Javascript in general), but that still have all text
> and images. Moreover, it now looks like the NOARCHIVE meta
> tag does not help here either.
I am sorry to disagree with you, but I think there are few people with so much time to spare that they would actually do what you suggest. I agree with the second poster who said that these would be worthless clicks. Besides, we are not really losing anything. Do people even browse the internet with their wireless devices? I would become eye-fatigued within a minute of doing that. Maybe for checking movies/e-mail it would be okay, but out of all the pageviews only like 0.001 would come from mobile phones. Are you really losing that much? Now, get back to making more content :)
> Are you really losing that much?
Not yet, but spiders may love it for easy scraper harvesting.
They do not get eye-fatigued. Could start a trend.
BTW, maybe the GWT thing is not caching but "only" transforming
(transcoding) my website content on the fly, acting as a proxy
such that I can look beyond my own proxy block. Still, I don't
like this unauthorized meddling with my original content that
effectively amounts to redistributing modified content.
Right now, my impression is that GWT *is* caching, because the
served page lacks some hard links that I added recently.
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} Google\ Wireless\ Transcoder
RewriteRule .* - [F]
so I can now block this unauthorized and undesirable
public page filtering service. This seems to work fine
with newly entered URLs at [google.com...]
but not with existing cached ones.
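For anyone who cannot use mod_rewrite, the same user-agent check can be sketched at the application level. This is only a rough Python/WSGI sketch: the names `is_blocked` and `block_transcoder` are made up for illustration, and the pattern simply copies the string from the RewriteCond above, so it assumes GWT really identifies itself that way.

```python
import re

# Pattern copied from the RewriteCond above. The exact User-Agent string
# that GWT sends is an assumption here and may differ in practice.
BLOCKED_UA = re.compile(r"Google Wireless Transcoder")


def is_blocked(user_agent):
    """Return True if the User-Agent header matches the blocked pattern."""
    return bool(BLOCKED_UA.search(user_agent or ""))


def block_transcoder(app):
    """Wrap a WSGI app so that matching requests get 403 Forbidden."""
    def wrapped(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everything else is passed through to the real application.
        return app(environ, start_response)
    return wrapped
```

Like the .htaccess rule, this only affects new requests. It cannot remove copies that a third party has already stored.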
> one of my pages that for ages contains
> <META NAME="ROBOTS" CONTENT="NOARCHIVE"> and still GWT serves up a cached version!
The noarchive directive is not an instruction never to cache the page (caching is determined by the server headers), but only to stop the "cached page" link from showing in the search engine results. There are two kinds of "cache": the real ones, where a temporary copy of a document is held, and the search-engine kind, which is a copy of your page displayed on their site.
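To illustrate the "server headers" side of this, here is a minimal Python/WSGI sketch that adds standard HTTP caching headers to every response. The helper name `add_cache_headers` is hypothetical. `no-store` and `no-transform` are real Cache-Control directives, but whether GWT honors either of them is an assumption that nothing in this thread confirms.

```python
# Standard HTTP headers: "no-store" asks caches not to keep a copy,
# "no-transform" asks intermediaries not to modify the content.
# Whether any particular transcoder respects them is not guaranteed.
ANTI_CACHE_HEADERS = [
    ("Cache-Control", "no-store, no-transform"),
    ("Pragma", "no-cache"),  # legacy HTTP/1.0 counterpart
]


def add_cache_headers(app):
    """Wrap a WSGI app so every response carries the headers above."""
    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            # Drop any existing values for these headers, then append ours.
            names = {name.lower() for name, _ in ANTI_CACHE_HEADERS}
            kept = [(k, v) for k, v in headers if k.lower() not in names]
            return start_response(status, kept + ANTI_CACHE_HEADERS, exc_info)
        return app(environ, patched_start)
    return wrapped
```

These headers are requests to well-behaved caches and proxies, not an enforcement mechanism, so a user-agent block may still be needed.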
Well, I cannot test that now because my recently added block
now generates 403's for GWT, but others could try one of their
own pages (one that has long contained NOARCHIVE), make some
dummy change to that page, and next run a search with a unique
combination of identifying keywords for that page through a URL
like (keywords changed here to "spider" just as an example
that does not identify my own pages)
[google.com...]
and determine if either their modified actual page is served
or some cached older copy. If all is well, the modified page
should turn up, and if not, then GWT is not minding NOARCHIVE
(as the cached page is then turning up in a search result).
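The check described above could be made less error-prone with a unique marker as the "dummy change". The following is a small Python sketch with hypothetical helper names (`make_marker`, `classify_copy`). It assumes you paste the marker into the page as plain visible text and save the served copy yourself, since the actual transcoder URL is not spelled out here.

```python
import secrets


def make_marker():
    """Return a unique plain-text token to paste into the page as the dummy change."""
    return "marker-" + secrets.token_hex(8)


def classify_copy(served_html, marker):
    """Return 'fresh' if the served copy contains the marker, else 'stale'."""
    return "fresh" if marker in served_html else "stale"
```

A "stale" result only suggests an older copy was served. A transcoder may also drop parts of a page while converting it, so the marker should sit in ordinary body text rather than in a link, image, or script.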
BTW, the served page that had lacked my updated spider trap links
was itself a search result of the above URL format, so I expect
NOARCHIVE violations if nothing else has changed at GWT.
[edited by: simplicity at 2:52 pm (utc) on May 24, 2006]
GWT appears to strip all Javascript, which just happens to
include Adsense code just as well. Sites like skweezer.net
appear to not only strip the Adsense (Javascript) but next
add ads of their own to support their business via serving
my content, so I blocked them too now.
> I still argue that wireless devices are not hurting Adsense in any way
I agree, for now the traffic volume is likely extremely low,
but I'm concerned how far this content transcoding trend will
go in the future, both for the desktop PC and mobile devices,
and how one could contain it if the need arises. What sites
like skweezer.net are doing now, effectively replacing Adsense
with their own ads, can also be done for the desktop PC, and
right now one can only "opt-out" by explicitly blocking them.
That can get laborious when too many websites start to play
this game. skweezer.net could be viewed as an example of a
new category of dynamic scraper sites?
[edited by: simplicity at 3:40 pm (utc) on May 24, 2006]
Furthermore, in my case, they are excluding all of my advertising media: banners, Adsense, store specials...
I'm advertiser supported for the most part, so this is a huge issue. Note, I am seeing an increase in mobile phone visitors.
> GWT appears to strip all Javascript
Not really. I just checked my website with [google.com...] and it gracefully displayed JavaScript text and links that use the following methods:
1). <script type="text/javascript" language="javascript" src="example.js"></script>
2). <script type="text/javascript" language="javascript">test();</script>
3). <script type="text/javascript" language="javascript">document.write("text");</script>
This shows that Google is tactfully blocking only AdSense from GWT.
Milan
Hmm, I did further tests and it looks like you are right,
but I found that it somehow strips my spider trap
links - which by coincidence was the very change that
I was looking for. Maybe GWT strips the links that
do not obey robots.txt while transcoding, but I did
not further check that out. So I was misled by my
spider trap link not showing up in the GWT-rendered
page source. Boy, this is reverse engineering what the
transcoder may be doing.
[edited by: simplicity at 9:13 pm (utc) on May 24, 2006]
WAP doesn't support all of the features and code that modern desktop browsers support. The only way to fit pages designed for the desktop to display properly on a small cell phone screen is to convert the HTML code to WAP code. Also, AdSense ad blocks aren't designed for WAP devices. Even the smallest ad would more than fill a standard cell phone screen. Overall, I think Google is doing a fantastic job of making web content available to WAP devices.
I'm sure Google is working on WAP advertising options for both publishers and advertisers.
It definitely filtered out my robots.txt-prohibited image link. Whether that is due to my robots.txt, the tiny size of the image, its transparency, its ugliness, whatever, I cannot tell, but the link is not in the GWT output. Perhaps my spider trap link is not WAP compatible then.
> This thread shouldn't be a Featured Home Page discussion. At least not with the description given.
I agree that our insights keep changing, but whatever the exact mechanism,
it was someone or something that fell into a robots.txt link trap in relation
to GWT that first drew my attention. Before that I had not even heard of
GWT. Specifically, my trap had reported user agent "Opera/8.01 (J2ME/MIDP;
Opera Mini/2.0.3920; en; U; ssr)" from opera-mini.net as violating my
robots.txt, and the referer URL pointed to a search results page with a
GWT link to my web page with the trap. All the rest is a matter of trying
to understand to what extent it was a user, a spider, GWT or a combination
that caused the robots.txt violation, and what the potential risks are.
Google Wireless Translator Bug Posted in Google AdSense by simplicity
Site Rippers have actively been exploiting a Google bug found about a month ago: "Google Wireless Transcoder (GWT) serves fully cached pages (including images) while stripping Adsense ads from your pages..."
It's not a bug. GWT is not serving fully cached pages or serving non-WAP format images, and GWT strips out all JavaScript (AdSense code or otherwise) because WAP doesn't support it. If you want JavaScript in your WAP pages, look into WMLScript.