Just block the bots using the Apache web server, easy and simple.
|Just block the bots using the Apache web server, easy and simple. |
I'm a rabid anti-scraper and bot blocker, but that doesn't solve this problem: I allow ASK onto my site because it does send some traffic, so it provides value. Blocking ASK cuts off the traffic it provides just to stop a few artificial AdSense page impressions, which makes no sense. Besides, not all screen shot tools identify themselves as bots, so it's not just a quick and easy "block them in Apache" solution.
If you don't get that ASK traffic, you get fewer click-throughs and a lower income.
The issue here isn't blocking the bot; that's a no-brainer if you're willing to cut off the traffic it provides as well.
The issue is "Does Google count those page impressions from ASK, SNAP and others and do they impact your account?"
Interesting, Bill. I just did an Ask search on the name of a popular blog running AdSense across the top of the page. In the SERP, click on the binoculars and you can see the snapshot, including an image of the AdSense unit.
|The issue is "Does Google count those page impressions from ASK, SNAP and others and do they impact your account?" |
Well, you also mentioned that Ask made over 40,000 screen shots of your website. Did they show up in your AdSense reports? It seems like that many extra impressions would be pretty obvious, unless you're already getting a substantial number (like a million a day).
|Both Ask and Snap make screen shots of various pages in your site... |
I see what you're saying. I just searched for my domain name at Alexa.com and they do the same thing. There were a few dozen little snapshots of various pages, and I could see AdSense ads in each one of them.
The thing is, even if they took snapshots of all 200 pages of my site in one day, it wouldn't affect the statistics very much.
|Ask made over 40,000 screen shots of your website... It seems like that many extra impressions would be pretty obvious |
Actually, that was Snap; ASK seems to be a little less aggressive.
Keep in mind that making screen shots is a slow process and they're making screen shots for many millions of websites. I get about 100K page impressions per day (not all of them AdSense impressions), so you can see it would be hard for me to tell the impact of this activity, while a smaller site might notice right away. However, I've seen various IPs from Snap's screen shot tool [webmasterworld.com] request a couple of hundred pages a day from my site.
I never really thought this was a newsworthy item until I noticed more and more sites making screen shots. I figured that if Google were actually filtering those sources (Ask, Snap, etc.), you would see PSAs in the screen shots rather than real AdSense ads, which clearly isn't the case.
As a matter of fact, I myself use a Windows-based tool called WebShot that uses MSIE to make screen shots. With that tool I recently made about 30K screen shots for my directory site, and they show AdSense if it's on the page. You wouldn't even know a screen shot had been taken, because WebShot drives MSIE to take the actual screen shots.
There are a lot of services that make thumbshots, like WebSnapr and Thumbshots, and these services have to keep updating those images, just as I'll have to update mine, so the activity is escalating a bit.
I wouldn't say that this activity is so large it's a problem for AdSense yet but I'm wondering if they're having any overall impact and what, if anything, Google is doing to filter out the obvious sources of this activity from our stats.
|The thing is, even if they took snapshots of all 200 pages of my site in one day, it wouldn't affect the statistics very much. |
Considering some people only get that much traffic total in a day, what would it do to THEIR sites?
I don't think large sites will have any issue, at least not until this becomes much more pervasive, but I'm curious about small sites: could it trigger smart pricing?
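A rough sense of scale, using figures mentioned in this thread and assuming each screen shot counts as one ad impression per page (multiple ad units would multiply it): the couple of hundred screen-shot page views that are a rounding error on a site doing about 100K impressions a day would be half the impressions of a site that only gets about 200 real page views a day. A small Python sketch of that arithmetic, illustrative only:

```python
def bot_share(bot_impressions: int, human_impressions: int) -> float:
    """Fraction of all recorded impressions that came from screen-shot bots."""
    total = bot_impressions + human_impressions
    return bot_impressions / total if total else 0.0


# Rough figures taken from this thread; one impression per page is an assumption.
scenarios = {
    "~100K real impressions/day, ~200 screen-shot pages/day": (200, 100_000),
    "~200 real page views/day, all 200 pages snapshotted in one day": (200, 200),
}

for label, (bot, human) in scenarios.items():
    print(f"{label}: {bot_share(bot, human):.1%} of impressions from bots")
```

The first case comes out around 0.2%, the second at 50%, which is why the small sites are the ones to worry about.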
|Before long a percentage of your daily ad impressions will probably be nothing more than screen shots. |
Then I will send my robots to look at it :)
The question is whether they make a lot of screen shots only when your site ranks high with them, so the snapshot gives people who use these engines something extra for their search terms.
I doubt (just guessing here) that they take 40K screen shots of unimportant sites (unimportant according to their algorithms).
|I doubt (just guessing here) that they take 40K screen shots of unimportant sites (unimportant according to their algorithms) |
Not sure about ASK yet (they've always crawled slowly), but Snap went after every single page on my site.
They aren't the only ones doing screen shots either, just more commonly known.
|I never really thought this was a newsworthy item... |
I think it's interesting :)
|I wouldn't say that this activity is so large it's a problem for AdSense yet but I'm wondering if they're having any overall impact and what, if anything, Google is doing to filter out the obvious sources of this activity from our stats. |
My guess is that it does impact the stats, but the effect is minimal and probably worth the extra visitors that the search engines bring to my site.
As for the smaller sites... it's possible that website traffic is taken into account. My guess is that low-traffic sites aren't getting cached. As a test, I checked my employer's website. We don't focus at all on web traffic, and have done nothing to promote the site. We average a whopping 20 page views per day :) Our site didn't show up at all at ask.com, and at Alexa, the snapshot image is a placeholder that says "picture coming soon."
|Then I will send my robots to look at it :) |
Oops! Can I say "A" on here?
Yes, only here it's "laughing my adsense off"
Interesting find. Which bots need to be blocked in order to get rid of the Ask network of sites? (I understand that they have at least two spiders, one for Ask and one for Snap.)
You see, I get virtually no traffic from this network (Ask, Excite, iWon or MyWay), so I guess I could easily cut this off.
I recall this coming up, probably here, a few months ago. At the time I was skeptical but then I did succeed in finding screenshots that showed actual AdSense ads. I would have to think that Google is on top of things by now.
|I would have to think that Google is on top of things by now. |
You would think, but I'm seeing screen shots just days old with AdSense showing.
As far as I understand, this should only be a problem when Google is (a) counting these artificial ad impressions and (b) using them to determine EPC or SmartPricing. I could imagine that this is not a serious problem for them. It can be, though, with regard to CPM ads and 40,000 pages being spidered... (In that case those bots would generate a lot of invalid views.)
zett, you've hit on a good reason why this is important. If AdSense is charging advertisers for impressions by bots from Ask, Snap, and Alexa then Google may owe CPM advertisers a refund.
|charging advertisers for impressions by bots from Ask, Snap, and Alexa then Google may owe CPM advertisers a refund |
I hate to give away all the answers, but CPM ads tend to repeat often on some sites and could make up a significant part of 40K impressions on the same site. Especially if you have multiple ad units, a cool 1K of wasted impressions for a single CPM campaign is entirely possible.
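To put numbers on the CPM concern: the 40K screen shots come from this thread, while the ad units per page, the share of ad slots one campaign wins, and the $2.50 CPM below are made-up assumptions for illustration, not actual AdSense figures.

```python
def wasted_cpm_impressions(screenshots: int, ad_units_per_page: int,
                           campaign_share: float) -> float:
    """Impressions one CPM campaign receives from screen shots.

    campaign_share is a hypothetical fraction of ad slots the campaign wins.
    """
    return screenshots * ad_units_per_page * campaign_share


def wasted_spend(impressions: float, cpm: float) -> float:
    """Cost of those impressions at a given CPM (price per 1,000 impressions)."""
    return impressions / 1000 * cpm


# 40K screen shots is from the thread; 3 units, 1% share and $2.50 CPM are assumptions.
imps = wasted_cpm_impressions(40_000, ad_units_per_page=3, campaign_share=0.01)
print(f"{imps:.0f} wasted impressions, ${wasted_spend(imps, cpm=2.50):.2f} at a $2.50 CPM")
```

With those assumptions one campaign picks up about 1,200 impressions from screen shots alone, in line with the "cool 1K" estimate.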
AdSense ads continue to show in cached copies of your site even six months after the site no longer exists. The Wayback Machine contains more than just a screenshot; it contains the code as it used to be. Images return a red X because they no longer exist, but the AdSense code carries on.
If there are ads shown in those screenshots then the services creating them are flouting the robots.txt guidelines:
|User-Agent: * |
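For reference, a well-behaved crawler is expected to check robots.txt before fetching a URL, and Python's standard `urllib.robotparser` can do that check. The robots.txt text, the `ads.example.com` host and the `ExampleSnapshotBot` name below are hypothetical placeholders, not the rules of any real ad server or crawler:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows all crawlers; not a copy of any real file.
AD_SERVER_ROBOTS = """\
User-agent: *
Disallow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A compliant screen-shot bot would skip the ad script when this returns False.
print(may_fetch(AD_SERVER_ROBOTS, "ExampleSnapshotBot",
                "https://ads.example.com/show_ads.js"))
```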
|If there are ads shown in those screenshots then the services creating them are flouting the robots.txt guidelines: |
[edited by: Tastatura at 8:54 am (utc) on July 17, 2007]
The answer is obviously to get the bots to click the ads, in order to ensure that our statistics aren't messed up...
[edited by: vincevincevince at 9:12 am (utc) on July 17, 2007]
>"I doubt (just guessing here) that they take 40K screen shots of unimportant sites (unimportant according to their algorithms)."
"As far as I understand, this should only be a problem when Google is ... (b) using them to determine EPC or SmartPricing."
But if EVERYONE is seeing the same activity the smart pricing should even out over the long run, across all accounts, and not have any individual impact.
[edited by: MikeNoLastName at 9:13 am (utc) on July 17, 2007]
|But if EVERYONE is seeing the same activity the smart pricing should even out over the long run, across all accounts, and not have any individual impact. |
Except that some will ban the bots, and some won't.
I'm actually out of the US this week (not an excuse - I know you int'l publishers deal with AdSense time differences every day), but I'm going to try to get in touch with my team and see if I can get more information about this issue.
Why would a screenshot be taken of more than your front page?
Why does ASK need a screenshot of every single page? Weirdness.
Personally, I love thumbshots of the site (front page) I'm about to visit; they really enhance the search experience. Example:
|That means your AdSense ads are also being included in those screen shots and possibly skewing your statistics. |
I'm there. My sites show more visits than I know they really have. There are tons of hits by non-visitors (bots). I had to block a lot of things because some areas consume CPU resources.
|Just block the bots using the Apache web server, easy and simple. |
It's not that easy... I had (and still have) problems with bots that do nothing but consume resources. I blocked some (IP deny and robots.txt), and some still come back. .htaccess is something I still don't handle well enough to risk my traffic on it. Also, every month a new bot appears from some .edu "for research purposes".
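The "IP deny" approach can also live in application code instead of Apache config. A minimal sketch using Python's standard `ipaddress` module; the networks listed are RFC 5737 documentation ranges standing in for whatever crawler ranges you find in your own logs, not real bot addresses:

```python
import ipaddress

# Hypothetical deny list (documentation ranges, not real crawler addresses).
DENY_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_denied(remote_addr: str) -> bool:
    """Return True if remote_addr falls inside any denied network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # not a valid IP address; leave the decision to other checks
    # An IPv6 address is never "in" an IPv4 network, so mixed versions return False.
    return any(addr in net for net in DENY_NETWORKS)
```

The catch is the one described above: the list only covers bots you already know about, and it needs maintaining as new ones show up.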
There is a lot to say about blocking bots and the drawbacks of using .htaccess.
If your site gets traffic from Ask or Snap but you want to block the AdSense impressions from them, just do it via a PHP (or ASP) script.
All my AdSense ads are displayed from a little PHP include file. I have a flag set to not show AdSense ads to visitors coming from certain ISPs (such as my own) to prevent accidental clicks. You could simply add the crawler's hostname or IP address to that list. That way your page would still be crawled, but without the AdSense block.
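Here is the same idea sketched in Python rather than PHP (this is not the poster's actual include file): decide per request whether to emit the ad block. The host suffix, user-agent fragment and IP address below are hypothetical placeholders to replace with what you see in your own logs.

```python
from typing import Optional

# Hypothetical suppression lists; fill these in from your own access logs.
SUPPRESS_HOST_SUFFIXES = (".example-crawler.net",)
SUPPRESS_AGENT_FRAGMENTS = ("snapshotbot",)
SUPPRESS_IPS = frozenset({"192.0.2.10"})


def should_show_ads(remote_ip: str, remote_host: Optional[str], user_agent: str) -> bool:
    """Return False when the request looks like it comes from a listed crawler."""
    if remote_ip in SUPPRESS_IPS:
        return False
    # str.endswith accepts a tuple of suffixes.
    if remote_host and remote_host.lower().endswith(SUPPRESS_HOST_SUFFIXES):
        return False
    agent = user_agent.lower()
    if any(fragment in agent for fragment in SUPPRESS_AGENT_FRAGMENTS):
        return False
    return True


def render_ad_block(remote_ip: str, remote_host: Optional[str],
                    user_agent: str, ad_html: str) -> str:
    """Return the ad markup for normal visitors, or an empty string for listed crawlers."""
    return ad_html if should_show_ads(remote_ip, remote_host, user_agent) else ""
```

The page itself is still served to the crawler; only the ad block is left out. Reverse-DNS host names can be faked unless you forward-confirm them, and, as noted earlier in the thread, not every screen-shot tool identifies itself, so this only catches the ones you know about.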
^^^ That's what I was going to suggest: just don't show the ads to the bots.