Msg#: 3204506 posted 5:41 pm on Dec 31, 2006 (gmt 0)
When I log in to Adsense and look under Reports --> Site Diagnostics, I have started to get blocked pages appearing in the last couple of days. The reason given is "robots.txt file". I don't currently have a robots.txt file, but the Adsense website seems to be telling me I should create one as follows:

User-agent: Mediapartners-Google*
Disallow:

Is this a requirement for using Adsense? And will this have any effect on other search engines? Martin
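For anyone who wants to check how such a rule is read, here is a minimal sketch using Python's standard-library urllib.robotparser. It is an illustration under assumptions, not Google's own logic: the example.com URLs and the /private/ path are made up, and the trailing * is dropped from the agent name because, as far as I can tell, Python's parser does not treat it as a wildcard there. The idea is that an empty Disallow in the Mediapartners-Google group blocks nothing for that crawler, while other crawlers keep following their own group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only.
# Group 1: the AdSense crawler, with an empty Disallow (nothing is blocked).
# Group 2: every other crawler, with a made-up /private/ path blocked.
ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    private = "http://example.com/private/page.html"  # assumed URL
    public = "http://example.com/index.html"          # assumed URL

    # The AdSense crawler matches its own group, so nothing is blocked for it.
    print(is_allowed(ROBOTS_TXT, "Mediapartners-Google", private))  # True
    # Other crawlers fall through to the catch-all group.
    print(is_allowed(ROBOTS_TXT, "Googlebot", private))             # False
    print(is_allowed(ROBOTS_TXT, "Googlebot", public))              # True
```

Under this parser, at least, the Mediapartners-Google group only affects that one crawler; the rules for other search engines are unchanged.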
Msg#: 3204506 posted 6:45 pm on Dec 31, 2006 (gmt 0)
So, if it is not a requirement, why is it being given as the reason for blocking ads from being served on my pages? Here is a typical entry on the Site Diagnostics page (I have replaced my URL with #### to comply with the terms of this site)
Msg#: 3204506 posted 1:26 am on Jan 1, 2007 (gmt 0)
I just discovered the same thing with three of my sites, and came here to check whether anybody else has the same problem. It started within the last three days. It also shows up for a site that hasn't changed for months. Same reason: robots.txt. But nothing is wrong with it; I checked again and the robots.txt is perfect. The blocked pages are not on my domain. My hypothesis: Google is blocking pages in the caches of Google Search and MSN, and possibly the caches have a robots.txt as well.
Msg#: 3204506 posted 1:42 am on Jan 1, 2007 (gmt 0)
I also noticed this earlier today, but didn't have time to troubleshoot it. I'm looking further into it now.
This sounds like you DO have a robots.txt file AND it is blocking some things...
Sounds logical, but not according to Google. This is the message they provide for sites receiving blocked errors, but who do not have a "robots.txt" file:
From Google: If you do not have a robots.txt file, please create one in the root directory of your domain and then update it by following these instructions.
Apparently, Google is aware of the blocked URL reports in Site Diagnostics, and is suggesting creating a "robots.txt" file to better accommodate the Adsense (MediaPartners) crawler. Maybe this is an attempt by Google to serve better ads?
I'm not sure, but the two blocked URLs I'm seeing come from caches. One is from Google's cache, and the other is from Yahoo's.
Msg#: 3204506 posted 1:43 am on Jan 1, 2007 (gmt 0)
That number you mention is a DC of Google Search: [184.108.40.206...] This IP address is one of 15 blocked for my pages. Last week the provider updated some software and my sites were down for some hours. Maybe some people checked some pages in the cache then and clicked on an advertisement.
Msg#: 3204506 posted 2:39 am on Jan 1, 2007 (gmt 0)
I see the same thing in my reports. I'm going to ignore it - those page views are of the Google cache, and Google is sensibly blocking itself in its robots.txt and has a bug in the adsense crawl processing atm. See also [webmasterworld.com...] :)