Forum Moderators: martinibuster


Correct robots.txt leading to undue PSA/Default Ads

There is a bug and Google is aware

         

mlemos

11:18 pm on May 5, 2004 (gmt 0)

10+ Year Member



Although this does not seem to be the cause of all the undue PSA/default ads that appear on many sites, there seems to be a bug in the way Google's bots interpret robots.txt.

A few weeks ago I changed the robots.txt of all the mirrors (tens of them) of a site of mine to stop Googlebot from crawling them, so that search users are led to the main site instead of the mirrors.

The robots.txt is like this:

User-agent: Googlebot
Disallow: /

User-agent: Mediapartners-Google*
Disallow:

The problem was reported, and the Google AdSense people acknowledged it, admitting that this robots.txt should not prevent the Mediapartners-Google bot from crawling the pages where AdSense ads appear.

The AdSense ads appear correctly on the main site, as it has a different robots.txt.

So, a fix is expected once the Google AdSense people figure out what the problem is.
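
As a rough offline check of what the robots.txt above is meant to express, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It only shows how one independent parser reads these two records and says nothing about how Google's own bots parse them. The path `/page.html` is a made-up example.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post, verbatim.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: Mediapartners-Google*
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "/page.html" is a hypothetical page path used only for illustration.
googlebot_allowed = parser.can_fetch("Googlebot", "/page.html")
mediabot_allowed = parser.can_fetch("Mediapartners-Google", "/page.html")

print("Googlebot allowed:", googlebot_allowed)
print("Mediapartners-Google allowed:", mediabot_allowed)
```

With this parser, Googlebot is refused and Mediapartners-Google is allowed, which is the behaviour the file was written to get.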

jdMorgan

11:24 pm on May 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As a work-around, did you try reversing those two records -- putting the Mediapartners record first in your robots.txt?

Just curious.

Jim

jomaxx

12:48 am on May 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



AdSense also uses the results of crawls by the regular Googlebot crawler. It may be that the change in the robots.txt file has caused AdSense not to use that cached data, and you'll be showing lots of PSAs until Mediabot re-crawls everything. If that's the case, then I wouldn't exactly call it a bug.

Have you confirmed that Mediabot has stopped crawling these sites altogether?

Jenstar

1:34 am on May 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The two bots generally don't share information, especially something like robots.txt. I have many requests in my logs from the AdSense bot checking my robots.txt file.

Do check to see if you have hits by "Mediapartners-Google/2.1" on the pages. If you are getting hits by this on actual pages, the problem is something beyond the robots.txt issue.
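
One way to run that check is to count Mediabot requests for robots.txt separately from requests for actual pages. The sketch below assumes an Apache-style "combined" access log; the regular expression, the function name `count_mediabot_hits`, and the sample lines are all made up for illustration and would need adapting to a real log format.

```python
import re
from collections import Counter

# Assumed "combined" log layout: "METHOD path proto" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def count_mediabot_hits(lines):
    """Count Mediapartners-Google requests, split into robots.txt vs. other pages."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # line does not look like a combined-format entry
        if "mediapartners-google" not in match.group("ua").lower():
            continue  # request came from some other client
        key = "robots.txt" if match.group("path") == "/robots.txt" else "pages"
        counts[key] += 1
    return counts

# Hypothetical sample lines, not taken from any real log.
SAMPLE_LOG = [
    '192.0.2.1 - - [06/May/2004:01:00:00 +0000] "GET /robots.txt HTTP/1.0" 200 68 "-" "Mediapartners-Google/2.1"',
    '192.0.2.1 - - [06/May/2004:02:00:00 +0000] "GET /robots.txt HTTP/1.0" 200 68 "-" "Mediapartners-Google/2.1"',
    '192.0.2.7 - - [06/May/2004:02:05:00 +0000] "GET /index.html HTTP/1.0" 200 512 "-" "Mozilla/4.0"',
]

sample_counts = count_mediabot_hits(SAMPLE_LOG)
print(dict(sample_counts))
```

If the "pages" count stays at zero while the "robots.txt" count keeps growing, the bot is reading the file and then staying away from the pages.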

mlemos

8:30 am on May 6, 2004 (gmt 0)

10+ Year Member



I already tried inverting the order of the Googlebot and Mediapartners records and the result is the same. The Mediapartners bot only comes to fetch robots.txt and nothing else.

I suspect the bug comes from the fact that the full Mediapartners user agent name contains the word "googlebot" in the URL part of that string.
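
To illustrate that suspicion, here is a minimal sketch of how a naive matcher could go wrong. It is purely a guess at the failure mode, not Google's actual code. The full user agent string in `ASSUMED_FULL_UA` is an assumption about what the bot sends, and both matching functions are hypothetical.

```python
# Assumed full user agent string; the "googlebot" in the URL is what matters here.
ASSUMED_FULL_UA = "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"

def naive_match(record_agent, full_ua):
    """Hypothetical buggy rule: substring search over the whole UA string."""
    return record_agent.rstrip("*").lower() in full_ua.lower()

def token_match(record_agent, full_ua):
    """Hypothetical stricter rule: compare only the product token before '/'."""
    product = full_ua.split("/", 1)[0].strip().lower()
    return product.startswith(record_agent.rstrip("*").lower())

for record in ("Googlebot", "Mediapartners-Google*"):
    print(record,
          "naive:", naive_match(record, ASSUMED_FULL_UA),
          "token:", token_match(record, ASSUMED_FULL_UA))
```

Under the naive rule the "Googlebot" record also matches the Mediapartners bot, so its "Disallow: /" would apply to it. Under the token rule only the Mediapartners record matches.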